sina-al / Pynlp

License: MIT
A pythonic wrapper for Stanford CoreNLP.

Projects that are alternatives to or similar to Pynlp

Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+2344.66%)
Mutual labels:  natural-language-processing, sentiment-analysis, named-entity-recognition, part-of-speech-tagger
Ncrfpp
NCRF++, a Neural Sequence Labeling Toolkit, easy to use for any sequence labeling task (e.g. NER, POS, segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Stars: ✭ 1,767 (+1615.53%)
Mutual labels:  natural-language-processing, named-entity-recognition, part-of-speech-tagger
Turkish Bert Nlp Pipeline
Bert-base NLP pipeline for Turkish, Ner, Sentiment Analysis, Question Answering etc.
Stars: ✭ 85 (-17.48%)
Mutual labels:  natural-language-processing, sentiment-analysis, named-entity-recognition
Textblob Ar
Arabic support for textblob
Stars: ✭ 60 (-41.75%)
Mutual labels:  natural-language-processing, sentiment-analysis, part-of-speech-tagger
Dan Jurafsky Chris Manning Nlp
My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.
Stars: ✭ 124 (+20.39%)
Mutual labels:  parser, sentiment-analysis, named-entity-recognition
Pyhanlp
Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, new word discovery, keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional conversion, and other Chinese natural language processing.
Stars: ✭ 2,564 (+2389.32%)
Mutual labels:  natural-language-processing, named-entity-recognition, part-of-speech-tagger
Awesome Hungarian Nlp
A curated list of NLP resources for Hungarian
Stars: ✭ 121 (+17.48%)
Mutual labels:  parser, natural-language-processing, named-entity-recognition
Awesome Persian Nlp Ir
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Stars: ✭ 460 (+346.6%)
Mutual labels:  natural-language-processing, named-entity-recognition, part-of-speech-tagger
Iob2corpus
Japanese IOB2 tagged corpus for Named Entity Recognition.
Stars: ✭ 51 (-50.49%)
Mutual labels:  natural-language-processing, named-entity-recognition
Python Tutorial Notebooks
Python tutorials as Jupyter Notebooks for NLP, ML, AI
Stars: ✭ 52 (-49.51%)
Mutual labels:  natural-language-processing, part-of-speech-tagger
Repo 2017
Python codes in Machine Learning, NLP, Deep Learning and Reinforcement Learning with Keras and Theano
Stars: ✭ 1,123 (+990.29%)
Mutual labels:  natural-language-processing, sentiment-analysis
Corenlp
Stanford CoreNLP: A Java suite of core NLP tools.
Stars: ✭ 8,248 (+7907.77%)
Mutual labels:  natural-language-processing, named-entity-recognition
Pattern
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
Stars: ✭ 8,112 (+7775.73%)
Mutual labels:  natural-language-processing, sentiment-analysis
Absa Pytorch
Aspect Based Sentiment Analysis, PyTorch Implementations.
Stars: ✭ 1,181 (+1046.6%)
Mutual labels:  natural-language-processing, sentiment-analysis
Stocksight
Stock market analyzer and predictor using Elasticsearch, Twitter, News headlines and Python natural language processing and sentiment analysis
Stars: ✭ 1,037 (+906.8%)
Mutual labels:  natural-language-processing, sentiment-analysis
Seq2annotation
A general sequence labeling library based on TensorFlow & PaddlePaddle (currently including BiLSTM+CRF, Stacked-BiLSTM+CRF and IDCNN+CRF, with more algorithms being added), covering sequence labeling tasks such as Chinese word segmentation (tokenization), part-of-speech (POS) tagging and named entity recognition (NER).
Stars: ✭ 70 (-32.04%)
Mutual labels:  named-entity-recognition, part-of-speech-tagger
Greynir
The greynir.is natural language processing website for Icelandic
Stars: ✭ 47 (-54.37%)
Mutual labels:  parser, natural-language-processing
Text Analytics With Python
Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.
Stars: ✭ 1,132 (+999.03%)
Mutual labels:  natural-language-processing, sentiment-analysis
Senta
Baidu's open-source Sentiment Analysis System.
Stars: ✭ 1,187 (+1052.43%)
Mutual labels:  natural-language-processing, sentiment-analysis
Dialogue Understanding
This repository contains PyTorch implementation for the baseline models from the paper Utterance-level Dialogue Understanding: An Empirical Study
Stars: ✭ 77 (-25.24%)
Mutual labels:  natural-language-processing, sentiment-analysis

pynlp

A pythonic wrapper for Stanford CoreNLP.

Description

This library provides a Python interface to Stanford CoreNLP, built on top of corenlp_protobuf.

Installation

  1. Download Stanford CoreNLP from the official download page.
  2. Unzip the file and set your CORE_NLP environment variable to point to the unzipped directory (see the example below).
  3. Install pynlp via pip:
pip3 install pynlp
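
For step 2, assuming the archive unzipped to a directory named stanford-corenlp-full-2018-02-27 (the exact name depends on the release you download; this one is illustrative), the variable can be set on Linux/macOS with:

export CORE_NLP=/path/to/stanford-corenlp-full-2018-02-27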

Quick Start

Launch the server

Launch the StanfordCoreNLPServer using the instructions given here. Alternatively, simply run the module:

python3 -m pynlp

By default, this launches the server on localhost, using port 9000 and 4 GB of RAM for the JVM. Use the --help option for instructions on custom configuration.

Example

Let's start off with an excerpt from a CNN article.

text = ('GOP Sen. Rand Paul was assaulted in his home in Bowling Green, Kentucky, on Friday, '
        'according to Kentucky State Police. State troopers responded to a call to the senator\'s '
        'residence at 3:21 p.m. Friday. Police arrested a man named Rene Albert Boucher, who they '
        'allege "intentionally assaulted" Paul, causing him "minor injury". Boucher, 59, of Bowling '
        'Green was charged with one count of fourth-degree assault. As of Saturday afternoon, he '
        'was being held in the Warren County Regional Jail on a $5,000 bond.')

Instantiate annotator

Here we demonstrate the following annotators:

  • Annotators: tokenize, ssplit, pos, lemma, ner, entitymentions, coref, sentiment, quote, openie
  • Options: openie.resolve_coref
from pynlp import StanfordCoreNLP

annotators = 'tokenize, ssplit, pos, lemma, ner, entitymentions, coref, sentiment, quote, openie'
options = {'openie.resolve_coref': True}

nlp = StanfordCoreNLP(annotators=annotators, options=options)

Annotate text

The nlp instance is callable. Use it to annotate the text; it returns a Document object.

document = nlp(text)

print(document) # prints 'text'

Sentence splitting

Let's test the ssplit annotator. A Document object iterates over its Sentence objects.

for index, sentence in enumerate(document):
    print(index, sentence, sep=') ')

Output:

0) GOP Sen. Rand Paul was assaulted in his home in Bowling Green, Kentucky, on Friday, according to Kentucky State Police.
1) State troopers responded to a call to the senator's residence at 3:21 p.m. Friday.
2) Police arrested a man named Rene Albert Boucher, who they allege "intentionally assaulted" Paul, causing him "minor injury".
3) Boucher, 59, of Bowling Green was charged with one count of fourth-degree assault.
4) As of Saturday afternoon, he was being held in the Warren County Regional Jail on a $5,000 bond.

Named entity recognition

How about finding all the people mentioned in the document?

[str(entity) for entity in document.entities if entity.type == 'PERSON']

Output:

['Rand Paul', 'Rene Albert Boucher', 'Paul', 'Boucher']
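
Since each entity exposes a type attribute, we can just as easily tally the entity types across the whole document. A minimal sketch, using only the attributes shown above:

from collections import Counter

# Count how many entities of each type (PERSON, LOCATION, ...) were found.
type_counts = Counter(entity.type for entity in document.entities)
print(type_counts)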

Named entities are also available at the sentence level.

first_sentence = document[0]
for entity in first_sentence.entities:
    print(entity, '({})'.format(entity.type))

Output:

GOP (ORGANIZATION)
Rand Paul (PERSON)
Bowling Green (LOCATION)
Kentucky (LOCATION)
Friday (DATE)
Kentucky State Police (ORGANIZATION)

Part-of-speech tagging

Let's find all tokens in the first sentence whose tag contains 'VB'. A Sentence object iterates over Token objects.

for token in first_sentence:
    if 'VB' in token.pos:
        print(token, token.pos)

Output:

was VBD
assaulted VBN
according VBG

Lemmatization

Using the same verbs, let's look at their lemmas.

for token in first_sentence:
    if 'VB' in token.pos:
        print(token, '->', token.lemma)

Output:

was -> be
assaulted -> assault
according -> accord
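
Combining the two annotators, we could collect (token, tag, lemma) triples for those verbs in a single pass. A small sketch built from the attributes demonstrated above:

# Gather each verb with its part-of-speech tag and lemma.
verbs = [(str(token), token.pos, token.lemma)
         for token in first_sentence if 'VB' in token.pos]
print(verbs)  # e.g. [('was', 'VBD', 'be'), ...]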

Coreference resolution

Let's use pynlp to find the first CorefChain in the text.

chain = document.coref_chains[0]
print(chain)

Output:

((GOP Sen. Rand Paul))-[id=4] was assaulted in (his)-[id=5] home in Bowling Green, Kentucky, on Friday, according to Kentucky State Police.
State troopers responded to a call to (the senator's)-[id=10] residence at 3:21 p.m. Friday.
Police arrested a man named Rene Albert Boucher, who they allege "(intentionally assaulted" Paul)-[id=16], causing him "minor injury.

In the string representation, coreferences are marked with parentheses and the referent with double parentheses. Each is also labelled with a coref_id. Let's take a closer look at the referent.

ref = chain.referent
print('Coreference: {}\n'.format(ref))

for attr in 'type', 'number', 'animacy', 'gender':
    print(attr, getattr(ref, attr), sep=': ')

# Note that we can also index coreferences by id
assert chain[4].is_referent

Output:

Coreference: Police

type: PROPER
number: SINGULAR
animacy: ANIMATE
gender: UNKNOWN
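
To inspect every chain rather than just the first, we can loop over document.coref_chains. A sketch assuming it behaves like the indexable sequence used above:

# Print each chain's position in the document and its referent.
for index, chain in enumerate(document.coref_chains):
    print('chain', index, '->', chain.referent)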

Quotes

Extracting quotes from the text is simple.

print(document.quotes)

Output:

[<Quote: "intentionally assaulted">, <Quote: "minor injury">]
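
Since document.quotes appears to be an ordinary list of Quote objects, standard iteration applies:

# Print each quote on its own line.
for quote in document.quotes:
    print(quote)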

TODO (annotation wrappers):

  • [x] ssplit
  • [ ] ner
  • [x] pos
  • [x] lemma
  • [x] coref
  • [x] quote
  • [ ] quote.attribution
  • [ ] parse
  • [ ] depparse
  • [x] entitymentions
  • [ ] openie
  • [ ] sentiment
  • [ ] relation
  • [ ] kbp
  • [ ] entitylink
  • [ ] 'options' examples, e.g. openie.resolve_coref

Saving annotations

Write

A pynlp document can be saved as a byte string.

with open('annotation.dat', 'wb') as file:
    file.write(document.to_bytes())

Read

To load a pynlp document, instantiate a Document with the from_bytes class method.

from pynlp import Document

with open('annotation.dat', 'rb') as file:
    document = Document.from_bytes(file.read())
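
As a quick sanity check, recall that printing a document reproduces the original text, so the round trip should preserve it. A sketch assuming the text and document variables from the examples above:

# str(document) yields the annotated text (see the annotation example).
assert str(document) == text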