sina-al / Pynlp

License: MIT
A pythonic wrapper for Stanford CoreNLP.

Projects that are alternatives to or similar to Pynlp

Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+2344.66%)
Mutual labels:  natural-language-processing, sentiment-analysis, named-entity-recognition, part-of-speech-tagger
Ncrfpp
NCRF++, a Neural Sequence Labeling Toolkit, easy to use for any sequence labeling task (e.g. NER, POS, segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Stars: ✭ 1,767 (+1615.53%)
Mutual labels:  natural-language-processing, named-entity-recognition, part-of-speech-tagger
Turkish Bert Nlp Pipeline
Bert-base NLP pipeline for Turkish, Ner, Sentiment Analysis, Question Answering etc.
Stars: ✭ 85 (-17.48%)
Mutual labels:  natural-language-processing, sentiment-analysis, named-entity-recognition
Textblob Ar
Arabic support for textblob
Stars: ✭ 60 (-41.75%)
Mutual labels:  natural-language-processing, sentiment-analysis, part-of-speech-tagger
Dan Jurafsky Chris Manning Nlp
My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.
Stars: ✭ 124 (+20.39%)
Mutual labels:  parser, sentiment-analysis, named-entity-recognition
Pyhanlp
Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, new word discovery, keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional conversion, and other Chinese natural language processing.
Stars: ✭ 2,564 (+2389.32%)
Mutual labels:  natural-language-processing, named-entity-recognition, part-of-speech-tagger
Awesome Hungarian Nlp
A curated list of NLP resources for Hungarian
Stars: ✭ 121 (+17.48%)
Mutual labels:  parser, natural-language-processing, named-entity-recognition
Awesome Persian Nlp Ir
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Stars: ✭ 460 (+346.6%)
Mutual labels:  natural-language-processing, named-entity-recognition, part-of-speech-tagger
Iob2corpus
Japanese IOB2 tagged corpus for Named Entity Recognition.
Stars: ✭ 51 (-50.49%)
Mutual labels:  natural-language-processing, named-entity-recognition
Python Tutorial Notebooks
Python tutorials as Jupyter Notebooks for NLP, ML, AI
Stars: ✭ 52 (-49.51%)
Mutual labels:  natural-language-processing, part-of-speech-tagger
Repo 2017
Python codes in Machine Learning, NLP, Deep Learning and Reinforcement Learning with Keras and Theano
Stars: ✭ 1,123 (+990.29%)
Mutual labels:  natural-language-processing, sentiment-analysis
Corenlp
Stanford CoreNLP: A Java suite of core NLP tools.
Stars: ✭ 8,248 (+7907.77%)
Mutual labels:  natural-language-processing, named-entity-recognition
Pattern
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
Stars: ✭ 8,112 (+7775.73%)
Mutual labels:  natural-language-processing, sentiment-analysis
Absa Pytorch
Aspect Based Sentiment Analysis, PyTorch Implementations.
Stars: ✭ 1,181 (+1046.6%)
Mutual labels:  natural-language-processing, sentiment-analysis
Stocksight
Stock market analyzer and predictor using Elasticsearch, Twitter, News headlines and Python natural language processing and sentiment analysis
Stars: ✭ 1,037 (+906.8%)
Mutual labels:  natural-language-processing, sentiment-analysis
Seq2annotation
A general sequence labeling library based on TensorFlow & PaddlePaddle (currently including BiLSTM+CRF, Stacked-BiLSTM+CRF and IDCNN+CRF, with more algorithms being added), covering sequence labeling tasks such as Chinese word segmentation (tokenization), part-of-speech (POS) tagging and named entity recognition (NER).
Stars: ✭ 70 (-32.04%)
Mutual labels:  named-entity-recognition, part-of-speech-tagger
Greynir
The greynir.is natural language processing website for Icelandic
Stars: ✭ 47 (-54.37%)
Mutual labels:  parser, natural-language-processing
Text Analytics With Python
Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.
Stars: ✭ 1,132 (+999.03%)
Mutual labels:  natural-language-processing, sentiment-analysis
Senta
Baidu's open-source Sentiment Analysis System.
Stars: ✭ 1,187 (+1052.43%)
Mutual labels:  natural-language-processing, sentiment-analysis
Dialogue Understanding
This repository contains PyTorch implementation for the baseline models from the paper Utterance-level Dialogue Understanding: An Empirical Study
Stars: ✭ 77 (-25.24%)
Mutual labels:  natural-language-processing, sentiment-analysis

pynlp

A pythonic wrapper for Stanford CoreNLP.

Description

This library provides a Python interface to Stanford CoreNLP, built on top of corenlp_protobuf.

Installation

  1. Download Stanford CoreNLP from the official download page.
  2. Unzip the file and set your CORE_NLP environment variable to point to the unzipped directory (see the example below).
  3. Install pynlp via pip:
pip3 install pynlp
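
For step 2, assuming the archive unzipped to a directory named stanford-corenlp-full-2018-02-27 (the exact name depends on the release you download; this one is illustrative), the variable can be set on Linux/macOS with:

export CORE_NLP=/path/to/stanford-corenlp-full-2018-02-27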

Quick Start

Launch the server

Launch the StanfordCoreNLPServer using the instructions given here. Alternatively, simply run the module:

python3 -m pynlp

By default, this launches the server on localhost, using port 9000 and 4 GB of RAM for the JVM. Use the --help option for instructions on custom configuration.

Example

Let's start off with an excerpt from a CNN article.

text = ('GOP Sen. Rand Paul was assaulted in his home in Bowling Green, Kentucky, on Friday, '
        'according to Kentucky State Police. State troopers responded to a call to the senator\'s '
        'residence at 3:21 p.m. Friday. Police arrested a man named Rene Albert Boucher, who they '
        'allege "intentionally assaulted" Paul, causing him "minor injury". Boucher, 59, of Bowling '
        'Green was charged with one count of fourth-degree assault. As of Saturday afternoon, he '
        'was being held in the Warren County Regional Jail on a $5,000 bond.')

Instantiate annotator

Here we demonstrate the following annotators:

  • Annotators: tokenize, ssplit, pos, lemma, ner, entitymentions, coref, sentiment, quote, openie
  • Options: openie.resolve_coref
from pynlp import StanfordCoreNLP

annotators = 'tokenize, ssplit, pos, lemma, ner, entitymentions, coref, sentiment, quote, openie'
options = {'openie.resolve_coref': True}

nlp = StanfordCoreNLP(annotators=annotators, options=options)

Annotate text

The nlp instance is callable. Use it to annotate the text; it returns a Document object.

document = nlp(text)

print(document) # prints 'text'

Sentence splitting

Let's test the ssplit annotator. A Document object iterates over its Sentence objects.

for index, sentence in enumerate(document):
    print(index, sentence, sep=') ')

Output:

0) GOP Sen. Rand Paul was assaulted in his home in Bowling Green, Kentucky, on Friday, according to Kentucky State Police.
1) State troopers responded to a call to the senator's residence at 3:21 p.m. Friday.
2) Police arrested a man named Rene Albert Boucher, who they allege "intentionally assaulted" Paul, causing him "minor injury".
3) Boucher, 59, of Bowling Green was charged with one count of fourth-degree assault.
4) As of Saturday afternoon, he was being held in the Warren County Regional Jail on a $5,000 bond.

Named entity recognition

How about finding all the people mentioned in the document?

[str(entity) for entity in document.entities if entity.type == 'PERSON']

Output:

['Rand Paul', 'Rene Albert Boucher', 'Paul', 'Boucher']
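
Since each entity exposes a type attribute, we can just as easily tally the entity types across the whole document. A minimal sketch, using only the attributes shown above:

from collections import Counter

# Count how many entities of each type (PERSON, LOCATION, ...) were found.
type_counts = Counter(entity.type for entity in document.entities)
print(type_counts)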

Named entities are also available at the sentence level.

first_sentence = document[0]
for entity in first_sentence.entities:
    print(entity, '({})'.format(entity.type))

Output:

GOP (ORGANIZATION)
Rand Paul (PERSON)
Bowling Green (LOCATION)
Kentucky (LOCATION)
Friday (DATE)
Kentucky State Police (ORGANIZATION)

Part-of-speech tagging

Let's find all tokens in the first sentence whose tag contains 'VB'. A Sentence object iterates over Token objects.

for token in first_sentence:
    if 'VB' in token.pos:
        print(token, token.pos)

Output:

was VBD
assaulted VBN
according VBG

Lemmatization

Using the same verbs, let's look at their lemmas.

for token in first_sentence:
    if 'VB' in token.pos:
        print(token, '->', token.lemma)

Output:

was -> be
assaulted -> assault
according -> accord
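
Combining the two annotators, we could collect (token, tag, lemma) triples for those verbs in a single pass. A small sketch built from the attributes demonstrated above:

# Gather each verb with its part-of-speech tag and lemma.
verbs = [(str(token), token.pos, token.lemma)
         for token in first_sentence if 'VB' in token.pos]
print(verbs)  # e.g. [('was', 'VBD', 'be'), ...]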

Coreference resolution

Let's use pynlp to find the first CorefChain in the text.

chain = document.coref_chains[0]
print(chain)

Output:

((GOP Sen. Rand Paul))-[id=4] was assaulted in (his)-[id=5] home in Bowling Green, Kentucky, on Friday, according to Kentucky State Police.
State troopers responded to a call to (the senator's)-[id=10] residence at 3:21 p.m. Friday.
Police arrested a man named Rene Albert Boucher, who they allege "(intentionally assaulted" Paul)-[id=16], causing him "minor injury.

In the string representation, coreferences are marked with parentheses and the referent with double parentheses. Each is also labelled with a coref_id. Let's take a closer look at the referent.

ref = chain.referent
print('Coreference: {}\n'.format(ref))

for attr in 'type', 'number', 'animacy', 'gender':
    print(attr, getattr(ref, attr), sep=': ')

# Note that we can also index coreferences by id
assert chain[4].is_referent

Output:

Coreference: Police

type: PROPER
number: SINGULAR
animacy: ANIMATE
gender: UNKNOWN
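
To inspect every chain rather than just the first, we can loop over document.coref_chains. A sketch assuming it behaves like the indexable sequence used above:

# Print each chain's position in the document and its referent.
for index, chain in enumerate(document.coref_chains):
    print('chain', index, '->', chain.referent)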

Quotes

Extracting quotes from the text is simple.

print(document.quotes)

Output:

[<Quote: "intentionally assaulted">, <Quote: "minor injury">]
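
Since document.quotes appears to be an ordinary list of Quote objects, standard iteration applies:

# Print each quote on its own line.
for quote in document.quotes:
    print(quote)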

TODO (annotation wrappers):

  • [x] ssplit
  • [ ] ner
  • [x] pos
  • [x] lemma
  • [x] coref
  • [x] quote
  • [ ] quote.attribution
  • [ ] parse
  • [ ] depparse
  • [x] entitymentions
  • [ ] openie
  • [ ] sentiment
  • [ ] relation
  • [ ] kbp
  • [ ] entitylink
  • [ ] 'options' examples, e.g. openie.resolve_coref

Saving annotations

Write

A pynlp document can be saved as a byte string.

with open('annotation.dat', 'wb') as file:
    file.write(document.to_bytes())

Read

To load a pynlp document, instantiate a Document with the from_bytes class method.

from pynlp import Document

with open('annotation.dat', 'rb') as file:
    document = Document.from_bytes(file.read())
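
As a quick sanity check, recall that printing a document reproduces the original text, so the round trip should preserve it. A sketch assuming the text and document variables from the examples above:

# str(document) yields the annotated text (see the annotation example).
assert str(document) == text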