Pretrained Span and span Pair Encoder, code for "Pre-training Entity Relation Encoder with Intra-span and Inter-span Information", EMNLP 2020. It is based on our NERE toolkit (https://github.com/Receiling/NERE).

Stars: ✭ 17 (-83.96%)

Mutual labels: information-extraction

knowledge-graph-nlp-in-action

From model training to deployment: hands-on Knowledge Graph and Natural Language Processing (NLP). Covers TensorFlow, BERT+Bi-LSTM+CRF, Neo4j, and more, across tasks such as Named Entity Recognition, Text Classification, Information Extraction, and Relation Extraction.

Stars: ✭ 58 (-45.28%)

Mutual labels: information-extraction

presidio-research

This package features data-science related tasks for developing new recognizers for Presidio. It is used for the evaluation of the entire system, as well as for evaluating specific PII recognizers or PII detection models.

Stars: ✭ 62 (-41.51%)

Mutual labels: spacy

talks

💥 Browser-based slides or PDFs of our talks and presentations

Stars: ✭ 91 (-14.15%)

Mutual labels: spacy

Xponents

Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters.

Stars: ✭ 39 (-63.21%)

Mutual labels: information-extraction

amrlib

A python library that makes AMR parsing, generation and visualization simple.

Stars: ✭ 107 (+0.94%)

Mutual labels: spacy

spaCyTextBlob

A TextBlob sentiment analysis pipeline component for spaCy.

Stars: ✭ 30 (-71.7%)

Mutual labels: spacy

augmenty

Augmenty is an augmentation library based on spaCy for augmenting texts.

Stars: ✭ 101 (-4.72%)

Mutual labels: spacy

IE Paper Notes

Paper notes for Information Extraction, including Relation Extraction (RE), Named Entity Recognition (NER), Entity Linking (EL), Event Extraction (EE), Named Entity Disambiguation (NED).

Stars: ✭ 14 (-86.79%)

Mutual labels: information-extraction

weak-supervision-for-NER

Framework to learn Named Entity Recognition models without labelled data using weak supervision.

Stars: ✭ 114 (+7.55%)

Mutual labels: spacy

EpiTator

EpiTator annotates epidemiological information in text documents. It is the natural language processing framework that powers GRITS and EIDR Connect.

Stars: ✭ 38 (-64.15%)

Mutual labels: spacy

007-TheBond

This Script will help you to gather information about your victim or friend.

Stars: ✭ 371 (+250%)

Mutual labels: information-extraction

hmrb

Python Rule Processing Engine 🏺

Stars: ✭ 65 (-38.68%)

Mutual labels: spacy

science-result-extractor

No description or website provided.

Stars: ✭ 59 (-44.34%)

Mutual labels: information-extraction

tweets-preprocessor

Repo containing the Twitter preprocessor module, developed by the AUTH OSWinds team

Stars: ✭ 26 (-75.47%)

Mutual labels: spacy


ClauCy

Implementation of the ClausIE information extraction system for python+spacy.

Disclaimer: This is not meant to be a one-to-one implementation of the algorithm (which is impossible, since spaCy is used instead of the Stanford Dependencies used in the paper), but a clause extraction and text simplification library I keep for personal use.

I have made some modifications.

I did some exploration on how to better separate embedded clauses when using SpaCy dependencies.
I provide the ability to inflect the verbs, so that they are in a somewhat useful text form when generating propositions in text.

This allows the processing of complex sentences such as this:

A cat, hearing that the birds in a certain aviary were ailing dressed himself up as a physician, 
and, taking his cane and a bag of instruments becoming his profession, went to call on them.

to produce propositions such as these:

['The birds were ailing.']
['A cat dressed himself as a physician.', 'A cat dressed himself.']
['A cat took his cane.', 'A cat took a bag.']
['A cat became his profession.']
['A cat went.']
['A cat called on them.']

Changelog from v 0.1.0

Rewrote it to match more closely the algorithm in the paper.
Reimplemented it as a spacy pipeline component (clauses under doc._.clauses)
Added tests from the paper

Credits

While this is a re-implementation by me, the original research work (and also the dictionaries) is attributed to Luciano Del Corro and Rainer Gemulla. If you use it in your code, please note that there are slight modifications to make it work with the spaCy dependency parser, and also cite:

Del Corro, Luciano, and Rainer Gemulla: "ClausIE: clause-based open information extraction."
Proceedings of the 22nd international conference on World Wide Web. ACM, 2013.

It would be helpful to also cite this specific implementation if you are using it:

@InProceedings{chourdakis2018grammar,
author = {Chourdakis, E.T. and Reiss, J.D.},
title = {Grammar Informed Sound Effect Retrieval for Soundscape Generation},
booktitle = {DMRN+ 13: Digital Music Research Network One-day Workshop},
month = {November},
year = {2018},
address = {London, UK},
pages={9}
}

Requirements

spacy>=2.3.0,<3.0.0 (it does not work with spacy version 3 and above)
lemminflect>=0.2.1 (only if using the inflect argument in to_propositions(as_text=True))
Python 3
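
The spaCy constraint above can be checked at runtime before the component is added to a pipeline. The following is a minimal sketch and not part of claucy: the helper name spacy_version_supported is hypothetical, and it compares only the leading numeric parts of the version string.

```python
import re


def spacy_version_supported(version: str) -> bool:
    """Return True if `version` satisfies spacy>=2.3.0,<3.0.0.

    Hypothetical helper, not part of claucy. Only the leading
    major.minor.patch numbers are compared; suffixes such as
    "rc1" or ".dev0" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component is treated as 0.
    major, minor, patch = (int(part or 0) for part in match.groups())
    return (2, 3, 0) <= (major, minor, patch) < (3, 0, 0)
```

With spaCy installed, the check could be run as spacy_version_supported(spacy.__version__).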

Installation

$ git clone https://github.com/mmxgn/spacy-clausie.git
$ cd spacy-clausie
$ python setup.py build 
$ python setup.py install [--user]

# Optionally
$ python setup.py test

Usage

Python

$ ipython
In [1]: import spacy
In [2]: import claucy
In [3]: nlp = spacy.load("en")
In [4]: claucy.add_to_pipe(nlp)
In [5]: doc = nlp("AE died in Princeton in 1955.")
In [6]: doc._.clauses
Out[6]: [<SV, AE, died, None, None, None, [in Princeton, in 1955]>]
In [7]: propositions = doc._.clauses[0].to_propositions(as_text=True)
In [8]: propositions
Out[8]:
['AE died in Princeton in 1955',
 'AE died in 1955',
 'AE died in Princeton']

Setting as_text=False will instead give a list of tuples of spaCy spans:

In [9]: propositions = doc._.clauses[0].to_propositions(as_text=False)
In [10]: propositions
Out[10]:
[(AE, died, in Princeton, in 1955),
 (AE, died, in 1955),
 (AE, died, in Princeton)]
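
The outputs above suggest how an SV clause with optional adverbials expands: one proposition carrying every adverbial, plus one per single adverbial. The following is a simplified sketch of that expansion on plain strings. It is not claucy's actual implementation: the Clause dataclass and the expand function are hypothetical names, real clauses hold spaCy spans rather than strings, other clause types may expand differently, and the order of results is not guaranteed to match the library's.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Clause:
    """Hypothetical stand-in for an SV clause; claucy stores spaCy spans."""
    subject: str
    verb: str
    adverbials: List[str] = field(default_factory=list)


def expand(clause: Clause, as_text: bool = True):
    """Expand a clause into propositions (illustrative only).

    Emits one proposition with all adverbials, then one per single
    adverbial. With no adverbials, emits the bare subject-verb pair.
    """
    core = (clause.subject, clause.verb)
    variants: List[Tuple[str, ...]] = []
    if len(clause.adverbials) > 1:
        variants.append(core + tuple(clause.adverbials))
    for adverbial in clause.adverbials:
        variants.append(core + (adverbial,))
    if not clause.adverbials:
        variants.append(core)
    if as_text:
        # Join the parts into a sentence-like string.
        return [" ".join(parts) for parts in variants]
    return variants


clause = Clause("AE", "died", ["in Princeton", "in 1955"])
for proposition in expand(clause):
    print(proposition)
```

Passing as_text=False to expand returns the tuples themselves, mirroring the two output forms shown in the session above.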

Problog

Copy problog/claucy_pl.py into the same directory as your ProbLog .pl files and include it in your scripts with:

:- use_module('claucy_pl.py').

And use it via the claucy/4 predicate. An example can be seen in problog/test_clausie.pl:

:-use_module('claucy_pl.py').

query(claucy('Albert Einstein, a scientist of the 20th century, died in Princeton in 1955.',Predicate,Arg1,Arg2)).

You can run it with:

problog test_claucy.pl

and get the output:

     claucy('Albert Einstein, a scientist of the 20th century, died in Princeton in 1955.',died,Albert Einstein,in 1955):       1         
claucy('Albert Einstein, a scientist of the 20th century, died in Princeton in 1955.',died,Albert Einstein,in Princeton):       1         
   claucy('Albert Einstein, a scientist of the 20th century, died in Princeton in 1955.',is,Albert Einstein,a scientist):       1

The variable Predicate comes directly from the verb and Arg1 and Arg2 are the first and second arguments.
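
If the printed results need to be consumed from Python, each output line can be split back into its parts. The following is a minimal sketch that assumes the exact textual layout shown above (a single-quoted sentence, three fields that contain no commas, then a probability). The parse_result_line helper and the Triple type are hypothetical and not part of claucy or ProbLog.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: claucy('<sentence>',<predicate>,<arg1>,<arg2>): <probability>
LINE = re.compile(
    r"^\s*claucy\('(?P<sentence>.*)',"
    r"(?P<predicate>[^,]*),(?P<arg1>[^,]*),(?P<arg2>[^,]*)\)"
    r":\s*(?P<probability>\S+)\s*$"
)


class Triple(NamedTuple):
    predicate: str
    arg1: str
    arg2: str
    probability: float


def parse_result_line(line: str) -> Optional[Triple]:
    """Parse one printed result line; return None if it does not match."""
    m = LINE.match(line)
    if m is None:
        return None
    return Triple(m["predicate"], m["arg1"], m["arg2"], float(m["probability"]))
```

Arguments that themselves contain commas would not match this pattern, so it should be treated as a starting point only.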

License

This code is licensed under the GNU General Public License Version 3.0.

