ClauCy

Implementation of the ClausIE information extraction system for python+spacy.

Disclaimer: This is not meant to be a one-to-one implementation of the algorithm, which is impossible since spaCy is used instead of the Stanford Dependencies used in the paper. It is a clause extraction and text simplification library that I maintain for personal use.

I have made the following modifications:

  • I explored how to better separate embedded clauses when using spaCy dependencies.
  • Verbs can be inflected, so that propositions generated as text are in a reasonably readable form.

This allows the processing of complex sentences such as this:

A cat, hearing that the birds in a certain aviary were ailing dressed himself up as a physician, 
and, taking his cane and a bag of instruments becoming his profession, went to call on them.

to produce propositions such as these:

['The birds were ailing.']
['A cat dressed himself as a physician.', 'A cat dressed himself.']
['A cat took his cane.', 'A cat took a bag.']
['A cat became his profession.']
['A cat went.']
['A cat called on them.']
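
A minimal sketch of how such per-clause lists could be collected, using only the calls shown in the Usage section (claucy.add_to_pipe, doc._.clauses and to_propositions(as_text=True)). The helper names build_pipeline and propositions_per_clause are hypothetical and not part of the library:

```python
def build_pipeline():
    """Load spaCy 2.x and register ClauCy on the pipeline.

    The imports are local so that the helper below can be used (and tested)
    without spaCy or ClauCy being installed.
    """
    import spacy
    import claucy

    nlp = spacy.load("en")
    claucy.add_to_pipe(nlp)
    return nlp


def propositions_per_clause(nlp, text):
    """Return one list of text propositions for each extracted clause."""
    doc = nlp(text)
    return [clause.to_propositions(as_text=True) for clause in doc._.clauses]


# Example (needs spaCy 2.x, an English model and ClauCy installed):
# nlp = build_pipeline()
# for props in propositions_per_clause(nlp, "AE died in Princeton in 1955."):
#     print(props)
```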

Changelog since v0.1.0

  • Rewrote it to match the algorithm in the paper more closely.
  • Reimplemented it as a spaCy pipeline component (clauses are available under doc._.clauses).
  • Added tests from the paper.

Credits

While this is my own re-implementation, the original research (and the dictionaries) is the work of Luciano Del Corro and Rainer Gemulla. If you use it in your code, please note that the code contains slight modifications to make it work with the spaCy dependency parser, and please also cite:

Del Corro, Luciano, and Rainer Gemulla: "ClausIE: clause-based open information extraction."
Proceedings of the 22nd international conference on World Wide Web. ACM, 2013.

It would be helpful to also cite this specific implementation if you are using it:

@InProceedings{chourdakis2018grammar,
author = {Chourdakis, E.T and Reiss, J.D.},
title = {Grammar Informed Sound Effect Retrieval for Soundscape Generation},
booktitle = {DMRN+ 13: Digital Music Research Network One-day Workshop},
month = {November},
year = {2018},
address = {London, UK},
pages={9}
}

Requirements

  • spacy>=2.3.0,<3.0.0 (it does not work with spacy version 3 and above)
  • lemminflect>=0.2.1 (only if using the inflect argument in to_propositions(as_text=True))
  • Python 3
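
As a rough illustration of the spaCy constraint above, the following sketch checks a version string against >=2.3.0,<3.0.0 using only the standard library. The function name spacy_version_supported is hypothetical, and pre-release suffixes such as "rc1" are simply ignored:

```python
import re


def spacy_version_supported(version):
    """Return True if `version` satisfies spacy>=2.3.0,<3.0.0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    # A missing patch component (e.g. "2.3") is treated as 0.
    major, minor, patch = (int(part or 0) for part in match.groups())
    return (2, 3, 0) <= (major, minor, patch) < (3, 0, 0)


# Example (with spaCy installed):
# import spacy
# if not spacy_version_supported(spacy.__version__):
#     raise RuntimeError("ClauCy needs spacy>=2.3.0,<3.0.0")
```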

Installation

$ git clone https://github.com/mmxgn/spacy-clausie.git
$ cd spacy-clausie
$ python setup.py build 
$ python setup.py install [--user]

# Optionally
$ python setup.py test

Usage

Python

$ ipython
In [1]: import spacy
In [2]: import claucy
In [3]: nlp = spacy.load("en")
In [4]: claucy.add_to_pipe(nlp)
In [5]: doc = nlp("AE died in Princeton in 1955.")
In [6]: doc._.clauses
Out[6]: [<SV, AE, died, None, None, None, [in Princeton, in 1955]>]
In [7]: propositions = doc._.clauses[0].to_propositions(as_text=True)
In [8]: propositions
Out[8]:
['AE died in Princeton in 1955',
 'AE died in 1955',
 'AE died in Princeton']

Setting as_text=False will instead give a list of tuples of spaCy spans:

In [9]: propositions = doc._.clauses[0].to_propositions(as_text=False)
In [10]: propositions
Out[10]:
[(AE, died, in Princeton, in 1955),
 (AE, died, in 1955),
 (AE, died, in Princeton)]
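
The outputs above suggest that, for a clause with several adverbials, one proposition keeps all of them and further propositions keep one each. A minimal sketch of that idea on plain strings follows. It is not ClauCy's actual implementation, the name expand_adverbials is hypothetical, and the order of the results may differ from the library's:

```python
def expand_adverbials(subject, verb, adverbials):
    """Build proposition tuples from a subject, a verb and optional adverbials."""
    core = [subject, verb]
    if not adverbials:
        return [tuple(core)]
    # One proposition that keeps every adverbial ...
    propositions = [tuple(core + list(adverbials))]
    # ... and, if there are several, one proposition per single adverbial.
    if len(adverbials) > 1:
        propositions += [tuple(core + [adverbial]) for adverbial in adverbials]
    return propositions


props = expand_adverbials("AE", "died", ["in Princeton", "in 1955"])
texts = [" ".join(prop) for prop in props]
print(texts)
```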

Problog

Copy problog/claucy_pl.py into the same directory as your ProbLog .pl files and include it in your scripts with:

:- use_module('claucy_pl.py').

Then use it via the claucy/4 predicate. An example can be seen in problog/test_clausie.pl:

:-use_module('claucy_pl.py').

query(claucy('Albert Einstein, a scientist of the 20th century, died in Princeton in 1955.',Predicate,Arg1,Arg2)).

You can run it with:

problog test_claucy.pl

and get the output:

     claucy('Albert Einstein, a scientist of the 20th century, died in Princeton in 1955.',died,Albert Einstein,in 1955):       1         
claucy('Albert Einstein, a scientist of the 20th century, died in Princeton in 1955.',died,Albert Einstein,in Princeton):       1         
   claucy('Albert Einstein, a scientist of the 20th century, died in Princeton in 1955.',is,Albert Einstein,a scientist):       1      

The variable Predicate is bound to the verb of the clause, and Arg1 and Arg2 are bound to its first and second arguments.
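
Judging from the output above, each result pairs the verb and the subject with one further argument. A minimal Python sketch of that mapping on plain strings is shown below. The helper to_triples is hypothetical and is not part of claucy_pl.py:

```python
def to_triples(proposition):
    """Split (subject, verb, arg, arg, ...) into (predicate, arg1, arg2) triples."""
    subject, verb, *rest = proposition
    # One triple per remaining argument; the verb acts as the predicate.
    return [(verb, subject, argument) for argument in rest]


triples = to_triples(("Albert Einstein", "died", "in Princeton", "in 1955"))
for predicate, arg1, arg2 in triples:
    print(predicate, arg1, arg2, sep=" | ")
```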

License

This code is licensed under the GNU General Public License, version 3.0.
