Licence: mit
Named Entity Recognition based on dictionaries

spacy-lookup: Named Entity Recognition based on dictionaries


`spaCy v2.0 <https://spacy.io/usage/v2>`_ extension and pipeline component for adding Named Entities metadata to ``Doc`` objects. Detects Named Entities using dictionaries. The extension sets the custom ``Doc``, ``Token`` and ``Span`` attributes ``._.is_entity``, ``._.entity_type``, ``._.has_entities`` and ``._.entities``.

Named Entities are matched using the Python module ``flashtext``, which looks them up in the data provided by one or more dictionaries.
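
To make the dictionary lookup concrete, here is a minimal plain-Python sketch of longest-match keyword lookup over tokens. It only illustrates the idea and is not how ``flashtext`` is implemented; the function name ``find_keywords`` and the whitespace tokenisation are assumptions made for this example.

```python
def find_keywords(tokens, keywords):
    """Return (keyword, start, end) token spans, preferring the longest match."""
    # Index each keyword by its lowercase token sequence.
    table = {tuple(k.lower().split()): k for k in keywords}
    longest = max((len(key) for key in table), default=0)
    lowered = [t.lower() for t in tokens]
    matches, i = [], 0
    while i < len(tokens):
        # Try the longest candidate window first, then shrink it.
        for size in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + size])
            if key in table:
                matches.append((table[key], i, i + size))
                i += size
                break
        else:
            i += 1  # no keyword starts at this token
    return matches


tokens = "I am a product manager for a java and python .".split()
print(find_keywords(tokens, ["python", "product manager", "java platform"]))
# [('product manager', 3, 5), ('python', 9, 10)]
```

Note that ``java platform`` does not match here, because the text only contains ``java``: the lookup works on whole dictionary terms.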

Installation

spacy-lookup requires spacy v2.0.16 or higher.

.. code:: bash

    pip install spacy-lookup

Usage

First, you need to download a language model.

.. code:: bash

    python -m spacy download en

Import the component and initialise it with the shared ``nlp`` object (i.e. an instance of ``Language``), which is used to initialise ``flashtext`` with the shared vocab and to create the match patterns. Then add the component anywhere in your pipeline.

.. code:: python

    import spacy
    from spacy_lookup import Entity

    nlp = spacy.load('en')
    # Terms to look up; a multi-word term is matched as a single entity.
    entity = Entity(keywords_list=['python', 'product manager', 'java platform'])
    nlp.add_pipe(entity, last=True)

    doc = nlp(u"I am a product manager for a java and python.")
    assert doc._.has_entities
    assert not doc[0]._.is_entity
    # "product manager" is merged into one token, so it sits at index 3.
    assert doc[3]._.entity_desc == 'product manager'
    assert doc[3]._.is_entity

    # Print the text and canonical form of every entity token.
    print([(token.text, token._.canonical) for token in doc if token._.is_entity])

``spacy-lookup`` only cares about the token text, so you can use it on a blank ``Language`` instance (it should work for `all available languages <https://spacy.io/usage/models#languages>`_!), or in a pipeline with a loaded model. If you're loading a model and your pipeline includes a tagger, parser and entity recognizer, make sure to add the entity component as ``last=True``, so the spans are merged at the end of the pipeline.
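
Merging spans changes token indices, which is why ``doc[3]`` in the example above is the whole term ``product manager``. The following plain-Python sketch shows the effect on a token list. It does not use spaCy's own merging API; ``merge_spans`` is a hypothetical helper and the spans are written by hand for illustration.

```python
def merge_spans(tokens, spans):
    """Collapse each (start, end) token span into one token; spans must not overlap."""
    ends = {start: end for start, end in spans}
    merged, i = [], 0
    while i < len(tokens):
        # Jump to the end of a span if one starts here, else take one token.
        end = ends.get(i, i + 1)
        merged.append(" ".join(tokens[i:end]))
        i = end
    return merged


tokens = ["I", "am", "a", "product", "manager", "for", "a", "java", "and", "python", "."]
merged = merge_spans(tokens, [(3, 5), (9, 10)])
print(merged[3])                 # product manager
print(len(tokens), len(merged))  # 11 10
```

Every token after a merged span moves down by the number of tokens that were absorbed, so index-based assertions need to account for earlier merges.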

Available attributes

The extension sets attributes on the ``Doc``, ``Span`` and ``Token``. You can change the attribute names on initialisation of the extension. For more details on custom components and attributes, see the `processing pipelines documentation <https://spacy.io/usage/processing-pipelines#custom-components>`_.

========================= ========= ===
``Token._.is_entity``     bool      Whether the token is an entity.
``Token._.entity_type``   unicode   A human-readable description of the entity.
``Doc._.has_entities``    bool      Whether the document contains an entity.
``Doc._.entities``        list      ``(entity, index, description)`` tuples of the document's entities.
``Span._.has_entities``   bool      Whether the span contains an entity.
``Span._.entities``       list      ``(entity, index, description)`` tuples of the span's entities.
========================= ========= ===

Settings

On initialisation of Entity, you can define the following settings:

=================== ========== ===
``nlp``             Language   The shared ``nlp`` object. Used to initialise the matcher with the shared ``Vocab``, and create ``Doc`` match patterns.
``attrs``           tuple      Attributes to set on the ``._`` property. Defaults to ``('has_entities', 'is_entity', 'entity_type', 'entity')``.
``keywords_list``   list       Optional lookup table with the list of terms to look for.
``keywords_dict``   dict       Optional lookup table with the list of terms to look for.
``keywords_file``   string     Optional filename with the list of terms to look for.
=================== ========== ===

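.. code:: python

    entity = Entity(nlp, keywords_list=['python', 'java platform'], label='ACME')
    nlp.add_pipe(entity)
    doc = nlp(u"I am a product manager for a java platform and python.")
    # "product manager" is not in this keyword list; "java platform" is
    # merged into a single token at index 7.
    assert doc[7]._.is_entity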
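
The table does not spell out the shape of ``keywords_dict``. As a sketch, assuming it follows the ``flashtext`` convention of mapping a canonical name to a list of surface variants (an assumption, not something confirmed above), the lookup it implies can be pictured in plain Python. ``build_canonical_lookup`` and the sample terms are hypothetical.

```python
def build_canonical_lookup(keywords_dict):
    """Invert {canonical: [variants]} into {lowercased variant: canonical}."""
    lookup = {}
    for canonical, variants in keywords_dict.items():
        for variant in variants:
            lookup[variant.lower()] = canonical
    return lookup


# Hypothetical sample data: canonical name -> terms that should map to it.
keywords_dict = {
    "Python": ["python", "py"],
    "Java": ["java platform", "jvm"],
}
lookup = build_canonical_lookup(keywords_dict)
print(lookup["java platform"])  # Java
```

Under that assumption, several spellings of a term resolve to one canonical value, which is the kind of value the ``canonical`` attribute in the usage example would expose.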