This package features data-science related tasks for developing new recognizers for Presidio. It is used for the evaluation of the entire system, as well as for evaluating specific PII recognizers or PII detection models.

Stars: ✭ 62 (-70.75%)

Mutual labels: spacy, named-entity-recognition, ner

NER-and-Linking-of-Ancient-and-Historic-Places

An NER tool for ancient place names based on Pleiades and Spacy.

Stars: ✭ 26 (-87.74%)

Mutual labels: spacy, named-entity-recognition, ner

Chatbot ner

chatbot_ner: Named Entity Recognition for chatbots.

Stars: ✭ 273 (+28.77%)

Mutual labels: natural-language-processing, named-entity-recognition, ner

Pytorch Bert Crf Ner

A Korean named entity recognizer built with KoBERT and CRF (BERT+CRF based Named Entity Recognition model for Korean)

Stars: ✭ 236 (+11.32%)

Mutual labels: natural-language-processing, named-entity-recognition, ner

Spacy

💫 Industrial-strength Natural Language Processing (NLP) in Python

Stars: ✭ 21,978 (+10266.98%)

Mutual labels: natural-language-processing, named-entity-recognition, spacy

Entity Recognition Datasets

A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.

Stars: ✭ 891 (+320.28%)

Mutual labels: natural-language-processing, named-entity-recognition, ner

Turkish Bert Nlp Pipeline

Bert-base NLP pipeline for Turkish, Ner, Sentiment Analysis, Question Answering etc.

Stars: ✭ 85 (-59.91%)

Mutual labels: natural-language-processing, named-entity-recognition, ner

Ncrfpp

NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.

Stars: ✭ 1,767 (+733.49%)

Mutual labels: natural-language-processing, named-entity-recognition, ner

Bond

BOND: BERT-Assisted Open-Domain Name Entity Recognition with Distant Supervision

Stars: ✭ 96 (-54.72%)

Mutual labels: natural-language-processing, named-entity-recognition, ner

Bert Sklearn

a sklearn wrapper for Google's BERT model

Stars: ✭ 182 (-14.15%)

Mutual labels: natural-language-processing, named-entity-recognition, ner

Crf Layer On The Top Of Bilstm

The CRF Layer was implemented by using Chainer 2.0. Please see more details here: https://createmomo.github.io/2017/09/12/CRF_Layer_on_the_Top_of_BiLSTM_1/

Stars: ✭ 148 (-30.19%)

Mutual labels: natural-language-processing, named-entity-recognition

Spacymoji

💙 Emoji handling and meta data for spaCy with custom extension attributes

Stars: ✭ 151 (-28.77%)

Mutual labels: natural-language-processing, spacy

Deeplearning nlp

A deep-learning-based natural language processing library

Stars: ✭ 154 (-27.36%)

Mutual labels: natural-language-processing, named-entity-recognition

Monpa

MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging and named entity recognition

Stars: ✭ 203 (-4.25%)

Mutual labels: named-entity-recognition, ner

Sequence tagging

Named Entity Recognition (LSTM + CRF) - Tensorflow

Stars: ✭ 1,889 (+791.04%)

Mutual labels: named-entity-recognition, ner

Vntk

Vietnamese NLP Toolkit for Node

Stars: ✭ 170 (-19.81%)

Mutual labels: natural-language-processing, named-entity-recognition

spacy-lookup: Named Entity Recognition based on dictionaries
============================================================

`spaCy v2.0 <https://spacy.io/usage/v2>`_ extension and pipeline component for adding named-entity metadata to ``Doc`` objects. It detects named entities using dictionaries and sets the custom ``Doc``, ``Token`` and ``Span`` attributes ``._.is_entity``, ``._.entity_type``, ``._.has_entities`` and ``._.entities``.

Named entities are matched with the Python module ``flashtext``, which looks them up in the data provided by the different dictionaries.
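To make the dictionary-matching idea concrete, here is a minimal pure-Python sketch. It is not how flashtext works internally and it does not use spaCy: it is a naive longest-match lookup over whitespace-separated tokens, and the names ``find_entities`` and ``max_len`` are hypothetical, chosen only for this illustration.

```python
def find_entities(text, keywords):
    """Return (entity, token_index, n_tokens) for each dictionary match.

    Naive longest-match-first scan over whitespace tokens; illustration only.
    """
    # Normalise keywords to lowercase token tuples for case-insensitive lookup.
    lookup = {tuple(k.lower().split()): k for k in keywords}
    max_len = max((len(key) for key in lookup), default=0)

    # Strip surrounding punctuation so "python." still matches "python".
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]

    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate window first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            window = tuple(tokens[i:i + n])
            if window in lookup:
                matches.append((lookup[window], i, n))
                i += n  # skip past the matched tokens
                break
        else:
            i += 1  # no keyword starts at this token
    return matches


matches = find_entities(
    "I am a product manager for a java and python.",
    ["python", "product manager", "java platform"],
)
print(matches)
```

In this sketch, "java" alone is not reported because only the full term "java platform" is in the dictionary, while the two-token term "product manager" is returned as a single match.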

Installation
------------

spacy-lookup requires spacy v2.0.16 or higher.

.. code:: bash

    pip install spacy-lookup

Usage
-----

First, you need to download a language model.

.. code:: bash

    python -m spacy download en

Import the component and initialise it with the shared nlp object (i.e. an instance of Language), which is used to initialise flashtext with the shared vocab, and create the match patterns. Then add the component anywhere in your pipeline.

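.. code:: python

    import spacy
    from spacy_lookup import Entity

    nlp = spacy.load('en')
    # Terms to look up; a multi-word term is matched as a single entity.
    entity = Entity(keywords_list=['python', 'product manager', 'java platform'])
    # Add the component last so spans are merged at the end of the pipeline.
    nlp.add_pipe(entity, last=True)

    doc = nlp(u"I am a product manager for a java and python.")
    assert doc._.has_entities == True
    assert doc[0]._.is_entity == False
    # Token 3 is the merged "product manager" span.
    assert doc[3]._.entity_desc == 'product manager'
    assert doc[3]._.is_entity == True

    # List every matched token together with its canonical form.
    print([(token.text, token._.canonical) for token in doc if token._.is_entity])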

spacy-lookup only cares about the token text, so you can use it on a blank ``Language`` instance (it should work for all `available languages <https://spacy.io/usage/models#languages>`_!), or in a pipeline with a loaded model. If you're loading a model and your pipeline includes a tagger, parser and entity recognizer, make sure to add the entity component with ``last=True``, so the spans are merged at the end of the pipeline.

Available attributes
--------------------

The extension sets attributes on the ``Doc``, ``Span`` and ``Token``. You can change the attribute names on initialisation of the extension. For more details on custom components and attributes, see the `processing pipelines documentation <https://spacy.io/usage/processing-pipelines#custom-components>`_.

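========================  =======  ===================================================================
Attribute                 Type     Description
========================  =======  ===================================================================
``Token._.is_entity``     bool     Whether the token is an entity.
``Token._.entity_type``   unicode  A human-readable description of the entity.
``Doc._.has_entities``    bool     Whether the document contains an entity.
``Doc._.entities``        list     ``(entity, index, description)`` tuples of the document's entities.
``Span._.has_entities``   bool     Whether the span contains an entity.
``Span._.entities``       list     ``(entity, index, description)`` tuples of the span's entities.
========================  =======  ===================================================================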
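One way to read the table is that the ``Doc``- and ``Span``-level values are aggregates of the token-level flags. The sketch below models that relationship in plain Python, without spaCy. Everything in it is an assumption made for illustration: the sample ``tokens`` and ``entities`` data, the reading of ``index`` as a token position after multi-word matches are merged, and the helper names ``has_entities`` and ``entities_in``.

```python
# Hypothetical sample data: tokens after multi-word matches have been merged,
# plus (entity, index, description) tuples in the shape the table describes.
tokens = ["I", "am", "a", "product manager", "for", "a", "java", "and", "python", "."]
entities = [("product manager", 3, "product manager"), ("python", 8, "python")]

# Assumption: `index` is the position of the (merged) token in the document.
entity_indices = {index for _, index, _ in entities}

# Token-level flag: is the token at this position an entity?
is_entity = [i in entity_indices for i in range(len(tokens))]


def has_entities(start, end):
    """Span-level flag for tokens[start:end]: true if any token is an entity."""
    return any(is_entity[start:end])


def entities_in(start, end):
    """Span-level list: the entity tuples whose index falls in [start, end)."""
    return [e for e in entities if start <= e[1] < end]


print(has_entities(0, len(tokens)))  # the whole document
print(has_entities(0, 3))            # the span "I am a"
print(entities_in(4, len(tokens)))   # the span "for a java and python ."
```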

Settings
--------

On initialisation of ``Entity``, you can define the following settings:

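=================  ========  =================================================================================================================
Setting            Type      Description
=================  ========  =================================================================================================================
``nlp``            Language  The shared ``nlp`` object. Used to initialise the matcher with the shared ``Vocab``, and create ``Doc`` match patterns.
``attrs``          tuple     Attributes to set on the ``._`` property. Defaults to ``('has_entities', 'is_entity', 'entity_type', 'entity')``.
``keywords_list``  list      Optional lookup table with the list of terms to look for.
``keywords_dict``  dict      Optional lookup table with the list of terms to look for.
``keywords_file``  string    Optional filename with the list of terms to look for.
=================  ========  =================================================================================================================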

.. code:: python

    entity = Entity(nlp, keywords_list=['python', 'java platform'], label='ACME')
    nlp.add_pipe(entity)
    doc = nlp(u"I am a product manager for a java platform and python.")
    # "product manager" is not in this keyword list, so check "java platform" (token 7).
    assert doc[7]._.is_entity
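The settings table gives ``keywords_list`` and ``keywords_dict`` the same description, so how they differ is not stated. One plausible reading, borrowed from flashtext's convention of mapping a canonical name to a list of surface forms, is sketched below. Whether spacy-lookup interprets ``keywords_dict`` this way is an assumption, and ``build_lookup`` is a hypothetical helper that is not part of the package.

```python
def build_lookup(keywords_list=None, keywords_dict=None):
    """Merge both keyword sources into {lowercased term: canonical name}."""
    lookup = {}
    # A plain list: each term serves as its own canonical name.
    for term in keywords_list or []:
        lookup[term.lower()] = term
    # A dict: assumed here to map a canonical name to its surface forms.
    for canonical, terms in (keywords_dict or {}).items():
        for term in terms:
            lookup[term.lower()] = canonical
    return lookup


lookup = build_lookup(
    keywords_list=["python"],
    keywords_dict={"java": ["java platform", "JVM"]},
)
print(lookup)
```

Under this reading, a list is the simpler choice when each term should be reported as-is, and a dict is useful when several spellings should resolve to one canonical entity.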

Stars: ✭ 212
