
NorskRegnesentral / weak-supervision-for-NER

Licence: other
Framework to learn Named Entity Recognition models without labelled data using weak supervision.

Programming languages: Jupyter Notebook, Python, R

Weak supervision for NER

BIG FAT WARNING: This codebase is now deprecated and has been replaced by our brand-new skweak framework. Please check it out!

Source code associated with the paper "Named Entity Recognition without Labelled Data: a Weak Supervision Approach" accepted to ACL 2020.

Requirements:

You should first make sure that the following Python packages are installed:

  • spacy (version >= 2.2)
  • hmmlearn
  • snips-nlu-parsers
  • pandas
  • numba
  • scikit-learn

You should also install the en_core_web_sm and en_core_web_md models for spaCy.
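As a quick sanity check before running anything, the short script below reports which of the packages listed above cannot be imported. It is only a sketch: the helper name `missing_modules` is made up for this example, it does not verify the spaCy version, and it does not check that the two spaCy models are present (those are normally fetched with `python -m spacy download en_core_web_sm` and `python -m spacy download en_core_web_md`).

```python
import importlib.util

# Import names for the packages listed above. Two of them differ from the
# names used at install time: snips-nlu-parsers -> snips_nlu_parsers and
# scikit-learn -> sklearn.
REQUIRED_MODULES = [
    "spacy",
    "hmmlearn",
    "snips_nlu_parsers",
    "pandas",
    "numba",
    "sklearn",
]


def missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All core requirements are importable.")
```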

To run the neural models in ner.py, you also need pytorch, cupy, keras and tensorflow installed.

To run the baselines, you will also need to have snorkel installed.

Finally, you also need to download the following files and add them to the data directory:

Quick start

You should first convert your corpus to the spaCy DocBin format.
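A minimal sketch of such a conversion is shown below, using only the `DocBin` class that ships with spaCy. The function name `texts_to_docbin_bytes` is hypothetical, and the sketch assumes that the corpus starts out as a list of raw strings and that the serialised bytes are then written to the file passed to this codebase. A blank English pipeline is used so that the example runs without any model installed; the labelling functions may well expect a full pipeline such as en_core_web_md instead.

```python
def texts_to_docbin_bytes(texts):
    """Tokenise raw texts and serialise them as a spaCy DocBin byte string."""
    # Imported inside the function so that defining it does not require spaCy.
    import spacy
    from spacy.tokens import DocBin

    # Tokeniser only. Assumption: replace with spacy.load("en_core_web_md")
    # if tags, parses and entities are needed downstream.
    nlp = spacy.blank("en")

    # store_user_data=True keeps anything stored in doc.user_data.
    doc_bin = DocBin(store_user_data=True)
    for doc in nlp.pipe(texts):
        doc_bin.add(doc)
    return doc_bin.to_bytes()
```

The returned bytes can then be written to a file opened in binary mode, and the path of that file used wherever 'path_to_your_docbin_corpus' appears.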

Then, to run all labelling functions on your corpus, you can simply run:

import annotations
annotator = annotations.FullAnnotator().add_all()
annotator.annotate_docbin('path_to_your_docbin_corpus')
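To give an idea of what a single labelling function does, here is a toy, self-contained sketch of a gazetteer-based one. It is not taken from this codebase: the function `gazetteer_labeller`, its arguments and the example gazetteer are all invented for illustration, and it works on plain token lists rather than on spaCy documents.

```python
def gazetteer_labeller(tokens, gazetteer, label):
    """Return (start, end, label) spans for token sequences found in a gazetteer.

    `end` is exclusive. At each position the longest matching entry wins.
    """
    entries = {tuple(entry.split()) for entry in gazetteer}
    max_len = max((len(entry) for entry in entries), default=0)

    spans = []
    i = 0
    while i < len(tokens):
        match_end = None
        # Try the longest candidate first so "New York City" beats "New York".
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + length]) in entries:
                match_end = i + length
                break
        if match_end is None:
            i += 1
        else:
            spans.append((i, match_end, label))
            i = match_end
    return spans


tokens = "She moved from New York City to Oslo".split()
spans = gazetteer_labeller(tokens, ["New York", "New York City", "Oslo"], "GPE")
print(spans)  # [(3, 6, 'GPE'), (7, 8, 'GPE')]
```

Each labelling function of this kind is noisy and incomplete on its own, which is why their outputs are aggregated in the next step.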

You can then estimate an HMM model that aggregates all sources:

import labelling
hmm = labelling.HMMAnnotator()
hmm.train('path_to_your_docbin_corpus')

And run it on your corpus to get the aggregated labels:

hmm.annotate_docbin('path_to_your_docbin_corpus')
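For intuition about what aggregating sources means, the sketch below uses a much simpler stand-in for the HMM: a per-token majority vote. Unlike an HMM, it treats every source as equally reliable and ignores the order of the tokens. The function `majority_vote`, the source names and the labels are hypothetical, and treating "O" as an abstention is a design choice made for this example only.

```python
from collections import Counter


def majority_vote(source_labels):
    """Aggregate per-token labels from several sources by majority vote.

    `source_labels` maps a source name to a list with one label per token.
    "O" counts as an abstention, so it is only returned when no source
    proposes an entity label for that token.
    """
    lengths = {len(labels) for labels in source_labels.values()}
    if len(lengths) > 1:
        raise ValueError("All sources must label the same number of tokens")
    n_tokens = lengths.pop() if lengths else 0

    aggregated = []
    for i in range(n_tokens):
        votes = Counter(
            labels[i] for labels in source_labels.values() if labels[i] != "O"
        )
        if not votes:
            aggregated.append("O")
        else:
            # Highest count wins; ties are broken alphabetically for determinism.
            aggregated.append(min(votes, key=lambda lab: (-votes[lab], lab)))
    return aggregated


votes = {
    "gazetteer": ["O", "GPE", "O", "O"],
    "heuristic": ["PERSON", "GPE", "O", "ORG"],
    "model": ["PERSON", "ORG", "O", "O"],
}
print(majority_vote(votes))  # ['PERSON', 'GPE', 'O', 'ORG']
```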

Step-by-step instructions

More detailed instructions with a step-by-step example are available in the Jupyter Notebook Weak Supervision.ipynb. Don't forget to run it using Jupyter to get the visualisation for the NER annotations.
