EuropeanaNewspapers / Ner Corpora

Licence: other

Named Entity Recognition data for Europeana Newspapers

Labels

named-entity-recognition

Projects that are alternatives of or similar to Ner Corpora

DaNLP is a repository for Natural Language Processing resources for the Danish Language.

Stars: ✭ 111 (-26.49%)

Mutual labels: named-entity-recognition

A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.

Stars: ✭ 124 (-17.88%)

Mutual labels: named-entity-recognition

TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)

Stars: ✭ 141 (-6.62%)

Mutual labels: named-entity-recognition

Python wrapper for MetaMap

Stars: ✭ 119 (-21.19%)

Mutual labels: named-entity-recognition

Awesome Hungarian Nlp

A curated list of NLP resources for Hungarian

Stars: ✭ 121 (-19.87%)

Mutual labels: named-entity-recognition

BNLP is a natural language processing toolkit for Bengali Language.

Stars: ✭ 127 (-15.89%)

Mutual labels: named-entity-recognition

Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

Stars: ✭ 1,392 (+821.85%)

Mutual labels: named-entity-recognition

Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling

Stars: ✭ 148 (-1.99%)

Mutual labels: named-entity-recognition

Dan Jurafsky Chris Manning Nlp

My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.

Stars: ✭ 124 (-17.88%)

Mutual labels: named-entity-recognition

NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.

Stars: ✭ 1,767 (+1070.2%)

Mutual labels: named-entity-recognition

Named entity recognition (NER) survey: papers, models, code (BiLSTM-CRF/BERT-CRF) and a summary of competition resources, updated continuously

Stars: ✭ 118 (-21.85%)

Mutual labels: named-entity-recognition

Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.

Stars: ✭ 1,579 (+945.7%)

Mutual labels: named-entity-recognition

Multi-Task Deep Neural Networks for Natural Language Understanding

Stars: ✭ 1,871 (+1139.07%)

Mutual labels: named-entity-recognition

A very simple framework for state-of-the-art Natural Language Processing (NLP)

Stars: ✭ 11,065 (+7227.81%)

Mutual labels: named-entity-recognition

Indonesian Nlp Resources

Data resources for Indonesian-language NLP

Stars: ✭ 143 (-5.3%)

Mutual labels: named-entity-recognition

modest natural-language processing

Stars: ✭ 10,086 (+6579.47%)

Mutual labels: named-entity-recognition

An implementation of full named-entity evaluation metrics based on SemEval'13 Task 9, computed not at tag/token level but over all the tokens that are part of the named entity

Stars: ✭ 126 (-16.56%)

Mutual labels: named-entity-recognition

👩‍🏫 Advanced NLP with spaCy: A free online course

Stars: ✭ 1,920 (+1171.52%)

Mutual labels: named-entity-recognition

Information Extraction Chinese

Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT

Stars: ✭ 1,888 (+1150.33%)

Mutual labels: named-entity-recognition

Recognition of Chinese organization names and person names based on Bi-GRU + CRF, with support for the Google BERT model

Stars: ✭ 136 (-9.93%)

Mutual labels: named-entity-recognition

ner-corpora

Named Entity Recognition corpora for Dutch, French, German from Europeana Newspapers.

Introduction

The corpora consist of files per data provider that are encoded in the IOB format (Ramshaw & Marcus, 1995). IOB is a simple text-chunking format that puts one token on each line, followed by a whitespace and a tag that marks named entities. The most commonly used tag categories are PER (person), LOC (location) and ORG (organization). To mark named entities that span multiple tokens, tags carry a prefix of either B- (beginning of a named entity) or I- (inside a named entity). The O (outside) tag marks tokens that are not part of a named entity.

Example:

The O
NBA B-ORG
player O
Michael B-PER
Jordan I-PER
is O
from O
the O
United B-LOC
States I-LOC
of I-LOC
America I-LOC
. O
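
The tagging scheme shown in the example can be turned back into entity spans with a few lines of Python. The following is a minimal sketch, not part of this repository: the function names `read_iob` and `extract_entities` are hypothetical, and it assumes each non-empty line holds a token followed by whitespace and a tag in the last field.

```python
from typing import Iterable, Iterator, List, Tuple


def read_iob(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (token, tag) pairs; assumes the tag is the last whitespace-separated field."""
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        yield parts[0], parts[-1]


def extract_entities(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group B-/I- tagged tokens into (entity text, entity type) tuples."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    etype_open = None  # type of the entity currently being built

    def flush() -> None:
        if tokens:
            entities.append((" ".join(tokens), etype_open))

    for token, tag in pairs:
        prefix, _, etype = tag.partition("-")
        if prefix == "B" or (prefix == "I" and etype != etype_open):
            # B- starts a new entity; an I- tag that does not continue the
            # open entity is treated leniently as a new start (an assumption).
            flush()
            tokens, etype_open = [token], etype
        elif prefix == "I":
            tokens.append(token)
        else:  # O tag: close any open entity
            flush()
            tokens, etype_open = [], None
    flush()
    return entities


sample = """The O
NBA B-ORG
player O
Michael B-PER
Jordan I-PER
is O
from O
the O
United B-LOC
States I-LOC
of I-LOC
America I-LOC
. O"""

print(extract_entities(read_iob(sample.splitlines())))
# [('NBA', 'ORG'), ('Michael Jordan', 'PER'), ('United States of America', 'LOC')]
```

Treating a stray I- tag as the start of a new entity is one possible design choice; a stricter reader could reject such lines instead.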

Background

The IOB files in this repository are based on OCRed and manually annotated historical newspapers from these libraries:

enp_DE.onb.bio - newspapers from the Austrian National Library
enp_DE.lft.bio - newspapers from the Dr Friedrich Teßmann Library
enp_DE.sbb.bio - newspapers from the Berlin State Library
enp_FR.bnf.bio - newspapers from the National Library of France
enp_NL.kb.bio - newspapers from the National Library of the Netherlands

To download the source ALTO OCR files or the trained CRF classifier binaries, please go here.

License

Attribution

Europeana Newspapers NER corpora
https://github.com/EuropeanaNewspapers/ner-corpora/
Europeana Newspapers Project, 2012-2015
http://www.europeana-newspapers.eu/

References

An Open Corpus for Named Entity Recognition in Historic Newspapers
Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC 2016), 23-28 May 2016, Portorož, Slovenia.

Known issues

Because of the way the above corpora were produced, additional work is required before the data can be used for tasks such as evaluation, where gold-standard quality is required, as the data still contains many OCR errors. Also, during post-processing, parts of sentences containing a high degree of noise were cut. This makes it difficult to map the annotated texts to the original newspaper articles and may have unintended effects on classification.

Further information on data quality issues and instructions to clean up the data can be found in the wiki.
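
Before any cleanup, a quick structural check of a .bio file can help gauge how much work is needed. The sketch below is only an illustration and is not the cleanup procedure described in the wiki: the function name `check_iob` is hypothetical, and it assumes the token-then-tag line layout described in the introduction. It counts the tags and reports lines without a tag, tags outside the B-/I-/O scheme, and I- tags that do not continue an entity of the same type.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def check_iob(lines: Iterable[str]) -> Tuple[Counter, List[Tuple[int, str]]]:
    """Return (tag counts, list of (line number, problem description))."""
    tag_counts: Counter = Counter()
    problems: List[Tuple[int, str]] = []
    prev_type = None  # entity type of the previous token, None when outside

    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:  # blank line: nothing to check, entity context ends
            prev_type = None
            continue
        if len(parts) < 2:
            problems.append((lineno, "missing tag"))
            prev_type = None
            continue

        tag = parts[-1]  # assumption: the tag is the last field
        tag_counts[tag] += 1
        prefix, _, etype = tag.partition("-")

        if tag == "O":
            prev_type = None
        elif prefix in ("B", "I") and etype:
            if prefix == "I" and etype != prev_type:
                problems.append((lineno, f"I-{etype} does not continue an entity"))
            prev_type = etype
        else:
            problems.append((lineno, f"unrecognised tag {tag!r}"))
            prev_type = None

    return tag_counts, problems


# Hypothetical usage; the UTF-8 encoding is an assumption:
# with open("enp_DE.sbb.bio", encoding="utf-8") as handle:
#     counts, problems = check_iob(handle)
#     print(counts.most_common(), len(problems))
```

A check like this only finds structural problems in the tagging; it says nothing about OCR errors inside the tokens themselves.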

Stars: ✭ 151
