elenanereiss / Legal-Entity-Recognition

Licence: other

A Dataset of German Legal Documents for Named Entity Recognition

Programming Languages

python


Projects that are alternatives of or similar to Legal-Entity-Recognition

Bert Bilstm Crf Ner

TensorFlow solution for the NER task using a BiLSTM-CRF model with Google BERT fine-tuning, plus private server services

Stars: ✭ 3,838 (+3816.33%)

Mutual labels: crf, ner, blstm

Ncrfpp

NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.

Stars: ✭ 1,767 (+1703.06%)

Mutual labels: crf, ner

Multilstm

Keras attentional bi-LSTM-CRF for joint NLU (slot filling and intent detection) with ATIS

Stars: ✭ 122 (+24.49%)

Mutual labels: crf, ner

Sequence tagging

Named Entity Recognition (LSTM + CRF) - Tensorflow

Stars: ✭ 1,889 (+1827.55%)

Mutual labels: crf, ner

Min nlp practice

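Chinese and English word segmentation (CWS), POS tagging and named entity recognition, implemented with a CNN, bidirectional LSTM and CRF model with character embeddings. The network combines character embeddings, CNN pooling, a BiLSTM and a CRF, and can perform Chinese and English word segmentation, POS tagging and entity recognition in one integrated model. The repository mainly contains raw text data, data conversion, training scripts and pretrained models, and can be used for sequence labeling research. Note: the only logic you need to implement is converting your own data into the format of the sequence model. Word segmentation accuracy is about 93%, POS tagging accuracy about 90%, and entity tagging accuracy (on this sample) about 85%.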

Stars: ✭ 107 (+9.18%)

Mutual labels: crf, ner

Daguan 2019 rank9

datagrand 2019 information extraction competition rank9

Stars: ✭ 121 (+23.47%)

Mutual labels: crf, ner

Clinical Ner

Named entity recognition for Chinese electronic medical records

Stars: ✭ 151 (+54.08%)

Mutual labels: crf, ner

Ntagger

reference pytorch code for named entity tagging

Stars: ✭ 58 (-40.82%)

Mutual labels: crf, ner

Pytorch ner bilstm cnn crf

End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, implemented in PyTorch

Stars: ✭ 249 (+154.08%)

Mutual labels: crf, ner

ChineseNER

All about Chinese NER

Stars: ✭ 241 (+145.92%)

Mutual labels: crf, ner

sequence tagging

Named Entity Recognition (LSTM + CRF + FastText) with models for [historic] German

Stars: ✭ 25 (-74.49%)

Mutual labels: german, ner

Etagger

reference tensorflow code for named entity tagging

Stars: ✭ 100 (+2.04%)

Mutual labels: crf, ner

Nlp Journey

Documents, papers and code related to Natural Language Processing, including Topic Model, Word Embedding, Named Entity Recognition, Text Classification, Text Generation, Text Similarity, Machine Translation, etc. All code is implemented in TensorFlow 2.0.

Stars: ✭ 1,290 (+1216.33%)

Mutual labels: crf, ner

Ner

Named entity recognition (NER): surveys, papers, models, code (BiLSTM-CRF/BERT-CRF) and a summary of competition resources, updated continuously

Stars: ✭ 118 (+20.41%)

Mutual labels: crf, ner

Torchcrf

An implementation of CRF (Conditional Random Fields) in PyTorch 1.0

Stars: ✭ 58 (-40.82%)

Mutual labels: crf, ner

Ner Slot filling

Entity extraction and intent recognition for Chinese natural language (Natural Language Understanding), with a choice of Bi-LSTM + CRF or IDCNN + CRF

Stars: ✭ 151 (+54.08%)

Mutual labels: crf, ner

BiLSTM-CRF-NER-PyTorch

This repo contains a PyTorch implementation of a BiLSTM-CRF model for the named entity recognition task.

Stars: ✭ 109 (+11.22%)

Mutual labels: crf, ner

Named entity recognition

Chinese named entity recognition (including concrete implementations of several models: HMM, CRF, BiLSTM, BiLSTM+CRF)

Stars: ✭ 995 (+915.31%)

Mutual labels: crf, ner

Ner blstm Crf

LSTM-CRF for NER with the CoNLL-2002 dataset

Stars: ✭ 51 (-47.96%)

Mutual labels: crf, ner

Pytorch Bert Crf Ner

A Korean named entity recognizer built with KoBERT and CRF (BERT+CRF based Named Entity Recognition model for Korean)

Stars: ✭ 236 (+140.82%)

Mutual labels: crf, ner


Legal-Entity-Recognition

Fine-grained Named Entity Recognition in Legal Documents

This work has been partially funded by the project Lynx, which has received funding from the EU's Horizon 2020 research and innovation programme under grant agreement no. 780602, see http://www.lynx-project.eu.

Dataset of Legal Documents

The dataset consists of court decisions from 2017 and 2018 that were published online by the Federal Ministry of Justice and Consumer Protection. The documents originate from seven federal courts: Federal Labour Court (BAG), Federal Fiscal Court (BFH), Federal Court of Justice (BGH), Federal Patent Court (BPatG), Federal Social Court (BSG), Federal Constitutional Court (BVerfG) and Federal Administrative Court (BVerwG).

Annotation Guidelines (German)

Size

The dataset consists of 66,723 sentences with 2,157,048 tokens. The sizes of the seven court-specific datasets vary between 5,858 and 12,791 sentences, and between 177,835 and 404,041 tokens. On a per-token basis, annotations cover approx. 19-23 % of the tokens.
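
The per-token figure can be checked on any of the files. The following is a minimal sketch, assuming the two-column layout with blank lines between sentences described under Output Format. The function name corpus_stats and the tiny inline sample are illustrative and not part of the repository.

```python
import io

def corpus_stats(lines):
    """Count sentences, tokens, and tokens whose tag is not 'O'."""
    sentences = tokens = annotated = 0
    in_sentence = False
    for raw in lines:
        line = raw.strip()
        if not line:                 # a blank line closes the current sentence
            in_sentence = False
            continue
        if not in_sentence:          # first token of a new sentence
            sentences += 1
            in_sentence = True
        tag = line.split()[-1]       # the tag is the last column
        tokens += 1
        if tag != "O":
            annotated += 1
    share = 100.0 * annotated / tokens if tokens else 0.0
    return {"sentences": sentences, "tokens": tokens,
            "annotated": annotated, "annotated_pct": share}

# Illustrative sample only: a few tokens from the README example,
# split artificially into two "sentences".
sample = io.StringIO(
    "der O\nsaarländischen B-INN\nLandesregierung I-INN\n\n"
    "Ministerpräsidenten O\nMüller B-RR\n"
)
stats = corpus_stats(sample)
print(stats)
```

To run it on a real file, pass an open file handle instead of the sample, for example corpus_stats(open(path, encoding="utf-8")), where path is whichever dataset file you downloaded.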

Distribution of Entities

The dataset includes two versions of the annotations: one with a set of 19 fine-grained semantic classes and one with a set of 7 coarse-grained classes. There are 53,632 annotated entities in total. The majority (74.34 %) are legal entities; the remainder (25.66 %) are persons, locations and organizations.

Output Format

The dataset is freely available under the CC-BY 4.0 license. The output format is CoNLL-2002. Each line consists of two columns separated by a space. The first column contains a token and the second a tag in IOB2 format. The sentence boundary is marked with an empty line.

Token	Tag
Am	O
7.	O
März	O
2006	O
fand	O
ein	O
Treffen	O
der	O
saarländischen	B-INN
Landesregierung	I-INN
unter	O
Vorsitz	O
des	O
Ministerpräsidenten	O
Müller	B-RR
mit	O
Vertretern	O
der	O
Evangelischen	B-ORG
Kirche	I-ORG
im	I-ORG
Rheinland	I-ORG
und	O
der	O
Evangelischen	B-ORG
Kirche	I-ORG
der	I-ORG
Pfalz	I-ORG
statt	O
.	O
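
As a rough illustration of how this format can be consumed, the sketch below reads sentences and groups IOB2 tags into entity spans. The function names read_conll and iob2_spans are hypothetical helpers, not part of the repository. The code splits on any whitespace, so it accepts either a space or a tab between the columns, and the sample leaves out the Token/Tag header row on the assumption that it is only a caption of the listing above.

```python
def read_conll(lines):
    """Yield sentences as lists of (token, tag) pairs."""
    sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:                        # blank line = sentence boundary
            if sentence:
                yield sentence
                sentence = []
            continue
        token, tag = line.rsplit(None, 1)   # split on the last whitespace run
        sentence.append((token, tag))
    if sentence:                            # input may lack a trailing blank line
        yield sentence

def iob2_spans(sentence):
    """Return (label, start, end) per entity; end is exclusive.

    Assumes strict IOB2: an I- tag without a preceding B- tag is ignored.
    """
    spans, start, label = [], None, None
    for i, (_, tag) in enumerate(sentence):
        # Close the open span when the current tag does not continue it.
        if label is not None and tag != "I-" + label:
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    if label is not None:                   # span running to the sentence end
        spans.append((label, start, len(sentence)))
    return spans

sample = """der O
saarländischen B-INN
Landesregierung I-INN
unter O
Vorsitz O
des O
Ministerpräsidenten O
Müller B-RR
""".splitlines()

for sent in read_conll(sample):
    for label, start, end in iob2_spans(sent):
        print(label, " ".join(tok for tok, _ in sent[start:end]))
# prints:
# INN saarländischen Landesregierung
# RR Müller
```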

CRF

Models

CRF-F with features (f);
CRF-FG with features and gazetteers (fg);
CRF-FGL with features, gazetteers and a lookup table for word similarity (fgl).

Training

Install sklearn-crfsuite and run (modelName=f|fg|fgl):

python crf.py modelName trainPath testPath

Models are saved in models/.
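
How crf.py builds its features is not shown here. Purely as an illustration of the general sklearn-crfsuite workflow, the sketch below uses a handful of generic token features (lowercased form, suffix, capitalisation, neighbouring words). The feature set, the function names and the hyperparameter values are assumptions made for this example and are not taken from the repository.

```python
def token_features(tokens, i):
    """Generic features for tokens[i]; NOT the feature set used by crf.py."""
    word = tokens[i]
    feats = {
        "bias": 1.0,
        "lower": word.lower(),
        "suffix3": word[-3:],
        "is_title": word.istitle(),
        "is_upper": word.isupper(),
        "is_digit": word.isdigit(),
    }
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True          # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True          # end of sentence
    return feats

def sentence_features(tokens):
    """One feature dict per token, the input shape sklearn-crfsuite expects."""
    return [token_features(tokens, i) for i in range(len(tokens))]

def train_crf(sentences, tags):
    """Fit a linear-chain CRF; needs the sklearn-crfsuite package installed."""
    import sklearn_crfsuite  # imported lazily so the feature code runs without it
    crf = sklearn_crfsuite.CRF(
        algorithm="lbfgs",
        c1=0.1,                      # L1 regularisation (illustrative value)
        c2=0.1,                      # L2 regularisation (illustrative value)
        max_iterations=100,
        all_possible_transitions=True,
    )
    crf.fit([sentence_features(s) for s in sentences], tags)
    return crf

tokens = ["Ministerpräsidenten", "Müller"]
print(sentence_features(tokens)[1])
```

Here train_crf takes a list of token lists and a parallel list of tag lists; calling predict on the returned model with features built the same way yields one tag sequence per sentence.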

BiLSTM

Models

BiLSTM-CRF (crf);
BiLSTM-CRF with char embeddings from BiLSTM (blstm-crf);
BiLSTM-CNN-CRF with char embeddings from CNN (cnn-crf).

Training

Install BiLSTM-CNN-CRF;
copy blstm.py to the folder emnlp2017-bilstm-cnn-crf/, choose a model (modelName=crf|blstm-crf|cnn-crf) and run:

python blstm.py modelName trainPath devPath testPath

Models are saved in models/.

Requirements

References:

Leitner, E. (2019). Eigennamen- und Zitaterkennung in Rechtstexten. Bachelor’s thesis, Universität Potsdam, Potsdam, February.

@mastersthesis{mastersthesis,
  author       = {Elena Leitner}, 
  title        = {Eigennamen- und Zitaterkennung in Rechtstexten},
  school       = {Universität Potsdam},
  year         = 2019,
  address      = {Potsdam},
  month        = 2,}

Leitner, E., Rehm, G., and Moreno-Schneider, J. (2019). Fine-grained Named Entity Recognition in Legal Documents. In Maribel Acosta, et al., editors, Semantic Systems. The Power of AI and Knowledge Graphs. Proceedings of the 15th International Conference (SEMANTiCS 2019), number 11702 in Lecture Notes in Computer Science, pages 272–287, Karlsruhe, Germany, September. Springer. 10/11 September 2019.

@inproceedings{leitner2019fine,
  author = {Elena Leitner and Georg Rehm and Julian Moreno-Schneider},
  title = {{Fine-grained Named Entity Recognition in Legal Documents}},
  booktitle = {Semantic Systems. The Power of AI and Knowledge
                  Graphs. Proceedings of the 15th International Conference
                  (SEMANTiCS 2019)},
  year = 2019,
  editor = {Maribel Acosta and Philippe Cudré-Mauroux and Maria
                  Maleshkova and Tassilo Pellegrini and Harald Sack and York
                  Sure-Vetter},
  keywords = {aip},
  publisher = {Springer},
  series = {Lecture Notes in Computer Science},
  number = {11702},
  address = {Karlsruhe, Germany},
  month = 9,
  note = {10/11 September 2019},
  pages = {272--287},
  pdf = {https://link.springer.com/content/pdf/10.1007%2F978-3-030-33220-4_20.pdf}}

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
