
Determined22 / Zh Ner Tf

A very simple BiLSTM-CRF model for Chinese Named Entity Recognition 中文命名实体识别 (TensorFlow)

Programming Languages

python
perl

Projects that are alternatives to or similar to Zh Ner Tf

Camel tools
A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.
Stars: ✭ 124 (-93.99%)
Mutual labels:  named-entity-recognition
Information Extraction Chinese
Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT 中文实体识别与关系提取
Stars: ✭ 1,888 (-8.48%)
Mutual labels:  named-entity-recognition
Sequence tagging
Named Entity Recognition (LSTM + CRF) - Tensorflow
Stars: ✭ 1,889 (-8.43%)
Mutual labels:  named-entity-recognition
Bnlp
BNLP is a natural language processing toolkit for Bengali Language.
Stars: ✭ 127 (-93.84%)
Mutual labels:  named-entity-recognition
Triggerner
TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)
Stars: ✭ 141 (-93.17%)
Mutual labels:  named-entity-recognition
Spacy Course
👩‍🏫 Advanced NLP with spaCy: A free online course
Stars: ✭ 1,920 (-6.93%)
Mutual labels:  named-entity-recognition
Awesome Hungarian Nlp
A curated list of NLP resources for Hungarian
Stars: ✭ 121 (-94.13%)
Mutual labels:  named-entity-recognition
Solrtexttagger
A text tagger based on Lucene / Solr, using FST technology
Stars: ✭ 162 (-92.15%)
Mutual labels:  named-entity-recognition
Indonesian Nlp Resources
Data resources for Indonesian NLP
Stars: ✭ 143 (-93.07%)
Mutual labels:  named-entity-recognition
Deeplearning nlp
A natural language processing library based on deep learning
Stars: ✭ 154 (-92.54%)
Mutual labels:  named-entity-recognition
Mt Dnn
Multi-Task Deep Neural Networks for Natural Language Understanding
Stars: ✭ 1,871 (-9.31%)
Mutual labels:  named-entity-recognition
Ncrfpp
NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Stars: ✭ 1,767 (-14.35%)
Mutual labels:  named-entity-recognition
Ner Corpora
Named Entity Recognition data for Europeana Newspapers
Stars: ✭ 151 (-92.68%)
Mutual labels:  named-entity-recognition
Ner Evaluation
An implementation of a full named-entity evaluation metrics based on SemEval'13 Task 9 - not at tag/token level but considering all the tokens that are part of the named-entity
Stars: ✭ 126 (-93.89%)
Mutual labels:  named-entity-recognition
Fox
Federated Knowledge Extraction Framework
Stars: ✭ 155 (-92.49%)
Mutual labels:  named-entity-recognition
Dan Jurafsky Chris Manning Nlp
My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.
Stars: ✭ 124 (-93.99%)
Mutual labels:  named-entity-recognition
Ld Net
Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
Stars: ✭ 148 (-92.83%)
Mutual labels:  named-entity-recognition
Open Semantic Etl
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
Stars: ✭ 165 (-92%)
Mutual labels:  named-entity-recognition
Bert Ner Tf
Named Entity Recognition with BERT using TensorFlow 2.0
Stars: ✭ 155 (-92.49%)
Mutual labels:  named-entity-recognition
Crf Layer On The Top Of Bilstm
The CRF Layer was implemented by using Chainer 2.0. Please see more details here: https://createmomo.github.io/2017/09/12/CRF_Layer_on_the_Top_of_BiLSTM_1/
Stars: ✭ 148 (-92.83%)
Mutual labels:  named-entity-recognition

A simple BiLSTM-CRF model for Chinese Named Entity Recognition

This repository contains the code for building a very simple character-based BiLSTM-CRF sequence labeling model for the Chinese Named Entity Recognition task. Its goal is to recognize three types of named entities: PERSON, LOCATION and ORGANIZATION.

This code works with Python 3 and TensorFlow 1.2. The repository https://github.com/guillaumegenthial/sequence_tagging gave me a lot of help.

Model

This model is similar to the models described in papers [1] and [2]. Its structure is shown in the following illustration:

[network architecture illustration]

For a Chinese sentence, each character in the sentence is assigned a tag from the set {O, B-PER, I-PER, B-LOC, I-LOC, B-ORG, I-ORG}.

The first layer, the look-up layer, transforms each character from a one-hot vector into a character embedding. In this code I initialize the embedding matrix randomly. Linguistic knowledge could be added later: for example, tokenize the text, use pre-trained word-level embeddings, and augment each character embedding with the corresponding token's word embedding. In addition, character embeddings can be obtained by combining low-level features (see section 4.1 of paper [2] and section 3.3 of paper [3] for more details).
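
For illustration, here is a minimal sketch of such a look-up layer in TensorFlow 1.x; the placeholder names and the vocab_size / embedding_dim values are assumptions, not the exact identifiers used in this repository.

import tensorflow as tf

# Assumed sizes for illustration; the real code may use different values.
vocab_size, embedding_dim = 4000, 300

# One id per character, padded to the longest sentence in the batch: [batch, max_len]
char_ids = tf.placeholder(tf.int32, shape=[None, None], name="char_ids")

# Randomly initialized embedding matrix, trained jointly with the rest of the model.
embedding_matrix = tf.get_variable(
    "char_embeddings",
    shape=[vocab_size, embedding_dim],
    initializer=tf.random_uniform_initializer(-0.25, 0.25))

# Look up one embedding vector per character id: [batch, max_len, embedding_dim]
char_embeddings = tf.nn.embedding_lookup(embedding_matrix, char_ids)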

The second layer, the BiLSTM layer, can efficiently use both past and future input information and extract features automatically.
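
A minimal sketch of such a BiLSTM layer with tf.nn.bidirectional_dynamic_rnn follows; hidden_size and the placeholder shapes are assumptions made only for this example.

import tensorflow as tf

hidden_size, embedding_dim, num_tags = 300, 300, 7

# Inputs produced by the look-up layer, plus the true length of each sentence.
char_embeddings = tf.placeholder(tf.float32, shape=[None, None, embedding_dim])
sequence_lengths = tf.placeholder(tf.int32, shape=[None])

# One LSTM reads the sentence left-to-right, the other right-to-left.
cell_fw = tf.contrib.rnn.LSTMCell(hidden_size)
cell_bw = tf.contrib.rnn.LSTMCell(hidden_size)
(output_fw, output_bw), _ = tf.nn.bidirectional_dynamic_rnn(
    cell_fw, cell_bw, char_embeddings,
    sequence_length=sequence_lengths, dtype=tf.float32)

# Concatenate both directions: [batch, max_len, 2 * hidden_size]
bilstm_output = tf.concat([output_fw, output_bw], axis=-1)

# Project to per-character tag scores for the CRF layer: [batch, max_len, num_tags]
logits = tf.layers.dense(bilstm_output, num_tags)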

The third layer, the CRF layer, assigns a tag to each character in the sentence. If we used a Softmax layer for labeling instead, we might get ungrammatical tag sequences because Softmax labels each position independently. We know that 'I-LOC' cannot follow 'B-PER', but Softmax does not. Compared with Softmax, a CRF layer can use sentence-level tag information and model the transition behavior between every pair of tags.
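
A minimal sketch of the CRF loss and Viterbi decoding with tf.contrib.crf; the tensor names and shapes here are assumptions for illustration only.

import tensorflow as tf

num_tags = 7

# Per-character tag scores from the BiLSTM projection, gold tag ids, and sentence lengths.
logits = tf.placeholder(tf.float32, shape=[None, None, num_tags])
labels = tf.placeholder(tf.int32, shape=[None, None])
sequence_lengths = tf.placeholder(tf.int32, shape=[None])

# Training: the CRF log-likelihood scores emissions and tag-to-tag transitions jointly.
log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    logits, labels, sequence_lengths)
loss = tf.reduce_mean(-log_likelihood)

# Testing: after running `logits` and `transition_params` in a session, decode each
# sentence's best tag sequence with Viterbi search, e.g. for one sentence:
#   viterbi_seq, viterbi_score = tf.contrib.crf.viterbi_decode(score, trans)
# where `score` is a numpy array of shape [seq_len, num_tags].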

Dataset

          #sentence   #PER     #LOC     #ORG
train     46364       17615    36517    20571
test      4365        1973     2877     1331

It looks like a portion of the MSRA corpus. The dataset was downloaded from the link in ./data_path/original/link.txt.

data files

The directory ./data_path contains:

  • the preprocessed data files, train_data and test_data
  • a vocabulary file word2id.pkl that maps each character to a unique id

To generate the vocabulary file, please refer to the code in data.py.
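
For illustration, here is a minimal sketch of how such a word2id.pkl mapping could be built from the training file; the function name, the <PAD>/<UNK> conventions and min_count are assumptions, not the exact logic of data.py.

import pickle

def build_vocab(corpus_path, vocab_path, min_count=1):
    """Count characters in a 'char<TAB>tag' file and give each one a unique id."""
    counts = {}
    with open(corpus_path, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:                      # blank lines separate sentences
                continue
            char, _tag = line.split('\t')
            counts[char] = counts.get(char, 0) + 1

    word2id = {'<PAD>': 0, '<UNK>': 1}        # reserved ids for padding and unknown chars
    for char, count in counts.items():
        if count >= min_count:
            word2id[char] = len(word2id)

    with open(vocab_path, 'wb') as f:
        pickle.dump(word2id, f)
    return word2id

# word2id = build_vocab('./data_path/train_data', './data_path/word2id.pkl')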

data format

Each data file should be in the following format:

中	B-LOC
国	I-LOC
很	O
大	O

句	O
子	O
结	O
束	O
是	O
空	O
行	O
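
For illustration, a minimal sketch that reads a file in this format into (characters, tags) sentence pairs; the function name is an assumption, not necessarily the reader used in this repository.

def read_corpus(corpus_path):
    """Read a 'char<TAB>tag' file into a list of (chars, tags) sentence pairs."""
    sentences, chars, tags = [], [], []
    with open(corpus_path, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\n')
            if line:
                char, tag = line.split('\t')
                chars.append(char)
                tags.append(tag)
            elif chars:                       # blank line marks the end of a sentence
                sentences.append((chars, tags))
                chars, tags = [], []
    if chars:                                 # handle a file without a trailing blank line
        sentences.append((chars, tags))
    return sentences

# read_corpus('./data_path/train_data')[0]  ->  (['中', '国', '很', '大'], ['B-LOC', 'I-LOC', 'O', 'O'])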

If you want to use your own dataset, please:

  • transform your corpus to the above format
  • generate a new vocabulary file

How to Run

train

python main.py --mode=train

test

python main.py --mode=test --demo_model=1521112368

Please set the parameter --demo_model to the model you want to test. 1521112368 is the model I trained.

An official evaluation tool for computing these metrics is linked in the original README (click 'Instructions').

My test performance:

P         R         F         F (PER)   F (LOC)   F (ORG)
0.8945    0.8752    0.8847    0.8688    0.9118    0.8515

(P = precision, R = recall, F = F1 score; the last three columns are per-entity-type F1.)

demo

python main.py --mode=demo --demo_model=1521112368

You can input one Chinese sentence and the model will return the recognition result:

[demo screenshot]
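
As an illustration of how a predicted tag sequence can be turned into the recognized entities shown above, here is a minimal sketch; the helper below is an assumption, not code from this repository.

def get_entities(chars, tags):
    """Collect (entity_text, entity_type) spans from BIO tags such as B-PER/I-PER."""
    entities, buffer, current_type = [], [], None
    for char, tag in zip(chars, tags):
        if tag.startswith('B-'):
            if buffer:
                entities.append((''.join(buffer), current_type))
            buffer, current_type = [char], tag[2:]
        elif tag.startswith('I-') and current_type == tag[2:]:
            buffer.append(char)
        else:                                 # 'O' or an inconsistent 'I-' tag closes the span
            if buffer:
                entities.append((''.join(buffer), current_type))
            buffer, current_type = [], None
    if buffer:
        entities.append((''.join(buffer), current_type))
    return entities

# Example:
# get_entities(list('中国很大'), ['B-LOC', 'I-LOC', 'O', 'O'])  ->  [('中国', 'LOC')]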

Reference

[1] Bidirectional LSTM-CRF Models for Sequence Tagging

[2] Neural Architectures for Named Entity Recognition

[3] Character-Based LSTM-CRF with Radical-Level Features for Chinese Named Entity Recognition

[4] https://github.com/guillaumegenthial/sequence_tagging
