
baiyyang / Medical Entity Recognition

Licence: apache-2.0
Named entity recognition on medical data, covering both a traditional statistical model (CRF) and a deep learning model (Embedding-Bi-LSTM-CRF).

Programming Languages

python

Projects that are alternatives to or similar to Medical Entity Recognition

Information Extraction Chinese
Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT
Stars: ✭ 1,888
Bert Ner Tf
Named Entity Recognition with BERT using TensorFlow 2.0
Stars: ✭ 155
Chinsesner Pytorch
Chinese named entity recognition based on BI-LSTM+CRF, in PyTorch
Stars: ✭ 174
Spacy Course
👩‍🏫 Advanced NLP with spaCy: A free online course
Stars: ✭ 1,920
Sequence tagging
Named Entity Recognition (LSTM + CRF) - Tensorflow
Stars: ✭ 1,889
Open Semantic Etl
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
Stars: ✭ 165
Triggerner
TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)
Stars: ✭ 141
Bert Sklearn
a sklearn wrapper for Google's BERT model
Stars: ✭ 182
Fox
Federated Knowledge Extraction Framework
Stars: ✭ 155
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518
Ner Corpora
Named Entity Recognition data for Europeana Newspapers
Stars: ✭ 151
Deeplearning nlp
A natural language processing library based on deep learning
Stars: ✭ 154
Zh Ner Tf
A very simple BiLSTM-CRF model for Chinese Named Entity Recognition (TensorFlow)
Stars: ✭ 2,063
Ld Net
Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
Stars: ✭ 148
Kashgari
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.
Stars: ✭ 2,235
Indonesian Nlp Resources
Data resources for Indonesian NLP
Stars: ✭ 143
Solrtexttagger
A text tagger based on Lucene / Solr, using FST technology
Stars: ✭ 162
Persian Ner
A large labeled corpus for Persian named entity recognition
Stars: ✭ 183
Gerbil
GERBIL - General Entity annotatoR Benchmark
Stars: ✭ 180
Vntk
Vietnamese NLP Toolkit for Node
Stars: ✭ 170

medical-entity-recognition

Description

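This project performs named entity recognition (NER) on medical data. The main methods are:

  1. Named entity recognition based on Conditional Random Fields (CRF).

  2. Named entity recognition based on a bidirectional long short-term memory network with a conditional random field layer (Bi-LSTM-CRF).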
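Both methods end in a linear-chain CRF: each character gets a per-tag emission score (from hand-crafted features in the CRF model, or from the Bi-LSTM outputs in Bi-LSTM-CRF), and a transition matrix scores adjacent tag pairs. The sketch below is a minimal pure-Python illustration of that shared layer, not the repository's TensorFlow code: training minimizes log Z minus the gold-path score (log Z computed with the forward algorithm), and prediction uses Viterbi decoding. Start/end transition scores are omitted for brevity, and the toy scores are made up.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # emissions[t][tag] or transitions[prev_tag][next_tag]


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(emissions: Matrix, transitions: Matrix, tags: List[int]) -> float:
    """Unnormalized score of one tag path: emission scores plus tag-to-tag transitions."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def log_partition(emissions: Matrix, transitions: Matrix) -> float:
    """Forward algorithm: log of the summed exp-scores over every possible tag path."""
    num_tags = len(emissions[0])
    alpha = list(emissions[0])  # alpha[j]: log-sum of scores of paths ending in tag j
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            + emissions[t][j]
            for j in range(num_tags)
        ]
    return logsumexp(alpha)


def neg_log_likelihood(emissions: Matrix, transitions: Matrix, tags: List[int]) -> float:
    """Per-sentence training loss: log Z minus the score of the gold path."""
    return log_partition(emissions, transitions) - path_score(emissions, transitions, tags)


def viterbi(emissions: Matrix, transitions: Matrix) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path and its score."""
    num_tags = len(emissions[0])
    delta = list(emissions[0])  # delta[j]: best score of a path ending in tag j
    backpointers = []  # backpointers[t-1][j]: best previous tag for tag j at step t
    for t in range(1, len(emissions)):
        prev_best, new_delta = [], []
        for j in range(num_tags):
            best_i = max(range(num_tags), key=lambda i: delta[i] + transitions[i][j])
            prev_best.append(best_i)
            new_delta.append(delta[best_i] + transitions[best_i][j] + emissions[t][j])
        backpointers.append(prev_best)
        delta = new_delta
    last = max(range(num_tags), key=lambda j: delta[j])
    path = [last]
    for prev_best in reversed(backpointers):  # walk back from the final step
        path.append(prev_best[path[-1]])
    path.reverse()
    return path, delta[last]


if __name__ == "__main__":
    # Toy example: 3 tags (0=O, 1=B, 2=I) over a 4-character sentence.
    emissions = [[1.0, 2.0, 0.1], [0.5, 0.2, 1.5], [1.2, 0.3, 0.4], [0.2, 1.1, 0.9]]
    transitions = [[0.5, 0.3, -2.0], [-0.5, -1.0, 1.0], [0.2, 0.1, 0.4]]
    best_path, best_score = viterbi(emissions, transitions)
    print(best_path, best_score)
    print(neg_log_likelihood(emissions, transitions, best_path))
```

Both the forward pass and Viterbi run in O(T·K²) for T characters and K tags, instead of enumerating all K^T paths; Viterbi is the same recursion with max in place of logsumexp.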

Introduction

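  1. raw_data holds the original data, taken from Task 2 of CCKS2017: named entity recognition on electronic medical records. reader.py processes the raw data and generates data in the standard NER format (data, pos, label).

  2. train_test_data holds the training and test corpora for the models; word2id.pkl and char2id.pkl are the dictionaries loaded by the neural network.

  3. The crf folder contains the CRF-based NER models. medical_entity_recognition_bio_char_ori.crfsuite and medical_entity_recognition_bio_word_ori.crfsuite are the trained models that use characters and words, respectively, as the feature unit.

  4. The bilstm_crf folder contains the neural-network-based NER model. The bio_model directory stores two trained models: a character-vector model and a word-vector model, both with randomly initialized embeddings. To use them:

  • To train a new model: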

python main.py --mode train --data_dir *** --train_data *** --test_data *** --dictionary ***

  • To test an existing model:

python main.py --mode test --data_dir ../train_test_data --train_data train_bio_char.txt --test_data test_bio_char.txt --dictionary char2id.pkl --demo_model random_char_300
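Items 1 and 2 above describe turning the CCKS2017 annotations into BIO-labelled corpora and character/word dictionaries. The span format of raw_data and the exact contents of char2id.pkl are not documented here, so the sketch below is a hypothetical illustration rather than reader.py: it assumes character-offset spans of the form (start, end_exclusive, type), made-up entity types, and reserved <PAD>/<UNK> ids.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation format: (start, end_exclusive, entity_type) character offsets.
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Tag every character: B-TYPE on an entity's first char, I-TYPE inside it, O elsewhere."""
    labels = ["O"] * len(text)
    for start, end, etype in spans:
        labels[start] = "B-" + etype
        for i in range(start + 1, end):
            labels[i] = "I-" + etype
    return labels


def build_char2id(sentences: List[str]) -> Dict[str, int]:
    """Assign ids in order of first occurrence; 0 and 1 are reserved (an assumption)."""
    char2id = {"<PAD>": 0, "<UNK>": 1}
    for sentence in sentences:
        for ch in sentence:
            if ch not in char2id:
                char2id[ch] = len(char2id)
    return char2id


def encode(sentence: str, char2id: Dict[str, int]) -> List[int]:
    """Map characters to ids, falling back to <UNK> for unseen characters."""
    return [char2id.get(ch, char2id["<UNK>"]) for ch in sentence]


if __name__ == "__main__":
    text = "患者右腹部疼痛"  # "the patient has pain in the right abdomen"
    # BODY and SYMPTOM are hypothetical type names chosen for illustration.
    spans = [(2, 5, "BODY"), (5, 7, "SYMPTOM")]
    for ch, label in zip(text, spans_to_bio(text, spans)):
        print(ch, label)
    vocab = build_char2id([text])
    print(encode("患者头痛", vocab))  # 头 was never seen, so it maps to <UNK>
```

A dictionary built this way could be saved with `pickle.dump(char2id, f)` on a file opened in binary mode, in the spirit of the .pkl files above; whether the bundled dictionaries reserve the same entries is not stated in this README.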

Requirements

python 3

pycrfsuite: pip install python-crfsuite

zhon: pip install zhon

tensorflow >= 1.4
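The pycrfsuite requirement and the .crfsuite model files suggest the CRF models are trained with python-crfsuite, which takes each sentence as a sequence of per-token feature lists plus a label sequence. The feature template behind the bundled models is not documented, so the sketch below is a hypothetical character-level template (the character, an optional POS tag, and ±1 neighbours and bigrams); a tagger opened on the bundled models only works with the exact features they were trained on.

```python
from typing import List, Optional


def char_features(chars: List[str], i: int, pos_tags: Optional[List[str]] = None) -> List[str]:
    """Hypothetical feature template for character i, as plain feature-name strings."""
    ch = chars[i]
    feats = ["bias", "char=" + ch, "is_digit=%s" % ch.isdigit()]
    if pos_tags is not None:  # e.g. the pos column produced by reader.py (assumed usage)
        feats.append("pos=" + pos_tags[i])
    if i > 0:
        feats.append("char[-1]=" + chars[i - 1])
        feats.append("bigram[-1,0]=" + chars[i - 1] + ch)
    else:
        feats.append("BOS")  # beginning of sentence
    if i < len(chars) - 1:
        feats.append("char[+1]=" + chars[i + 1])
        feats.append("bigram[0,+1]=" + ch + chars[i + 1])
    else:
        feats.append("EOS")  # end of sentence
    return feats


def sentence_features(chars: List[str], pos_tags: Optional[List[str]] = None) -> List[List[str]]:
    """One feature list per character: the xseq format python-crfsuite accepts."""
    return [char_features(chars, i, pos_tags) for i in range(len(chars))]


if __name__ == "__main__":
    for feats in sentence_features(list("腹痛")):
        print(feats)
```

With python-crfsuite installed, such lists can be passed with their label sequence to `pycrfsuite.Trainer().append(xseq, yseq)` and trained with `trainer.train(path)`; a trained model is loaded with `pycrfsuite.Tagger().open(path)` and applied with `tagger.tag(xseq)`. The word-unit model would use the same idea with words in place of characters.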

Result

Models were trained with characters and with words as the unit; the experimental results are as follows:

Model          Character unit   Word unit
CRF            0.73             0.74
Bi-LSTM-CRF    0.80             0.78
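The README does not state which metric the table reports. Entity-level F1 over BIO spans is the usual choice for NER (as in CoNLL-style evaluation), so the sketch below shows how it would be computed, as an assumption: an entity counts as correct only if its type, start, and end all match the gold annotation.

```python
from typing import List, Set, Tuple

Entity = Tuple[str, int, int]  # (entity_type, start, end_exclusive)


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect spans from a BIO sequence; a stray I- tag starts a new entity."""
    entities: Set[Entity] = set()
    etype, start = None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel "O" closes a trailing entity
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # still inside the current entity
        if etype is not None:
            entities.add((etype, start, i))
            etype, start = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            etype, start = tag[2:], i
    return entities


def entity_prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 over all sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        g_ents, p_ents = extract_entities(g), extract_entities(p)
        tp += len(g_ents & p_ents)
        n_gold += len(g_ents)
        n_pred += len(p_ents)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-BODY", "I-BODY", "O", "B-SYMPTOM", "I-SYMPTOM"]]
    pred = [["B-BODY", "I-BODY", "O", "B-SYMPTOM", "O"]]  # second entity is truncated
    print(entity_prf1(gold, pred))
```

Under this exact-match rule a truncated entity counts as both a false positive and a false negative, which is why entity-level scores are typically lower than per-character tag accuracy.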

Reference

guillaumegenthial/sequence_tagging

Other

Comments and corrections from everyone are welcome.
