GanjinZero / CODER

Licence: other
CODER: Knowledge infused cross-lingual medical term embedding for term normalization. [JBI, ACL-BioNLP 2022]

CODER

CODER: Knowledge-infused cross-lingual medical term embedding for term normalization. Paper

CODER++: Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations. Paper

Use the model with transformers

The models are hosted on the Hugging Face model hub and can be loaded with the transformers library.

from transformers import AutoTokenizer, AutoModel

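# "GanjinZero/coder_eng" is the current name of the English checkpoint;
# "GanjinZero/UMLSBert_ENG" is its old name.
tokenizer = AutoTokenizer.from_pretrained("GanjinZero/coder_eng")
model = AutoModel.from_pretrained("GanjinZero/coder_eng")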

English checkpoint: GanjinZero/coder_eng (old name: GanjinZero/UMLSBert_ENG)

English CODER++ checkpoint: GanjinZero/coder_eng_pp (trained with hard negative sampling)

Multilingual checkpoint: GanjinZero/coder_all (old name GanjinZero/UMLSBert_ALL has been discarded)
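Once terms have been turned into vectors, term normalization is commonly done as a nearest-neighbor lookup: the query term is mapped to the concept whose vector is most similar. The sketch below shows only that lookup step in plain Python. It assumes the vectors have already been computed; how the model outputs are pooled into a single vector is not specified here, and the concept IDs and 3-dimensional vectors are toy placeholders, not real CODER embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def normalize_term(query_vec, concept_vecs):
    """Return the concept ID whose vector is most similar to query_vec."""
    return max(concept_vecs, key=lambda cid: cosine(query_vec, concept_vecs[cid]))


# Toy vectors standing in for model outputs (hypothetical IDs and values).
concept_vecs = {
    "C0000001": [0.9, 0.1, 0.0],
    "C0000002": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]
best = normalize_term(query_vec, concept_vecs)  # ID of the closest concept
```

For a large concept inventory, a vectorized or approximate nearest-neighbor search would replace the linear scan used here.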

Train your model

cd pretrain
python train.py --umls_dir your_umls_dir --model_name_or_path monologg/biobert_v1.1_pubmed

your_umls_dir must contain MRCONSO.RRF, MRREL.RRF and MRSTY.RRF. These files come from the UMLS download.
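Before starting a long training run, it can help to confirm that the directory really holds the three required files. The helper below is a minimal sketch for that check; the function name missing_umls_files is made up for illustration and is not part of this repository.

```python
from pathlib import Path

# File names required in your_umls_dir, as listed above.
REQUIRED_RRF_FILES = ("MRCONSO.RRF", "MRREL.RRF", "MRSTY.RRF")


def missing_umls_files(umls_dir):
    """Return the required RRF file names that are absent from umls_dir."""
    root = Path(umls_dir)
    return [name for name in REQUIRED_RRF_FILES if not (root / name).is_file()]


# Usage (the path is a placeholder):
# missing = missing_umls_files("your_umls_dir")
# if missing:
#     raise SystemExit(f"Missing UMLS files: {', '.join(missing)}")
```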

A small tool for loading UMLS RRF files

from pretrain.load_umls import UMLS
umls = UMLS(your_umls_dir)
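The interface of the UMLS class is not documented here, so the sketch below is an independent illustration of what reading MRCONSO.RRF involves, not a description of that class. It relies on the standard MRCONSO layout: pipe-delimited rows with the CUI in column 0, the language code (LAT) in column 1 and the term string (STR) in column 14. The function name terms_by_cui is hypothetical.

```python
from collections import defaultdict

# Column positions in a MRCONSO.RRF row (pipe-delimited).
CUI_COL, LAT_COL, STR_COL = 0, 1, 14


def terms_by_cui(lines, languages=None):
    """Group term strings by CUI, optionally keeping only given language codes."""
    grouped = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= STR_COL:
            continue  # skip rows too short to contain a term string
        if languages is not None and fields[LAT_COL] not in languages:
            continue  # skip rows in languages that were not requested
        grouped[fields[CUI_COL]].add(fields[STR_COL])
    return dict(grouped)


# Usage (the path is a placeholder):
# with open("your_umls_dir/MRCONSO.RRF", encoding="utf-8") as f:
#     synonyms = terms_by_cui(f, languages={"ENG"})
```

Terms that share a CUI are synonyms of the same concept, which is the grouping a term-normalization model needs.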

Test CODER or other embeddings

CADEC

cd test
python cadec/cadec_eval.py bert_model_name_or_path
python cadec/cadec_eval.py word_embedding_path

MANTRA GSC

Download the Mantra GSC, unzip the XML files into /test/mantra/dataset, then run:

cd test/mantra
python test.py

MCSM

cd test/embeddings_reimplement
python mcsm.py

DDBRC

Only sampled data is provided.

cd test/diseasedb
python train.py your_embedding embedding_type freeze_or_not gpu_id
  • embedding_type must be one of bert, word or cui
  • freeze_or_not must be T or F: T freezes the embedding, F fine-tunes it
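The four positional arguments and their allowed values can be validated up front. The sketch below uses argparse to show one way to do that; it is an illustration of the constraints listed above, not how train.py actually parses its arguments, and build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Parser mirroring the four positional arguments described above."""
    parser = argparse.ArgumentParser(description="Argument check for a DDBRC run (sketch)")
    parser.add_argument("embedding", help="model name or path of the embedding to test")
    parser.add_argument("embedding_type", choices=["bert", "word", "cui"])
    parser.add_argument("freeze_or_not", choices=["T", "F"],
                        help="T freezes the embedding, F fine-tunes it")
    parser.add_argument("gpu_id", help="ID of the GPU to run on")
    return parser


# Example invocation with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["GanjinZero/coder_eng", "bert", "T", "0"])
freeze = args.freeze_or_not == "T"  # True means the embedding is not updated
```

With choices set, argparse rejects any other value with a usage error before training starts.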

Citation

@article{YUAN2022103983,
title = {CODER: Knowledge-infused cross-lingual medical term embedding for term normalization},
journal = {Journal of Biomedical Informatics},
pages = {103983},
year = {2022},
issn = {1532-0464},
doi = {10.1016/j.jbi.2021.103983},
url = {https://www.sciencedirect.com/science/article/pii/S1532046421003129},
author = {Zheng Yuan and Zhengyun Zhao and Haixia Sun and Jiao Li and Fei Wang and Sheng Yu},
keywords = {medical term normalization, cross-lingual, medical term representation, knowledge graph embedding, contrastive learning}
}
@misc{https://doi.org/10.48550/arxiv.2204.00391,
  doi = {10.48550/ARXIV.2204.00391},
  url = {https://arxiv.org/abs/2204.00391},
  author = {Zeng, Sihang and Yuan, Zheng and Yu, Sheng},
  title = {Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations},
  publisher = {arXiv},
  year = {2022}
}