CogStack / Medcat

Licence: apache-2.0
Medical Concept Annotation Tool

Medical Concept Annotation Tool

MedCAT can be used to extract information from Electronic Health Records (EHRs) and link it to biomedical ontologies like SNOMED-CT and UMLS. A preprint is available on arXiv.

SNOMED Demo

A demo application is available at MedCAT. Please note that this was trained on MedMentions and uses SNOMED for the CDB.

Interest Group, Q&A

Please use Discussions as an interest group, or as a place to ask questions and make suggestions without opening an Issue.

Tutorial

A guide on how to use MedCAT is available in the tutorial folder. Read more about MedCAT on Towards Data Science.

Papers that use MedCAT

Related Projects

  • MedCATtrainer - an interface for building, improving and customising a given Named Entity Recognition and Linking (NER+L) model (MedCAT) for biomedical domain text.
  • MedCATservice - implements the MedCAT NLP application as a service behind a REST API.
  • iCAT - A docker container for CogStack/MedCAT/HuggingFace development in isolated environments.

Install using PIP (Requires Python 3.6.1+)

  1. Install MedCAT

pip install --upgrade medcat

  2. Get the scispacy models:

pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_core_sci_md-0.2.4.tar.gz

  3. Download the Vocabulary and CDB from the Models section below

  4. Quickstart:

from medcat.cat import CAT
from medcat.utils.vocab import Vocab
from medcat.cdb import CDB 

vocab = Vocab()
# Load the vocab model you downloaded
vocab.load_dict('<path to the vocab file>')

# Load the cdb model you downloaded
cdb = CDB()
cdb.load_dict('<path to the cdb file>') 

# create cat
cat = CAT(cdb=cdb, vocab=vocab)

# Test it
text = "My simple document with kidney failure"
doc_spacy = cat(text)
# Print detected entities
print(doc_spacy.ents)

# Or get a list of entities. This returns much more information
# and is usually easier to use unless you know a lot about spaCy
doc = cat.get_entities(text)
print(doc)
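
The list returned by get_entities can be post-processed with ordinary Python. The sketch below keeps only confident detections. It assumes each entity is a dict with the keys cui, source_value and acc; those key names are an assumption, so print the real output first and adjust them for your MedCAT version. The helper filter_entities and the sample data are hypothetical and are not part of MedCAT.

```python
from typing import Any, Dict, List


def filter_entities(entities: List[Dict[str, Any]], min_acc: float = 0.5) -> List[Dict[str, Any]]:
    """Keep entities whose (assumed) 'acc' score is at least min_acc."""
    kept = []
    for ent in entities:
        # 'acc' may be stored as a string or a number, so cast defensively.
        score = float(ent.get("acc", 0.0))
        if score >= min_acc:
            kept.append(ent)
    return kept


# Hand-written stand-in for the output of cat.get_entities(text).
# These values are placeholders, not real MedCAT output.
example_entities = [
    {"cui": "C_EXAMPLE_1", "source_value": "kidney failure", "acc": "0.92"},
    {"cui": "C_EXAMPLE_2", "source_value": "document", "acc": "0.18"},
]

for ent in filter_entities(example_entities, min_acc=0.5):
    print(ent["cui"], ent["source_value"], ent["acc"])
```

With a loaded model, example_entities would be replaced by the value returned from cat.get_entities(text).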

Models

A basic trained model (vocabulary and CDB) is publicly available. It covers only the ~35K concepts available in MedMentions, so it is quite limited and the performance might not be the best.

Vocabulary Download - Built from Wiktionary

CDB Download - Built from MedMentions

(Note: This was compiled from MedMentions and does not contain any data from NLM, as that data is not publicly available.)

SNOMED-CT and UMLS

If you have access to UMLS or SNOMED-CT and can provide some proof (a screenshot of the UMLS profile page is perfect, feel free to redact all information you do not want to share), contact us - we are happy to share the pre-built CDB and Vocab for those databases.

Alternatively, you can build the CDBs from scratch from source data. We have used the steps below to build UMLS and SNOMED-CT (UK) for our experiments.

Building Concept Databases from Scratch

We provide details for building both UMLS and SNOMED-CT concept databases. In both cases, CSV files containing the source data with the required columns are needed (column descriptions are provided in the tutorial). Given the CSV files, the prepare_cdb.py script can be used to build a CDB.
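
Before running prepare_cdb.py it can help to check that a CSV file has the expected header. The following is a minimal sketch using only the standard library. The column set in ASSUMED_COLUMNS is an assumption taken from the SELECT list of the UMLS query in this README; the authoritative column descriptions are in the tutorial. The helper missing_columns is hypothetical and is not part of MedCAT.

```python
import csv
import io
from typing import Iterable, List, Sequence

# Assumption: column names mirror the SELECT list of the UMLS query in this README.
# Check the tutorial for the columns prepare_cdb.py actually requires.
ASSUMED_COLUMNS = ("cui", "str", "sab", "tty", "tui", "sty", "def")


def missing_columns(csv_file: Iterable[str], required: Sequence[str] = ASSUMED_COLUMNS) -> List[str]:
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip().lower() for name in header}
    return [col for col in required if col not in present]


# In-memory stand-in for a real file; with a file on disk use open(path, newline="").
sample = io.StringIO("cui,str,sab,tty\nC_DEMO_1,kidney failure,DEMO,PT\n")
print(missing_columns(sample))
```

An empty list means every assumed column is present in the header.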

Building a UMLS Concept Database

The UMLS can be downloaded from https://www.nlm.nih.gov/research/umls/index.html in the Rich Release Format (RRF). To make subsetting and filtering easier, we import the UMLS RRF files into a PostgreSQL database (scripts are available here).

Once the data is in the database, we can use the following SQL query to select all the concepts that will form our CDB; the result is then exported to CSV.

-- Select concepts for all the ontologies that are used
SELECT DISTINCT umls.mrconso.cui, str, mrconso.sab, mrconso.tty, tui, sty, def
FROM umls.mrconso
    LEFT OUTER JOIN umls.mrsty ON umls.mrsty.cui = umls.mrconso.cui
    LEFT OUTER JOIN umls.mrdef ON umls.mrconso.cui = umls.mrdef.cui
WHERE lat='ENG';
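
To illustrate what this query produces, here is a small self-contained sketch that runs the same joins against an in-memory SQLite database and writes the result as CSV. SQLite stands in for PostgreSQL only so that the example runs anywhere, and the umls. schema prefix is dropped for that reason. The table layouts are reduced to the columns the query touches, all rows are made-up placeholders rather than UMLS data, and export_concepts is a hypothetical helper.

```python
import csv
import io
import sqlite3

QUERY = """
    SELECT DISTINCT mrconso.cui, str, mrconso.sab, mrconso.tty, tui, sty, def
    FROM mrconso
        LEFT OUTER JOIN mrsty ON mrsty.cui = mrconso.cui
        LEFT OUTER JOIN mrdef ON mrconso.cui = mrdef.cui
    WHERE lat = 'ENG'
"""


def export_concepts(conn: sqlite3.Connection, out) -> int:
    """Run the concept query and write the rows to `out` as CSV. Returns the row count."""
    writer = csv.writer(out)
    writer.writerow(["cui", "str", "sab", "tty", "tui", "sty", "def"])
    count = 0
    for row in conn.execute(QUERY):
        writer.writerow(row)  # NULL values (None) are written as empty fields
        count += 1
    return count


# Toy tables with placeholder rows (not real UMLS content).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mrconso (cui TEXT, lat TEXT, str TEXT, sab TEXT, tty TEXT);
    CREATE TABLE mrsty (cui TEXT, tui TEXT, sty TEXT);
    CREATE TABLE mrdef (cui TEXT, def TEXT);
    INSERT INTO mrconso VALUES
        ('C_DEMO_1', 'ENG', 'kidney failure', 'DEMO', 'PT'),
        ('C_DEMO_1', 'FRE', 'placeholder non-English name', 'DEMO', 'PT'),
        ('C_DEMO_2', 'ENG', 'example concept', 'DEMO', 'PT');
    INSERT INTO mrsty VALUES
        ('C_DEMO_1', 'T_DEMO', 'Demo Type'),
        ('C_DEMO_2', 'T_DEMO', 'Demo Type');
    INSERT INTO mrdef VALUES ('C_DEMO_1', 'A made-up definition.');
""")

buf = io.StringIO()
n_rows = export_concepts(conn, buf)
print(buf.getvalue())
```

Because both joins are LEFT OUTER JOINs, a concept without a definition (C_DEMO_2 here) is still exported, with an empty def field, while the non-English name is dropped by the lat filter.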

Building a SNOMED-CT Concept Database

We use the SNOMED-CT data provided by the NHS TRUD service (https://isd.digital.nhs.uk/trud3/user/guest/group/0/pack/26). This release combines the International and UK-specific concepts into a set of assets that can be parsed and loaded into a MedCAT CDB. We provide scripts for parsing the various release files and loading them into a MedCAT CDB instance, as well as further scripts for loading the accompanying SNOMED-CT Drug extension and clinical coding data (ICD / OPCS terminologies), also from the NHS TRUD service. Scripts are available at: https://github.com/tomolopolis/SNOMED-CT_Analysis

Acknowledgement

Entity extraction was trained on MedMentions. In total it has ~35K entities from UMLS.

The vocabulary was compiled from Wiktionary. In total it has ~800K unique words.

Powered By

A big thank you goes to spaCy and Hugging Face - who made life a million times easier.

Citation

@misc{kraljevic2020multidomain,
      title={Multi-domain Clinical Natural Language Processing with MedCAT: the Medical Concept Annotation Toolkit}, 
      author={Zeljko Kraljevic and Thomas Searle and Anthony Shek and Lukasz Roguski and Kawsar Noor and Daniel Bean and Aurelie Mascio and Leilei Zhu and Amos A Folarin and Angus Roberts and Rebecca Bendayan and Mark P Richardson and Robert Stewart and Anoop D Shah and Wai Keong Wong and Zina Ibrahim and James T Teo and Richard JB Dobson},
      year={2020},
      eprint={2010.01165},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}