
ecohealthalliance / EpiTator

License: Apache-2.0
EpiTator annotates epidemiological information in text documents. It is the natural language processing framework that powers GRITS and EIDR Connect.

EpiTator

Annotators for extracting epidemiological information from text.

Installation

pip install epitator
python -m spacy download en_core_web_md

Annotators

Geoname Annotator

The geoname annotator uses the geonames.org dataset to resolve mentions of geonames. A classifier is used to disambiguate geonames and rule out false positives.

To use the geoname annotator, first import the geonames.org data into EpiTator's embedded sqlite3 database by running the command below. You should review the geonames license before using this data.

python -m epitator.importers.import_geonames

Usage

from epitator.annotator import AnnoDoc
from epitator.geoname_annotator import GeonameAnnotator
doc = AnnoDoc("Where is Chiang Mai?")
doc.add_tiers(GeonameAnnotator())
annotations = doc.tiers["geonames"].spans
geoname = annotations[0].geoname
geoname['name']
# = 'Chiang Mai'
geoname['geonameid']
# = '1153671'
geoname['latitude']
# = 18.79038
geoname['longitude']
# = 98.98468

Resolved Keyword Annotator

The resolved keyword annotator uses an sqlite database of entities to resolve mentions of multiple synonyms for an entity to a single id. This project includes scripts for importing infectious diseases and animal species into that database. The scripts import data from the Disease Ontology, Wikidata and ITIS. You should review their licenses and terms of use before using this data. Currently, the Disease Ontology is in the public domain and ITIS requests citation.

The following commands can be used to invoke the import scripts:

python -m epitator.importers.import_species
# By default entities under the disease by infectious agent class will be
# imported from the disease ontology, but this can be altered by supplying
# a --root-uri parameter.
python -m epitator.importers.import_disease_ontology
python -m epitator.importers.import_wikidata

Usage

from epitator.annotator import AnnoDoc
from epitator.resolved_keyword_annotator import ResolvedKeywordAnnotator
doc = AnnoDoc("5 cases of smallpox")
doc.add_tiers(ResolvedKeywordAnnotator())
annotations = doc.tiers["resolved_keywords"].spans
annotations[0].metadata["resolutions"]
# = [{'entity': <sqlite3.Row>, 'entity_id': u'http://purl.obolibrary.org/obo/DOID_8736', 'weight': 3}]
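
The idea of resolving several synonyms to one id can be sketched with the standard library alone. The following is a minimal illustration, not EpiTator's implementation: the table names, column names, the `resolve` helper and the weight given to "variola" are all assumptions made for this example. Only the smallpox entity id and its weight of 3 come from the output above.

```python
import sqlite3

# Hypothetical schema for illustration; EpiTator's real tables may differ.
SMALLPOX_ID = "http://purl.obolibrary.org/obo/DOID_8736"

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE entities (id TEXT PRIMARY KEY, label TEXT);
    CREATE TABLE synonyms (synonym TEXT, entity_id TEXT, weight INTEGER);
""")
conn.execute("INSERT INTO entities VALUES (?, ?)", (SMALLPOX_ID, "smallpox"))
conn.executemany("INSERT INTO synonyms VALUES (?, ?, ?)", [
    ("smallpox", SMALLPOX_ID, 3),
    ("variola", SMALLPOX_ID, 2),  # illustrative weight, not from EpiTator
])


def resolve(keyword):
    """Return candidate entities for a keyword, highest weight first."""
    rows = conn.execute(
        "SELECT entity_id, weight FROM synonyms "
        "WHERE synonym = ? ORDER BY weight DESC",
        (keyword.lower(),))
    return [dict(row) for row in rows]


print(resolve("Smallpox"))
print(resolve("variola"))
```

Both lookups return the same `entity_id`, which is the property the annotator relies on when different documents use different names for one disease.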

Count Annotator

The count annotator identifies counts in text, with a focus on case counts. Each count's value is extracted and parsed. Attributes such as whether the count refers to cases or deaths, or whether the value is approximate, are also extracted.

Usage

from epitator.annotator import AnnoDoc
from epitator.count_annotator import CountAnnotator
doc = AnnoDoc("5 cases of smallpox")
doc.add_tiers(CountAnnotator())
annotations = doc.tiers["counts"].spans
annotations[0].metadata
# = {'count': 5, 'text': '5 cases', 'attributes': ['case']}
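
To make the shape of that metadata concrete, here is a rough regex-based sketch that produces similar dictionaries. It is far simpler than the real annotator and is not how EpiTator works internally. The pattern, the `find_counts` helper and the `'death'` and `'approximate'` attribute names are assumptions for illustration. Only `'case'` appears in the output above.

```python
import re

# Illustrative pattern only: an optional approximation word, a number,
# then a "case" or "death" noun.
COUNT_PATTERN = re.compile(
    r"(?P<approx>about|approximately|around)?\s*"
    r"(?P<count>\d[\d,]*)\s+"
    r"(?P<noun>cases?|deaths?)",
    re.IGNORECASE)


def find_counts(text):
    """Return a list of count dictionaries found in the text."""
    results = []
    for match in COUNT_PATTERN.finditer(text):
        noun = match.group("noun").lower()
        attributes = ["case" if noun.startswith("case") else "death"]
        if match.group("approx"):
            attributes.append("approximate")
        results.append({
            # Drop thousands separators before converting to int.
            "count": int(match.group("count").replace(",", "")),
            "text": match.group(0).strip(),
            "attributes": attributes,
        })
    return results


print(find_counts("5 cases of smallpox"))
print(find_counts("About 1,200 deaths were reported"))
```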

Date Annotator

The date annotator identifies and parses dates and date ranges. All dates are parsed into datetime ranges. For instance, a date like "11-6-87" would be parsed as a range from the start of the day to the start of the next day, while a month like "December 2011" would be parsed as a range from the start of December 1st to the start of the next month.

Usage

from epitator.annotator import AnnoDoc
from epitator.date_annotator import DateAnnotator
doc = AnnoDoc("From March 5 until April 7 1988")
doc.add_tiers(DateAnnotator())
annotations = doc.tiers["dates"].spans
annotations[0].metadata["datetime_range"]
# = [datetime.datetime(1988, 3, 5, 0, 0), datetime.datetime(1988, 4, 7, 0, 0)]
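
The range convention described above (a start instant paired with the start of the following period) can be reproduced with the standard library. This is only a sketch of the convention, not the annotator's parsing logic. `day_range` and `month_range` are hypothetical helpers, and "11-6-87" is assumed here to mean November 6, 1987.

```python
from datetime import datetime, timedelta


def day_range(year, month, day):
    """Return [start of the day, start of the next day]."""
    start = datetime(year, month, day)
    return [start, start + timedelta(days=1)]


def month_range(year, month):
    """Return [start of the month, start of the next month]."""
    start = datetime(year, month, 1)
    # After December, roll over to January of the following year.
    if month == 12:
        end = datetime(year + 1, 1, 1)
    else:
        end = datetime(year, month + 1, 1)
    return [start, end]


print(day_range(1987, 11, 6))   # "11-6-87", assuming month-day-year order
print(month_range(2011, 12))    # "December 2011"
```

Because a range ends where the next period starts, adjacent periods share a boundary instant and can be compared without special cases for month lengths.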

Structured Data Annotator

The structured data annotator identifies and parses embedded tables.

Usage

from epitator.annotator import AnnoDoc
from epitator.structured_data_annotator import StructuredDataAnnotator
doc = AnnoDoc("""
species | cases | deaths
Cattle  | 0     | 0
Dogs    | 2     | 1
""")
doc.add_tiers(StructuredDataAnnotator())
annotations = doc.tiers["structured_data"].spans
annotations[0].metadata
# = {'data': [
#       [AnnoSpan(1-8, species), AnnoSpan(11-16, cases), AnnoSpan(19-25, deaths)],
#       [AnnoSpan(26-32, Cattle), AnnoSpan(36-37, 0), AnnoSpan(44-45, 0)],
#       [AnnoSpan(46-50, Dogs), AnnoSpan(56-57, 2), AnnoSpan(64-65, 1)]],
#    'delimiter': '|',
#    'type': 'table'}
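
The offsets in that output are character positions in the original document. The sketch below shows one way such cell offsets could be computed for a pipe-delimited table. It is a simplified illustration under the assumption that every line containing the delimiter is a table row. `parse_table` is a hypothetical helper, not part of EpiTator.

```python
def parse_table(text, delimiter="|"):
    """Split delimited rows into (start, end, value) cells with offsets."""
    rows = []
    offset = 0  # position of the current line within the whole text
    for line in text.splitlines(keepends=True):
        if delimiter in line:
            row = []
            cell_start = offset
            for raw_cell in line.rstrip("\n").split(delimiter):
                value = raw_cell.strip()
                # Skip the cell's leading padding to find where the value starts.
                start = cell_start + len(raw_cell) - len(raw_cell.lstrip())
                row.append((start, start + len(value), value))
                cell_start += len(raw_cell) + len(delimiter)
            rows.append(row)
        offset += len(line)
    return rows


text = """
species | cases | deaths
Cattle  | 0     | 0
Dogs    | 2     | 1
"""
for row in parse_table(text):
    print(row)
```

Each tuple satisfies `text[start:end] == value`, so a cell can always be traced back to its exact place in the source text.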

Structured Incident Annotator

The structured incident annotator identifies and parses embedded tables that describe case counts paired with location, date, disease and species metadata. Metadata is also extracted from the text around the table.

Usage

from epitator.annotator import AnnoDoc
from epitator.structured_incident_annotator import StructuredIncidentAnnotator
doc = AnnoDoc("""
Fictional October 2015 rabies cases in Svalbard

species | cases | deaths
Cattle  | 0     | 0
Dogs    | 4     | 1
""")
doc.add_tiers(StructuredIncidentAnnotator())
annotations = doc.tiers["structured_incidents"].spans
annotations[-1].metadata
# = {'location': {'name': u'Svalbard', ...},
#    'species': {'label': u'Canidae', ...},
#    'attributes': [],
#    'dateRange': [datetime.datetime(2015, 10, 1, 0, 0), datetime.datetime(2015, 11, 1, 0, 0)],
#    'type': 'deathCount',
#    'value': 1,
#    'resolvedDisease': {'label': u'rabies', ...}}

Architecture

EpiTator provides the following classes for organizing annotations.

AnnoDoc - The document being annotated. The AnnoDoc links to the tiers of annotations applied to it.

AnnoTier - A group of AnnoSpans. Each annotator creates one or more tiers of annotations.

AnnoSpan - A span of text with an annotation applied to it.
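
The relationship between these three classes can be pictured with a few dataclasses. The `Doc`, `Tier` and `Span` classes below are hypothetical stand-ins written for this illustration. They are not EpiTator's classes and omit almost all of their behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    """Stand-in for AnnoSpan: a labelled character range."""
    start: int
    end: int
    label: str


@dataclass
class Tier:
    """Stand-in for AnnoTier: a group of spans."""
    spans: List[Span] = field(default_factory=list)


@dataclass
class Doc:
    """Stand-in for AnnoDoc: the text plus its named tiers."""
    text: str
    tiers: Dict[str, Tier] = field(default_factory=dict)


doc = Doc("5 cases of smallpox")
doc.tiers["counts"] = Tier([Span(0, 7, "count")])
doc.tiers["diseases"] = Tier([Span(11, 19, "disease")])

# Every span points back into the one shared document text.
for name, tier in doc.tiers.items():
    for span in tier.spans:
        print(name, repr(doc.text[span.start:span.end]))
```

Keeping the text on the document and only offsets on the spans lets several annotators describe the same passage without copying or altering it.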

License

Copyright 2016 EcoHealth Alliance

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
