deweylab / Metasra Pipeline

MetaSRA: normalized sample-specific metadata for the Sequence Read Archive

MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive

This repository contains the code implementing the pipeline used to construct the MetaSRA database described in our publication: https://doi.org/10.1093/bioinformatics/btx334.

This pipeline re-annotates key-value descriptions of biological samples using biomedical ontologies.

The MetaSRA can be searched and downloaded from: http://metasra.biostat.wisc.edu/

Dependencies

This project requires the following Python libraries:

Setup

To run the pipeline, a few external resources must be downloaded and configured. First, set the PYTHONPATH environment variable so that it includes both the directory containing the map_sra_to_ontology directory and the bktree directory. Then run the following commands to set up the pipeline:

cd ./setup_map_sra_to_ontology
./setup.sh

This script downloads the latest ontology OBO files and the SPECIALIST Lexicon files, then configures the ontologies to work with the pipeline.
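
The PYTHONPATH step described above can be sketched in Python as follows. This is only an illustration: the helper name build_pythonpath is hypothetical, and it assumes that both map_sra_to_ontology and bktree sit directly under the repository root, which may not match your checkout.

```python
import os


def build_pythonpath(repo_root, existing=""):
    """Return a PYTHONPATH value covering the two directories named in Setup.

    Assumption: `map_sra_to_ontology` and `bktree` are both located directly
    under `repo_root`. Adjust the entries if your layout differs.
    """
    entries = [
        repo_root,                          # directory containing map_sra_to_ontology
        os.path.join(repo_root, "bktree"),  # the bktree directory itself
    ]
    if existing:
        # Keep any entries that were already set, skipping duplicates.
        entries += [p for p in existing.split(os.pathsep) if p and p not in entries]
    return os.pathsep.join(entries)
```

The returned string can be exported in the shell, or assigned to os.environ["PYTHONPATH"] before launching the pipeline as a child process.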

Usage

The pipeline can be run on a set of sample-specific key-value pairs using the run_pipeline.py script. This script is used as follows:

python run_pipeline.py <input key-value pairs JSON file>

The script accepts as input a JSON file storing a list of sets of key-value pairs. For example, the pipeline will accept a file with the following content:

[
  {
    "ID": "P352_141",
    "age": "48",
    "bmi": "24",
    "gender": "female",
    "source_name": "vastus lateralis muscle_female",
    "tissue": "vastus lateralis muscle"
  },
  {
    "ID": "P352_141",
    "age": "29",
    "bmi": "30",
    "gender": "male",
    "source_name": "vastus lateralis muscle_female",
    "tissue": "vastus lateralis muscle"
  }
]
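
An input file of this shape can be generated and sanity-checked before it is handed to run_pipeline.py. The sketch below is a standalone Python 3 helper, not part of the pipeline. The function names validate_samples and write_input_file are hypothetical, and the requirement that every value be a string is an assumption drawn from the example above rather than a documented rule.

```python
import json


def validate_samples(samples):
    """Check that `samples` is a list of flat key-value mappings.

    Assumption: values must be strings, as in the README example.
    """
    if not isinstance(samples, list):
        raise ValueError("top-level JSON value must be a list")
    for i, sample in enumerate(samples):
        if not isinstance(sample, dict):
            raise ValueError("entry {} is not a set of key-value pairs".format(i))
        for key, value in sample.items():
            if not isinstance(value, str):
                raise ValueError(
                    "entry {}: value for {!r} is not a string".format(i, key)
                )
    return samples


def write_input_file(samples, path):
    """Validate `samples`, then write them to `path` as a JSON list."""
    validate_samples(samples)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(samples, handle, indent=2)
```

The resulting file can then be passed to the script as shown under Usage: python run_pipeline.py followed by the path to the file.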