greenelab / snorkeling

Extracting biomedical relationships from literature with Snorkel 🏊

Snorkeling

This repository stores data and code to scale up the extraction of biomedical relationships (e.g., disease-gene associations, compounds binding to genes, gene-gene interactions) from PubMed abstracts.

Deprecation Note

An updated version of this project can be found at greenelab/snorkeling-full-text. All new changes are made in that repository.

Quick Synopsis

This work uses a subset of Hetionet v1 (bolded in the resource schema below), a heterogeneous network that encodes pharmacological and biological information as nodes and edges. Hetionet was built from publicly available resources, many of which are populated through manual curation. Manual curation is time-consuming and difficult to scale as the rate of publication continues to rise. The recently introduced "data programming" paradigm can circumvent this bottleneck by generating large annotated datasets quickly: it combines distant supervision with simple rules and heuristics, written as labeling functions, to annotate large datasets automatically. However, writing a useful labeling function takes significant time and effort. To speed up this process, we aimed to re-use labeling functions across edge types. Read the full paper here.
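To make the data programming idea concrete, here is a minimal Python sketch. Everything in it is hypothetical and written only for illustration: the `Candidate` class, the label constants, and the factory functions are not part of this repository or of the Snorkel API. It shows one distant-supervision labeling function (vote positive when the entity pair is already an edge in a knowledge base such as Hetionet), two keyword heuristics, and a combiner. Snorkel itself combines votes with a learned generative label model; a plain majority vote stands in for that model here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

# Hypothetical label convention for this sketch (not Snorkel's API).
POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1


@dataclass
class Candidate:
    """A sentence that mentions one (source, target) entity pair."""
    sentence: str
    source: str
    target: str


LabelingFunction = Callable[[Candidate], int]


def make_distant_supervision_lf(known_edges: Set[Tuple[str, str]]) -> LabelingFunction:
    """Vote POSITIVE when the pair is already an edge in the knowledge base."""
    def lf(c: Candidate) -> int:
        return POSITIVE if (c.source, c.target) in known_edges else ABSTAIN
    return lf


def make_keyword_lf(keywords: List[str], label: int) -> LabelingFunction:
    """Vote `label` when any (lower-case) keyword occurs in the lower-cased sentence."""
    def lf(c: Candidate) -> int:
        text = c.sentence.lower()
        return label if any(k in text for k in keywords) else ABSTAIN
    return lf


def majority_vote(c: Candidate, lfs: List[LabelingFunction]) -> int:
    """Combine non-abstaining votes by majority; abstain on no votes or a tie."""
    counts = Counter(v for v in (lf(c) for lf in lfs) if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


# Disease-gene example: illustrative (disease, gene) pairs treated as known edges.
known_disease_gene = {("breast cancer", "BRCA1")}
lfs = [
    make_distant_supervision_lf(known_disease_gene),
    make_keyword_lf(["associated with", "mutations in"], POSITIVE),
    make_keyword_lf(["no association", "not associated"], NEGATIVE),
]

c1 = Candidate("Mutations in BRCA1 are associated with breast cancer.", "breast cancer", "BRCA1")
c2 = Candidate("TP53 was not associated with asthma in this cohort.", "asthma", "TP53")
print(majority_vote(c1, lfs))  # 1: two POSITIVE votes, no NEGATIVE
print(majority_vote(c2, lfs))  # -1: one POSITIVE vs one NEGATIVE, a tie
```

The second sentence shows why combining votes matters: "not associated with" triggers both keyword heuristics, and the tied votes make the combiner abstain. Because the factories take their known edges and keywords as arguments, the same functions could be instantiated again for another edge type (for example, by passing known compound-gene binding pairs), which is the spirit of re-using labeling functions across edge types.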

Highlighted edges used in Hetionet v1

Directories

The main folders of this project are described below. By convention, folder names follow the node types in the schema shown above.

Name | Description
--- | ---
compound_disease | Head folder containing all relationships that compounds and diseases may share
compound_gene | Head folder containing all relationships that compounds and genes may share
disease_gene | Head folder containing all relationships that diseases and genes may share
gene_gene | Head folder containing all relationships that genes may share with each other
dependency cluster | Preprocessed results from the "A global network of biomedical relationships derived from text" paper
figures | Figures for this work
modules | Helper scripts used by this work
playground | Legacy code written to test and understand the Snorkel package

Installing/Setting Up The Conda Environment

Snorkeling uses conda as its Python package manager. Before following the instructions below, please make sure conda is installed. Download conda here.

Once conda is installed, run the following command in the terminal:

conda env create --file environment.yml

You can activate the environment with the following command (on conda 4.4 and later, conda activate snorkeling also works):

source activate snorkeling

Note: To leave the environment, enter the following command (or conda deactivate on conda 4.4 and later):

source deactivate

License

This repository is dual licensed as BSD 3-Clause and CC0 1.0, meaning any repository content can be used under either license. This licensing arrangement ensures source code is available under an OSI-approved License, while non-code content — such as figures, data, and documentation — is maximally reusable under a public domain dedication.
