SNPsea: an algorithm to identify cell types, tissues, and pathways affected by risk loci

Home Page: http://www.broadinstitute.org/mpg/snpsea

Documentation: HTML | PDF | Epub

Executable: snpsea-v1.0.3.tar.gz

Data: SNPsea_data_20140520.zip

License: GNU GPLv3

Citation

If you benefit from this method, please cite:

Slowikowski, K. et al. SNPsea: an algorithm to identify cell types, tissues, and pathways affected by risk loci. Bioinformatics (2014). doi:10.1093/bioinformatics/btu326

See the first description of the algorithm and additional examples here:

Hu, X. et al. Integrating autoimmune risk loci with gene-expression data identifies specific pathogenic immune cell subsets. The American Journal of Human Genetics 89, 496–506 (2011). PubMed

Description

SNPsea is an algorithm to identify cell types and pathways likely to be affected by risk loci. It requires a list of SNP identifiers and a gene matrix whose rows are genes and whose columns are conditions.

Genome-wide association studies (GWAS) have discovered multiple genomic loci associated with risk for different types of disease. SNPsea provides a simple way to determine the types of cells influenced by genes in these risk loci.

Suppose disease-associated alleles influence a small number of pathogenic cell types. We hypothesize that genes with critical functions in those cell types are likely to be within risk loci for that disease. We assume that a gene's specificity to a cell type is a reasonable indicator of its importance to the unique function of that cell type.

First, we identify the genes in linkage disequilibrium (LD) with the given trait-associated SNPs and score the gene set for specificity to each cell type. Next, we define a null distribution of scores for each cell type by sampling random SNP sets matched on the number of linked genes. Finally, we evaluate the significance of the original gene set's specificity by comparison to the null distributions: we calculate an exact permutation p-value.
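The three steps above can be sketched in a few lines of Python. This is a toy illustration, not SNPsea's implementation: the scoring rule (sum of the highest specificity among each SNP's linked genes), the add-one correction in the p-value, and all names and values (`set_score`, `permutation_p_value`, `snpA`, `null1`, the specificity numbers) are assumptions made for the example.

```python
import random
from collections import defaultdict


def set_score(snp_set, snp_genes, specificity):
    """Toy score: sum, over SNPs, of the highest specificity among linked genes."""
    return sum(max(specificity[g] for g in snp_genes[s]) for s in snp_set)


def permutation_p_value(snp_set, null_pool, snp_genes, specificity,
                        n_null=999, seed=0):
    """Compare the observed score with scores of matched random SNP sets."""
    rng = random.Random(seed)

    # Group candidate null SNPs by the number of genes they are linked to.
    pool_by_count = defaultdict(list)
    for snp in null_pool:
        pool_by_count[len(snp_genes[snp])].append(snp)

    observed = set_score(snp_set, snp_genes, specificity)

    hits = 0
    for _ in range(n_null):
        # Replace each input SNP with a random null SNP that has the same
        # number of linked genes.
        null_set = [rng.choice(pool_by_count[len(snp_genes[s])])
                    for s in snp_set]
        if set_score(null_set, snp_genes, specificity) >= observed:
            hits += 1

    # Add-one correction (an assumption of this sketch) keeps p above zero.
    return (hits + 1) / (n_null + 1)


# Hypothetical specificity of each gene to a single cell type.
specificity = {"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.2,
               "E": 0.15, "F": 0.05, "G": 0.3}

# Hypothetical SNP-to-gene links derived from LD intervals.
snp_genes = {
    "snpA": ["A"], "snpB": ["B", "C"],           # trait-associated SNPs
    "null1": ["C"], "null2": ["D"],              # null SNPs with 1 gene
    "null3": ["E", "F"], "null4": ["D", "G"],    # null SNPs with 2 genes
}

p = permutation_p_value(["snpA", "snpB"],
                        ["null1", "null2", "null3", "null4"],
                        snp_genes, specificity)
print(p)
```

In this toy data no matched null set can reach the observed score, so the estimate is limited only by the number of null sets sampled.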

SNPsea is a general algorithm. You may provide your own:

  1. Continuous gene matrix with gene expression profiles (or other values).
  2. Binary gene annotation matrix with presence/absence 1/0 values.

We provide you with three expression matrices and one annotation matrix. See the Data section of the Manual.

The columns of the matrix may be tissues, cell types, GO annotation codes, or other conditions. Continuous matrices must be normalized before running SNPsea: columns must be directly comparable to each other.
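One common way to make columns directly comparable is quantile normalization, which forces every column to share the same distribution of values. The sketch below is a minimal pure-Python version that assumes no ties within a column; it is only one possible preprocessing choice, and the function name `quantile_normalize` and the example values are hypothetical. See the Manual for how the provided matrices were prepared.

```python
def quantile_normalize(matrix):
    """Quantile-normalize the columns of a matrix given as a list of rows.

    Each row is a gene and each column is a condition. Ties within a
    column are not averaged; this sketch assumes distinct values.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # Mean of the k-th smallest value across all columns.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[k] for col in sorted_cols) / n_cols
                  for k in range(n_rows)]

    # Replace each value with the mean for its within-column rank.
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = rank_means[rank]
    return normalized


# Hypothetical expression values: 4 genes (rows) by 3 conditions (columns).
raw = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 6.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(raw)
for row in normalized:
    print(row)
```

After normalization every column contains the same set of values, so a given value means the same thing in every condition.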

Example

SNPsea results for RBC count-associated SNPs in the Gene Atlas.

The heatmap shows Pearson correlation coefficients between pairs of tissue expression profiles. The blue bars show p-values. Statistically significant p-values cross the Bonferroni multiple testing threshold (black line).

We identified BM-CD71+Early Erythroid as the cell type with the most significant enrichment (P < 2e-7) for cell type-specific gene expression relative to 78 other tissues in the Gene Atlas (Su et al. 2004).

SNPsea tested the genes in linkage disequilibrium (LD) with 45 input SNPs associated with count of red blood cells (P <= 5e-8 in Europeans) (van der Harst et al. 2012). For each of the 79 cell types in the Gene Atlas, we tested a maximum of 1e7 null SNP sets where each null SNP was matched to an input SNP on the number of genes in LD.
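The numbers in this example can be checked with a little arithmetic. The sketch below assumes a family-wise significance level of 0.05, which the text does not state, and computes the Bonferroni threshold for 79 cell types together with the approximate smallest p-value that 1e7 null SNP sets can resolve.

```python
# Number of columns (cell types) tested in the Gene Atlas example.
n_conditions = 79

# Assumed family-wise significance level; not stated in the text.
alpha = 0.05

# Bonferroni threshold: divide alpha by the number of tests.
bonferroni_threshold = alpha / n_conditions
print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")  # 6.33e-04

# With at most 1e7 null SNP sets, an empirical p-value cannot be
# resolved much below 1 / 1e7.
max_iterations = int(1e7)
resolution = 1 / max_iterations
print(f"Approximate p-value resolution: {resolution:.0e}")  # 1e-07
```

Under these assumptions the reported P < 2e-7 sits close to the resolution limit and well below the Bonferroni threshold.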

We ran SNPsea like this:

options=(
    --snps              Red_blood_cell_count-Harst2012-45_SNPs.gwas
    --gene-matrix       GeneAtlas2004.gct.gz
    --gene-intervals    NCBIgenes2013.bed.gz
    --snp-intervals     TGP2011.bed.gz
    --null-snps         Lango2010.txt.gz
    --out               out
    --slop              10e3
    --threads           8
    --null-snpsets      0
    --min-observations  100
    --max-iterations    1e7
)
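# Quote the array expansion so each option stays a separate argument,
# even if a file path contains spaces.
snpsea "${options[@]}"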

# Time elapsed: 2 minutes 36 seconds

# Create the figure shown above:
snpsea-barplot out
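The `--min-observations` and `--max-iterations` options suggest an adaptive stopping rule: keep sampling null SNP sets for a column until enough of them score at least as high as the input set, or until the iteration cap is reached. The sketch below illustrates that idea under this reading of the option names. It is not taken from the SNPsea source, and the function `adaptive_p_value`, the add-one correction, and the uniform toy null distribution are assumptions; the Manual is the authority on the exact behavior.

```python
import random


def adaptive_p_value(observed, draw_null_score,
                     min_observations=100, max_iterations=10_000):
    """Sample null scores until enough of them reach the observed score.

    Stops early once `min_observations` null scores are >= `observed`,
    or after `max_iterations` draws, whichever comes first.
    """
    hits = 0
    iterations = 0
    while iterations < max_iterations and hits < min_observations:
        iterations += 1
        if draw_null_score() >= observed:
            hits += 1
    # Add-one correction (an assumption of this sketch).
    p_value = (hits + 1) / (iterations + 1)
    return p_value, hits, iterations


rng = random.Random(0)

# An unremarkable observed score: sampling stops early.
common = adaptive_p_value(0.5, rng.random)

# An observed score no null set can reach: sampling runs to the cap.
rare = adaptive_p_value(2.0, rng.random)

print(common)
print(rare)
```

The design saves time on columns that are clearly not significant and spends the full iteration budget only where small p-values need to be resolved.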

Contributing

Please submit an issue to report bugs or ask questions.

Please contribute bug fixes or new features with a pull request to this repository.
