RajLabMSSM / echolocatoR

Licence: MIT License
Automated statistical and functional fine-mapping pipeline with extensive API access to datasets.


Author: Brian M. Schilder
README updated: Mar-04-2022

) ) ) ) ))) 🦇 echolocatoR 🦇 ((( ( ( ( (

Automated statistical and functional fine-mapping with extensive access to genome-wide datasets


If you use echolocatoR, please cite:

Brian M Schilder, Jack Humphrey, Towfique Raj (2021) echolocatoR: an automated end-to-end statistical and functional genomic fine-mapping pipeline, Bioinformatics; btab658, https://doi.org/10.1093/bioinformatics/btab658

Introduction

Fine-mapping methods are a powerful means of identifying causal variants underlying a given phenotype, but are underutilized due to the technical challenges of implementation. echolocatoR is an R package that automates end-to-end genomics fine-mapping, annotation, and plotting in order to identify the most probable causal variants associated with a given phenotype.

It requires minimal input from users (a GWAS or QTL summary statistics file), and includes a suite of statistical and functional fine-mapping tools. It also includes extensive access to datasets (linkage disequilibrium panels, epigenomic and genome-wide annotations, QTL).

The elimination of data gathering and preprocessing steps enables rapid fine-mapping of many loci in any phenotype, complete with locus-specific publication-ready figure generation. All results are merged into a single per-SNP summary file for additional downstream analysis and results sharing. Therefore, echolocatoR drastically reduces the barriers to identifying causal variants by making the entire fine-mapping pipeline rapid, robust and scalable.

echoFlow

Documentation

Website

Getting started

Bugs/requests

Please report any bugs/requests on GitHub Issues.

Contributions are welcome!

Literature

For applications of echolocatoR in the literature, please see:

  1. E Navarro, E Udine, K de Paiva Lopes, M Parks, G Riboldi, BM Schilder…T Raj (2020) Dysregulation of mitochondrial and proteo-lysosomal genes in Parkinson’s disease myeloid cells. Nature Genetics. https://doi.org/10.1101/2020.07.20.212407
  2. BM Schilder, T Raj (2021) Fine-Mapping of Parkinson’s Disease Susceptibility Loci Identifies Putative Causal Variants. Human Molecular Genetics, ddab294, https://doi.org/10.1093/hmg/ddab294
  3. K de Paiva Lopes, G JL Snijders, J Humphrey, A Allan, M Sneeboer, E Navarro, BM Schilder…T Raj (2022) Genetic analysis of the human microglial transcriptome across brain regions, aging and disease pathologies. Nature Genetics, https://doi.org/10.1038/s41588-021-00976-y

Installation

General tips

  • We generally recommend that users upgrade to R>=4.0.0 before trying to install echolocatoR. While echolocatoR should technically be able to run in R>=3.6.0, you may face some additional challenges getting dependency versions not to conflict with one another.
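
A quick way to see where your R version falls relative to the tip above is sketched here. The function name check_r_version and the status strings are illustrative and not part of echolocatoR; only the two version thresholds come from the text.

```r
# Classify an R version against the thresholds mentioned in the tip above.
# `check_r_version` is a hypothetical helper, not an echolocatoR function.
check_r_version <- function(current = getRversion()) {
  if (current >= "4.0.0") {
    "ok"
  } else if (current >= "3.6.0") {
    "may hit dependency conflicts"
  } else {
    "unsupported"
  }
}

message("R ", as.character(getRversion()), ": ", check_r_version())
```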

Quick installation

In R:

if(!require("remotes")) install.packages("remotes")

remotes::install_github("RajLabMSSM/echolocatoR")

Robust installation (conda)

As with most software, installation is half the battle. The easiest way to install all of echolocatoR’s dependencies (which include R, Python, and command line tools) and make sure they play well together is to create a conda environment.

  1. If you haven’t done so already, install conda.

  2. In command line, create the env from the .yml file (this file tells conda what to install):
    conda env create -f https://github.com/RajLabMSSM/echolocatoR/raw/master/inst/conda/echoR.yml

  3. Activate the new env:
    conda activate echoR

  4. Install echolocatoR from command line so that it installs within the conda env:

  5. Open RStudio from the command line interface (not by clicking the RStudio icon), for example by opening a project file as shown here. This helps to ensure RStudio can find the paths to the packages in the conda env:
    open model_celltype_conservation.Rproj

    Alternatively, the conda env also comes with radian, which is a convenient R console that’s much more advanced than the default R console, but doesn’t require access to a GUI. This can be especially useful on computing clusters that don’t support RStudio or other IDEs.
    radian

  6. Finally, to make extra sure echolocatoR uses the packages in this env (esp. if using from RStudio), you can then supply the env name to the finemap_loci() function (and many other echolocatoR functions) using conda_env="echoR".
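
To confirm from within R that the session really was started inside the env, one option is to inspect the CONDA_DEFAULT_ENV environment variable, which conda sets on activation. This is a minimal sketch: in_conda_env is a hypothetical helper and not an echolocatoR function, and the variable may hold a path rather than a name if the env was activated by path.

```r
# TRUE when R was launched from a shell where the named conda env is active.
# CONDA_DEFAULT_ENV is set by `conda activate`; it is empty outside conda.
# `in_conda_env` is a hypothetical helper, not part of echolocatoR.
in_conda_env <- function(env_name = "echoR",
                         active = Sys.getenv("CONDA_DEFAULT_ENV")) {
  nzchar(active) && identical(active, env_name)
}

if (!in_conda_env("echoR")) {
  message("Not running inside the 'echoR' env. Run `conda activate echoR` ",
          "and start R from that shell, or pass conda_env = \"echoR\".")
}
```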

Clone installation (Rstudio)

Lastly, if you’d like (or if for some reason none of the other installation methods are working for you), you can alternatively clone and then build echolocatoR:

  1. Clone echolocatoR:
    git clone https://github.com/RajLabMSSM/echolocatoR.git
  2. Open echolocatoR.Rproj within the echolocatoR folder.
  3. Then, within RStudio, build echolocatoR by clicking the following drop down menu items: Build --> Install and Restart (or pressing the keys CMD + SHIFT + B on a Mac).


Dependencies

R

For a full list of required and suggested packages, see DESCRIPTION.

Additionally, there are some optional R dependencies (e.g. XGR, Rgraphviz) that can be a bit tricky to install, so we’ve removed them as requirements and instead provided a separate R function that helps users to install them afterwards if needed:

library(echolocatoR)
extra_installs()

Python

For a full list of required Python packages, see the conda env echoR.yml. Some of the key ones are:

- python>=3.6.1  
- pandas>=0.25.0   
- pandas-plink  
- pyarrow  
- fastparquet  
- scipy  
- scikit-learn  
- tqdm  
- bitarray  
- networkx  
- rpy2  
- requests  

Command line

tabix
  • Rapid querying of summary stats files.
  • To use it, specify query_by="tabix" in finemap_loci().
  • If you encounter difficulties using a conda distribution of tabix, we recommend you uninstall it from the env and instead install its parent package, htslib, as this should be more up to date. htslib is now included in the echoR conda env by default.
  • Alternatively, you may install htslib to your machine globally via Brew (for Mac users) or from source.
bcftools
  • Used here for filtering populations in vcf files.
  • Can be installed via Brew (for Mac users) or conda.
axel
  • Rapid multi-core downloading of large files (e.g. LD matrices from UK Biobank).

  • To use it, specify download_method="axel" in finemap_loci().

  • Update: A conda version of axel has been kindly provided by @jdblischak, no longer requiring a separate installation.

  • However, if you want to use axel without the conda env, see this tutorial for more info on installation. Here’s a quick overview:

    • Mac: Install brew, then: brew install axel
    • CentOS/RHEL 7: yum install epel-release; yum install axel
    • Fedora: yum install axel or dnf install axel
    • Debian and derivatives (e.g. Ubuntu, Linux Mint): aptitude install axel
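
Before requesting query_by="tabix" or download_method="axel", it can help to check which of these tools R can actually find on the PATH. The sketch below uses base R's Sys.which(). The helpers find_cli_tools and optional_args are hypothetical and not part of echolocatoR; they only assemble the two optional arguments named above and leave everything else at its default.

```r
# Which of the optional command line tools are on the PATH?
# Sys.which() returns "" for any tool it cannot find.
# Both helpers here are hypothetical, not echolocatoR functions.
find_cli_tools <- function(tools = c("tabix", "bcftools", "axel"),
                           paths = Sys.which(tools)) {
  found <- nzchar(paths)
  names(found) <- tools
  found
}

# Build only the optional finemap_loci() arguments whose tool is available.
optional_args <- function(found) {
  args <- list()
  if (isTRUE(found[["tabix"]])) args$query_by <- "tabix"
  if (isTRUE(found[["axel"]])) args$download_method <- "axel"
  args
}

found <- find_cli_tools()
print(found)
str(optional_args(found))
```

The resulting list could then be combined with your other arguments via do.call(); that step is omitted here because the remaining finemap_loci() arguments depend on your data.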

Fine-mapping tools

echolocatoR will automatically check whether you have the necessary columns to run each tool you selected in finemap_loci(finemap_methods=...). It will remove any tools for which necessary columns are missing and produce a message letting you know which columns are missing. Note that some columns (e.g. MAF,N,t-stat) can be automatically inferred if missing.
For easy reference, we list the necessary columns here as well.
See ?finemap_loci() for descriptions of these columns.
All methods require the columns: SNP,CHR,POS,Effect,StdErr

Additional required columns:

  • ABF: proportion_cases,MAF
  • FINEMAP: A1,A2,MAF,N
  • SuSiE: N
  • PolyFun: A1,A2,P,N
  • PAINTOR: A1,A2,t-stat
  • GCTA-COJO: A1,A2,Freq,P,N
  • coloc: N,MAF
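
The column requirements listed above can be checked by hand before running the pipeline. The following is a minimal sketch in base R, not echolocatoR's own check: missing_cols and runnable_tools are hypothetical helpers, the tool labels are copied from the list rather than from the values accepted by finemap_methods, and the automatic inference of MAF, N and t-stat is not modelled.

```r
# Columns every method needs, plus the extra columns per tool (from the list above).
base_cols <- c("SNP", "CHR", "POS", "Effect", "StdErr")
extra_cols <- list(
  ABF         = c("proportion_cases", "MAF"),
  FINEMAP     = c("A1", "A2", "MAF", "N"),
  SuSiE       = "N",
  PolyFun     = c("A1", "A2", "P", "N"),
  PAINTOR     = c("A1", "A2", "t-stat"),
  `GCTA-COJO` = c("A1", "A2", "Freq", "P", "N"),
  coloc       = c("N", "MAF")
)

# For each tool, list the required columns that are absent from `cols`.
missing_cols <- function(cols, tools = names(extra_cols)) {
  lapply(extra_cols[tools], function(extra) setdiff(c(base_cols, extra), cols))
}

# Tools that can run are those with nothing missing.
runnable_tools <- function(cols, tools = names(extra_cols)) {
  miss <- missing_cols(cols, tools)
  names(miss)[lengths(miss) == 0]
}

# Example: a summary statistics file without MAF, Freq, t-stat or proportion_cases.
my_cols <- c("SNP", "CHR", "POS", "Effect", "StdErr", "A1", "A2", "P", "N")
print(runnable_tools(my_cols))
print(missing_cols(my_cols)[["FINEMAP"]])
```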


Multi-finemap results files

The main outputs of echolocatoR are the multi-finemap files (for example, data("BST1")). They are stored in the locus-specific Multi-finemap subfolders.

Column descriptions

  • Standardized GWAS/QTL summary statistics: e.g. SNP,CHR,POS,Effect,StdErr. See ?finemap_loci() for descriptions of each.
  • leadSNP: The designated proxy SNP per locus, which is the SNP with the smallest p-value by default.
  • <tool>.CS: The 95% probability Credible Set (CS) to which a SNP belongs within a given fine-mapping tool’s results. If a SNP is not in any of the tool’s CS, it is assigned NA (or 0 for the purposes of plotting).
  • <tool>.PP: The posterior probability that a SNP is causal for a given GWAS/QTL trait.
  • Support: The total number of fine-mapping tools that include the SNP in their CS.
  • Consensus_SNP: By default, defined as a SNP that is included in the CS of more than N fine-mapping tool(s), i.e. Support>N (default: N=1, so Support>1).
  • mean.PP: The mean SNP-wise PP across all fine-mapping tools used.
  • mean.CS: If mean PP is greater than the 95% probability threshold (mean.PP>0.95) then mean.CS is 1, else 0. This tends to be a very stringent threshold as it requires a high degree of agreement between fine-mapping tools.
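
To make the relationship between these columns concrete, here is a small base R sketch that derives Support, Consensus_SNP, mean.PP and mean.CS from per-tool CS and PP columns, following the definitions above. The tool names (ToolA, ToolB) and all values are made up for illustration, and this is not echolocatoR's internal code.

```r
# Toy multi-finemap table: two hypothetical tools, four SNPs (values are made up).
res <- data.frame(
  SNP      = c("rs1", "rs2", "rs3", "rs4"),
  ToolA.CS = c(1, 1, NA, NA),
  ToolA.PP = c(0.99, 0.40, 0.02, 0.00),
  ToolB.CS = c(1, NA, 1, NA),
  ToolB.PP = c(0.97, 0.10, 0.55, 0.01)
)

# Identify the per-tool columns before any summary columns are added.
cs_cols <- grep("\\.CS$", names(res), value = TRUE)
pp_cols <- grep("\\.PP$", names(res), value = TRUE)

# A SNP counts as "in a CS" when its CS value is neither NA nor 0.
in_cs <- !is.na(res[cs_cols]) & res[cs_cols] > 0
res$Support <- rowSums(in_cs)

# Consensus threshold N (default of 1 taken from the text).
N <- 1
res$Consensus_SNP <- res$Support > N

# Mean PP across tools, and the derived mean.CS flag at the 0.95 threshold.
res$mean.PP <- rowMeans(res[pp_cols], na.rm = TRUE)
res$mean.CS <- as.integer(res$mean.PP > 0.95)

print(res[, c("SNP", "Support", "Consensus_SNP", "mean.PP", "mean.CS")])
```

In this toy table only rs1 is in both credible sets, so it is the only consensus SNP and the only one whose mean PP clears 0.95.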

Notes

  • Separate multi-finemap files are generated for each LD reference panel used, which is included in the file name (e.g. UKB_LD.Multi-finemap.tsv.gz).

  • Each fine-mapping tool defines its CS and PP slightly differently, so please refer to the associated original publications for the exact details of how these are calculated (links provided above).


Datasets

For more detailed information about each dataset, use ?:

library(echolocatoR)
?NOTT_2019.interactome # example dataset

Epigenomic & genome-wide annotations

Nott et al. (2019)

  • Data from this publication contains results from cell type-specific (neurons, oligodendrocytes, astrocytes, microglia, & peripheral myeloid cells) epigenomic assays (H3K27ac, ATAC, H3K4me3) from human brain tissue.

  • For detailed metadata, see:

    data("NOTT_2019.bigwig_metadata")
  • Built-in datasets:

    • Enhancer/promoter coordinates (as GenomicRanges)
    data("NOTT_2019.interactome")
    # Examples of the data nested in "NOTT_2019.interactome" object:
    NOTT_2019.interactome$`Neuronal promoters`
    NOTT_2019.interactome$`Neuronal enhancers`
    NOTT_2019.interactome$`Microglia promoters`
    NOTT_2019.interactome$`Microglia enhancers`
    ...
    ...
    • PLAC-seq enhancer-promoter interactome coordinates
    NOTT_2019.interactome$H3K4me3_around_TSS_annotated_pe
    NOTT_2019.interactome$`Microglia interactome`
    NOTT_2019.interactome$`Neuronal interactome`
    NOTT_2019.interactome$`Oligo interactome`
    ...
    ...
  • API access to full bigWig files on UCSC Genome Browser, which includes

    • Epigenomic reads (as GenomicRanges)
    • Aggregate epigenomic score for each cell type - assay combination

Corces et al. (2020)

  • Data from this preprint contains results from bulk and single-cell chromatin accessibility epigenomic assays in 39 human brains.

    data("CORCES_2020.bulkATACseq_peaks")
    data("CORCES_2020.cicero_coaccessibility")
    data("CORCES_2020.HiChIP_FitHiChIP_loop_calls")
    data("CORCES_2020.scATACseq_celltype_peaks")
    data("CORCES_2020.scATACseq_peaks")

XGR

  • API access to a diverse library of cell type/line-specific epigenomic (e.g. ENCODE) and other genome-wide annotations.

Roadmap

  • API access to cell type-specific epigenomic data.

biomaRt

  • API access to various genome-wide SNP annotations (e.g. missense, nonsynonymous, intronic, enhancer).

HaploR

  • API access to known per-SNP QTL and epigenomic data hits.

QTLs

eQTL Catalogue

  • API access to full summary statistics from many standardized e/s/t-QTL datasets.
  • Data access and colocalization tests facilitated through the catalogueR R package.

Enrichment tools

XGR

  • Binomial enrichment tests between customisable foreground and background SNPs.

GoShifter

  • LD-informed iterative enrichment analysis.

S-LDSC

  • Genome-wide stratified LD score regression.
  • Includes the 187-annotation baseline model from Gazal et al. 2018.
  • You can alternatively supply a custom annotations matrix.

motifbreakR

  • Identification of transcription factor binding motifs (TFBM) and prediction of SNP disruption to said motifs.
  • Includes a comprehensive list of TFBM databases via MotifDB (9,900+ annotated position frequency matrices from 14 public sources, for multiple organisms).

GARFIELD (under construction)

  • Genomic enrichment with LD-informed heuristics.

LD reference panels

UK Biobank

1000 Genomes Phase 1

1000 Genomes Phase 3


Creator

Brian M. Schilder, Bioinformatician II
Raj Lab
Department of Neuroscience, Icahn School of Medicine at Mount Sinai
