
hardingnj / xpclr

Licence: MIT license
Code to compute the XP-CLR statistic to infer natural selection

XP-CLR

Code to compute XP-CLR values as per Chen, Patterson & Reich 2010. This implementation was written because bugs were found in the source of the original tool.

Installation

Clone this git repository into your working directory, cd into it, and run:

python setup.py install

The package is also available through conda from the bioconda channel:

conda install xpclr -c bioconda

Use

This is a Python module with a convenience script attached. It is designed to run on HDF5 files representing genetic data, as generated using scikit-allel. VCF files and text files in the same format as the original XPCLR tool are also supported, but the code has not been optimised for them. Support for the zarr format is planned.

The interface is under development and may change or break in future.

Documentation

usage: xpclr [-h] --out OUT [--format FORMAT] [--input INPUT]
             [--gdistkey GDISTKEY] [--samplesA SAMPLESA] [--samplesB SAMPLESB]
             [--rrate RRATE] [--map MAP] [--popA POPA] [--popB POPB] --chr
             CHROM [--ld LDCUTOFF] [--phased] [--verbose VERBOSE]
             [--maxsnps MAXSNPS] [--minsnps MINSNPS] [--size SIZE]
             [--start START] [--stop STOP] [--step STEP]

Tool to calculate XP-CLR as per Chen, Patterson, Reich 2010

optional arguments:
  -h, --help            show this help message and exit
  --out OUT, -O OUT     output file
  --format FORMAT, -F FORMAT
                        input expected. One of "vcf" (default), "hdf5", or
                        "txt"
  --input INPUT, -I INPUT
                        input file vcf or hdf5
  --gdistkey GDISTKEY   key for genetic position in variants table of hdf5/VCF
  --samplesA SAMPLESA, -Sa SAMPLESA
                        Samples comprising population A. Comma separated list
                        or path to file with each ID on a line
  --samplesB SAMPLESB, -Sb SAMPLESB
                        Samples comprising population B. Comma separated list
                        or path to file with each ID on a line
  --rrate RRATE, -R RRATE
                        recombination rate per base
  --map MAP             input map file as per XPCLR specs
  --popA POPA           filepath to population A genotypes
  --popB POPB           filepath to population B genotypes
  --chr CHROM, -C CHROM
                        Which contig analysis is based on
  --ld LDCUTOFF, -L LDCUTOFF
                        LD cutoff to apply for weighting
  --phased, -P          whether data is phased for more precise r2 calculation
  --verbose VERBOSE, -V VERBOSE
                        How verbose to be in logging. 10=DEBUG, 20=INFO,
                        30=WARN, 40=ERROR, 50=CRITICAL
  --maxsnps MAXSNPS, -M MAXSNPS
                        max SNPs in a window
  --minsnps MINSNPS, -N MINSNPS
                        min SNPs in a window
  --size SIZE           window size in base pairs
  --start START         start base position for windows
  --stop STOP           stop base position for windows
  --step STEP           step size for sliding windows
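As an illustration of how the flags above fit together, here is a minimal sketch that assembles an xpclr command line in Python. It uses only flags listed in the help text. The file names, sample IDs, and parameter values are hypothetical placeholders, and `build_xpclr_command` is a helper invented for this example, not part of the package. The command is printed rather than executed.

```python
import shlex


def build_xpclr_command(out, chrom, input_path, samples_a, samples_b,
                        rrate=None, size=None, step=None, phased=False):
    """Assemble an argument list using flags from the xpclr help text.

    Hypothetical helper for illustration only; it is not part of xpclr.
    """
    cmd = [
        "xpclr",
        "--out", out,                        # required: output file
        "--format", "hdf5",                  # one of "vcf", "hdf5", "txt"
        "--input", input_path,
        "--samplesA", ",".join(samples_a),   # comma separated sample IDs
        "--samplesB", ",".join(samples_b),
        "--chr", chrom,                      # required: contig to analyse
    ]
    if rrate is not None:
        cmd += ["--rrate", str(rrate)]       # recombination rate per base
    if size is not None:
        cmd += ["--size", str(size)]         # window size in base pairs
    if step is not None:
        cmd += ["--step", str(step)]         # step size for sliding windows
    if phased:
        cmd.append("--phased")               # flag takes no value
    return cmd


# All values below are made up for the example.
cmd = build_xpclr_command(
    out="chr1.xpclr.txt",
    chrom="1",
    input_path="genotypes.h5",
    samples_a=["A1", "A2"],
    samples_b=["B1", "B2"],
    rrate=1e-8,
    size=20000,
    step=20000,
    phased=True,
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` once xpclr is installed. As the help text notes, `--samplesA` and `--samplesB` also accept a path to a file with one ID per line.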

File formats

HDF5, as generated from VCF by scikit-allel.

Or

VCF, read via scikit-allel.

Or

text files in the format of the original XPCLR tool:

.geno Space-delimited file containing 0/1s, with sample haplotypes as columns and SNPs as rows.

.map Space-delimited file with 6 columns: ID, chromosome, genetic distance, position, REF, ALT.

For examples of these files see the fixture folder used for testing.
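To make the text layout concrete, the following sketch parses small in-memory examples of the .geno and .map formats described above using only the standard library. The SNP IDs, positions, and genotype values are invented for illustration, and `read_geno` and `read_map` are hypothetical helpers rather than functions from this package.

```python
import io


def read_geno(handle):
    """Read a space-delimited 0/1 matrix: rows are SNPs, columns haplotypes."""
    rows = []
    for line in handle:
        fields = line.split()
        if fields:  # skip blank lines
            rows.append([int(x) for x in fields])
    return rows


def read_map(handle):
    """Read the 6-column map: ID, chromosome, genetic distance, position, REF, ALT."""
    records = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        snp_id, chrom, gdist, pos, ref, alt = fields
        records.append({
            "id": snp_id,
            "chrom": chrom,
            "gdist": float(gdist),
            "pos": int(pos),
            "ref": ref,
            "alt": alt,
        })
    return records


# Made-up data: 3 SNPs typed on 4 haplotypes.
geno_text = "0 1 1 0\n1 1 1 1\n0 0 0 1\n"
map_text = (
    "snp1 1 0.0001 1000 A G\n"
    "snp2 1 0.0002 2000 C T\n"
    "snp3 1 0.0005 5000 G A\n"
)

haplotypes = read_geno(io.StringIO(geno_text))
variants = read_map(io.StringIO(map_text))

# Each row of the .geno file should correspond to one row of the .map file.
assert len(haplotypes) == len(variants)

# Fraction of haplotypes carrying the allele coded as 1 at each SNP.
freqs = [sum(row) / len(row) for row in haplotypes]
print(freqs)
```

For real files, replace the `io.StringIO` objects with `open(path)` handles. The fixture folder mentioned above remains the authoritative reference for the exact layout.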

Outputs

modelL: The likelihood of the best fitting selection coefficient

nullL: The likelihood of the null model (selection coefficient = 0.0).

sel_coef: The best fitting selection coefficient

nSNPs: Number of SNPs in a window. We suggest ignoring windows where this is small.

nSNPs avail: The number of SNPs available in a window, which is relevant when it exceeds the maximum specified. If this is consistently much larger than nSNPs, it suggests your window size is too large.

start, stop: start and stop positions of the windows.

pos_start, pos_stop: actual limits of the used SNPs.

xpclr: The log likelihood ratio of the best fitting model vs the null: 2 * (modelL - nullL)

xpclr_norm: Normalized xpclr, i.e. (xpclr - np.nanmean(xpclr))/np.nanstd(xpclr)
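The two formulas above for xpclr and xpclr_norm can be sketched with the standard library as follows. This is an illustrative re-implementation, not the package's own code. It mirrors np.nanmean and np.nanstd by skipping NaN values and using the population standard deviation, which is the NumPy default. The per-window values are made up, and the function names are hypothetical.

```python
import math
import statistics


def xpclr_stat(model_l, null_l):
    """Log likelihood ratio of the best model vs the null: 2 * (modelL - nullL)."""
    return 2 * (model_l - null_l)


def normalise(values):
    """Centre and scale values, ignoring NaNs when computing mean and std.

    Uses the population standard deviation (ddof=0), matching the default
    of np.nanstd. NaN inputs stay NaN in the output. Unlike NumPy, this
    sketch raises ZeroDivisionError if all finite values are identical.
    """
    finite = [v for v in values if not math.isnan(v)]
    mean = statistics.fmean(finite)
    std = statistics.pstdev(finite)
    return [(v - mean) / std for v in values]


# Made-up per-window values; NaN marks a window with no result.
model_l = [-10.0, -12.0, math.nan, -8.0]
null_l = [-11.0, -15.0, math.nan, -13.0]

xpclr = [xpclr_stat(m, n) for m, n in zip(model_l, null_l)]
xpclr_norm = normalise(xpclr)

print(xpclr)
print(xpclr_norm)
```

Because the mean and standard deviation are taken over all windows passed in, the normalized values depend on which windows are included in the calculation.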
