BGI-shenzhen / VCF2Dis

Licence: MIT license
VCF2Dis: A new, simple, and efficient tool to calculate a p-distance matrix from Variant Call Format (VCF) files

Programming Languages: C++, C, Perl

VCF2Dis

1) Install


New versions are updated and maintained at hewm2008/VCF2Dis; please download the latest version from there.


Run [make] or [sh make.sh] to compile the software. The compiled program is placed at [bin/VCF2Dis].
For Linux/Unix and macOS:

        tar -zxvf VCF2DisXXX.tar.gz              # if linking fails, try re-installing the [zlib] library
        cd VCF2DisXXX                            # and copying it to the library directory
        make ; make clean                        # VCF2Dis-xx/src/include/zlib
        ./bin/VCF2Dis
  

Note: if linking fails, try re-installing the zlib library.

2) An example of an NJ tree without bootstrap


    1. Parameter description:
	Usage: VCF2Dis -InPut  <in.vcf>  -OutPut  <p_dis.mat>

		-InPut     <str>     Input one or muti GATK VCF genotype File
		-OutPut    <str>     OutPut Sample p-Distance matrix

		-InList    <str>     Input GATK muti-chr VCF Path List
		-SubPop    <str>     SubGroup SampleList of VCFFile [ALLsample]
		-Rand      <float>   Probability (0-1] for each site to join Calculation [1]
		-KeepMF              Keep the Middle File diff & Use matrix

		-help                Show more help [hewm2008 v1.47]
    2. Create the p_distance matrix
# 2.1) To create the p_distance matrix for all samples in the VCF, run VCF2Dis directly
      ./bin/VCF2Dis -InPut in.vcf.gz -OutPut p_dis.mat
      #  ./bin/VCF2Dis -InPut in.fa.gz -OutPut p_dis.mat -InFormat FA

# 2.2) To create the p_distance matrix for a subgroup of samples, put their sample names into the file sample.list
      ./bin/VCF2Dis -InPut chr1.vcf.gz chr2.vcf.gz -OutPut p_dis.mat -SubPop sample.list
    3. Construct the NJ tree and display it (this requires other software)

Method 1

Choose one of A or B:
A. Upload p_dis.mat to the fneighbor web service (http://emboss.toulouse.inra.fr/cgi-bin/emboss/fneighbor?_pref_hide_optional=1), then click the "Run fneighbor" button. The output file is datafile.treefile.
B. Upload p_dis.mat to the FastME website (http://www.atgc-montpellier.fr/fastme/), set Data Type to "Distance matrix", and click the "execute & email results" button at the bottom of the page. The result is p_dis_mat_fastme-tree.nwk; providing an email address is not mandatory.

Run MEGA: MEGA (http://www.megasoftware.net/) is used to display the phylogenetic tree from the resulting tree file, e.g. [p_dis_mat_fastme-tree.nwk].

Method 2

Use PHYLIPNEW to construct the NJ tree.
For PHYLIPNEW installation instructions, see the linked guides (one of them is in Chinese).

      #    3.1 Run PHYLIP
      #    Once the p_distance matrix is done, PHYLIPNEW 3.69 (http://evolution.genetics.washington.edu/phylip.html) can be used with the neighbor-joining method to construct the phylogenetic tree from this p_distance matrix:

           PHYLIPNEW-3.69.650/bin/fneighbor -datafile p_dis.mat -outfile tree.out1.txt -matrixtype s -treetype n -outtreefile tree.out2.tre

      #    3.2 Run MEGA
      #    MEGA6 (http://www.megasoftware.net/) is used to display the phylogenetic tree from the file [tree.out2.tre]
    4. View the neighbor-joining tree and save it in PDF format
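
For step 2.2, it can help to list the sample names a VCF actually contains before writing sample.list. The following is a minimal sketch, not part of VCF2Dis: the function name list_vcf_samples is hypothetical, and the one-name-per-line layout of sample.list is an assumption. It relies only on the VCF convention that sample names are columns 10 and onward of the #CHROM header line.

```shell
# Sketch (hypothetical helper, not shipped with VCF2Dis).
# Prints the sample names of a gzipped VCF, one per line.
# In VCF, the "#CHROM" header line holds 9 fixed columns followed by sample names.
list_vcf_samples() {
    gzip -dc "$1" | awk -F '\t' '
        /^#CHROM/ {
            for (i = 10; i <= NF; i++) print $i   # columns 10+ are samples
            exit                                  # header found; stop reading
        }'
}
```

For example, `list_vcf_samples chr1.vcf.gz > all_samples.list` writes every sample name to a file, which can then be trimmed down to the subgroup of interest (assuming sample.list takes one name per line).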

3) An example of an NJ tree with bootstrap

    1. Build the NJ tree multiple times using sampling with replacement: each run uses a random subset of the sites and constructs the NJ tree as above. Repeat NN times, with X = 1, 2, ..., NN.
    ./bin/VCF2Dis -InPut in.vcf.gz -OutPut p_dis_X.mat -Rand 0.25
    PHYLIPNEW-3.69.650/bin/fneighbor -datafile p_dis_X.mat -outfile tree.out1_X.txt -matrixtype s -treetype n -outtreefile tree.out2_X.tre
    2. Merge all the resampled NJ trees and construct the bootstrap NJ tree.
	cat   tree.out2_*.tre   >  ALLtree_merge.tre
	PHYLIPNEW-3.69.650/bin/fconsense   -intreefile   ALLtree_merge.tre  -outfile out  -treeprint Y
	perl  ./bin/percentageboostrapTree.pl    ALLtree_merge.treefile    NN    Final_boostrap.tre
    3. Construct the NJ tree and display it (this requires other software)
      #    MEGA6 (http://www.megasoftware.net/) is used to display the phylogenetic tree from the file [Final_boostrap.tre]
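
The replicate runs in step 1 of this section can be scripted as a loop over X. The following is a minimal sketch under stated assumptions: the function name run_replicates and the VCF2DIS and FNEIGHBOR variables are illustrative and not part of VCF2Dis, and the matrix written by -OutPut is passed under the same file name to fneighbor's -datafile. Only flags shown in the commands above are used.

```shell
# Sketch (hypothetical wrapper, not shipped with VCF2Dis).
# Binary locations default to the paths used in this README; override as needed.
VCF2DIS="${VCF2DIS:-./bin/VCF2Dis}"
FNEIGHBOR="${FNEIGHBOR:-PHYLIPNEW-3.69.650/bin/fneighbor}"

# run_replicates <in.vcf.gz> <NN> <rand>
# Runs NN resampled distance matrices and builds one NJ tree per matrix.
run_replicates() {
    vcf="$1"     # input VCF file
    nn="$2"      # number of replicates (NN)
    rand="$3"    # per-site probability passed to -Rand, e.g. 0.25
    x=1
    while [ "$x" -le "$nn" ]; do
        # Distance matrix from a random subset of sites
        "$VCF2DIS" -InPut "$vcf" -OutPut "p_dis_${x}.mat" -Rand "$rand" || return 1
        # Neighbor-joining tree for this replicate
        "$FNEIGHBOR" -datafile "p_dis_${x}.mat" -outfile "tree.out1_${x}.txt" \
            -matrixtype s -treetype n -outtreefile "tree.out2_${x}.tre" || return 1
        x=$((x + 1))
    done
}
```

For example, `run_replicates in.vcf.gz 100 0.25` would produce tree.out2_1.tre through tree.out2_100.tre, which are then merged and summarized as in step 2. The choice of 100 replicates here is only an illustration.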

4) Introduction


VCF2Dis creates the p_distance matrix from a VCF file. For more information about the p_distance matrix, see this website. The SNP datasets in the VCF are used to calculate the p-distance between individuals; the genetic distance between sample i and sample j is given by the following formula:

            D_ij = (1/L) * sum_{l=1..L} d(l)_ij


Here L is the length of the regions in which SNPs can be identified. Given that the alleles at position l are A/C:

            d(l)_ij=0.0     if the genotypes of the two individuals were AA and AA;
            d(l)_ij=0.5     if the genotypes of the two individuals were AA and AC;
            d(l)_ij=0.0     if the genotypes of the two individuals were AC and AC;
            d(l)_ij=1.0     if the genotypes of the two individuals were AA and CC;
            d(l)_ij=0.0     if the genotypes of the two individuals were CC and CC;
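
As a worked example of the formula and the cases above, consider a made-up pair of samples compared over L = 4 sites (the data are purely illustrative; d^{(l)}_{ij} below is the same quantity written d(l)_ij above):

```latex
% Hypothetical genotypes of samples i and j at four A/C sites:
%   site 1: AA vs AC -> 0.5
%   site 2: AA vs CC -> 1.0
%   site 3: AC vs AC -> 0.0
%   site 4: CC vs CC -> 0.0
D_{ij} = \frac{1}{L}\sum_{l=1}^{L} d^{(l)}_{ij}
       = \frac{0.5 + 1.0 + 0.0 + 0.0}{4}
       = 0.375
```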

5) Results


Some NJ-tree images that I drew for earlier papers.

6) Discussion


######################swimming in the sky and flying in the sea ########################### ##