Illumina / PlatinumGenomes

Licence: other
The Platinum Genomes Truthset

Platinum Genomes

This repo contains the Platinum Genomes small variant truthset for samples NA12878 (also known as hg001) and NA12877. Platinum Genomes truthset variants were validated using haplotype inheritance information through a well-studied 17-member pedigree (CEPH 1463).

Truthsets

Truthsets are made up of a VCF of validated variant records and a BED file of confident regions. These files aren't huge (hundreds of MB) but are too large to play nicely with git and GitHub. Here are a few ways to download them:

AWS CLI

Truthset files are stored in an AWS S3 bucket called platinum-genomes. One way to download them is via the AWS CLI:

aws s3 cp s3://platinum-genomes/2017-1.0 pg2017 --recursive

To download without AWS credentials, add the --no-sign-request flag. You can also explore the bucket and download individual files with this S3 bucket display.
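
As a minimal sketch of the anonymous download described above, the command can be wrapped in a small helper. The function name `download_truthset` and the `AWS_BIN` override are hypothetical and only here for illustration; the bucket, release name, destination directory and flags are the ones given in this section.

```shell
#!/usr/bin/env bash
# Sketch: fetch a Platinum Genomes release without AWS credentials.
# download_truthset and AWS_BIN are hypothetical names, not part of this repo.
download_truthset() {
    # Defaults mirror the command shown in this README.
    local release="${1:-2017-1.0}" dest="${2:-pg2017}"
    # --no-sign-request sends unsigned requests, so no credentials are needed.
    "${AWS_BIN:-aws}" s3 cp "s3://platinum-genomes/${release}" "$dest" \
        --recursive --no-sign-request
}
```

Calling `download_truthset` with no arguments should behave like the `aws s3 cp` command above with `--no-sign-request` appended.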

wget

Alternatively, use wget or similar with the file URIs in this repo, e.g.:

wget -xi files/2017-1.0.files

You can then use the relevant md5 checksum in each release to validate data integrity.
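
One way to do this check is sketched below. It assumes GNU coreutils `md5sum` is available and that you copy the expected digest by hand from the release; the helper name `verify_md5` is hypothetical, and the layout of the checksum files in each release is not assumed here.

```shell
#!/usr/bin/env bash
# Sketch: check a downloaded file against an expected MD5 digest.
# Assumes GNU coreutils md5sum is on PATH; verify_md5 is a hypothetical helper.
verify_md5() {
    local file="$1" expected="$2" actual
    if [ ! -f "$file" ]; then
        echo "missing file: $file" >&2
        return 2
    fi
    # Reading from stdin makes md5sum print "<digest>  -"; keep only the digest.
    actual="$(md5sum < "$file" | awk '{print $1}')"
    # Normalise the expected digest to lower case before comparing.
    expected="$(printf '%s' "$expected" | tr 'A-F' 'a-f')"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected $expected, got $actual)" >&2
        return 1
    fi
}
```

For example, `verify_md5 NA12878.vcf.gz <digest-from-release>` returns 0 on a match and a non-zero status otherwise, so it can gate later steps in a script.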

Finally, truthset files can also be downloaded via FTP, e.g.:

wget ftp://platgene_ro:''@ussd-ftp.illumina.com/2017-1.0/hg38/small_variants/NA12878/NA12878.vcf.gz

Usage

To compare a VCF against these truthsets, we recommend using hap.py, which performs a sophisticated haplotype comparison, rather than a simpler tool such as bcftools isec.
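
A hap.py comparison might look roughly like the sketch below. The wrapper name `compare_to_truthset`, the `HAPPY_BIN` override and all file paths are hypothetical; the truth-then-query argument order and the `-f`, `-r` and `-o` options follow hap.py's command-line interface, but check the hap.py documentation for your installed version.

```shell
#!/usr/bin/env bash
# Sketch: compare a query VCF against a truthset VCF and confident-region BED.
# compare_to_truthset and HAPPY_BIN are hypothetical names; paths are placeholders.
compare_to_truthset() {
    if [ "$#" -ne 5 ]; then
        echo "usage: compare_to_truthset TRUTH_VCF QUERY_VCF CONFIDENT_BED REFERENCE_FA OUT_PREFIX" >&2
        return 2
    fi
    local truth_vcf="$1" query_vcf="$2" confident_bed="$3" reference_fa="$4" out_prefix="$5"
    # hap.py takes the truth VCF first, then the query VCF.
    # -f gives the confident regions (calls outside them are not counted as false positives),
    # -r names the reference FASTA, -o sets the prefix for the output files.
    "${HAPPY_BIN:-hap.py}" "$truth_vcf" "$query_vcf" \
        -f "$confident_bed" -r "$reference_fa" -o "$out_prefix"
}
```

The reference FASTA must match the genome build of the truthset files you downloaded.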

Applications wrapping hap.py and containing these truthsets are available on the following platforms:

Details

See the attached wiki for technical information.

Raw data

Sequencing data for NA12878, NA12877 and samples NA12889-NA12892 (grandparents) are available through the ENA:

BaseSpace users can access this data via a shared BaseSpace project:

Sequencing data for the remaining pedigree members is not consented for public release and so is made available through the dbGaP database:

Issues

Please open an issue for comments, issues and other feedback.

Citation

For further information or to cite Platinum Genomes resources, see:

  • Eberle, MA et al. (2017) A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Research, 27:157-164. doi:10.1101/gr.210500.116