kundajelab / ENCODE_downloader

License: BSD-3-Clause
Downloader for ENCODE

ENCODE downloader

This Python script downloads data files from the ENCODE portal.

Supported data types

You can download any type of data from the portal. For example, to download all FASTQs and unfiltered BAMs:

$ python encode_downloader.py --file-types fastq "bam:unfiltered alignments" ...

To download all p-value signal bigWigs:

$ python encode_downloader.py --file-types "bigWig:signal p-value" ...
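Each value passed to --file-types in the commands above reads as either a bare file format (fastq) or a file format and an output type joined by a colon ("bam:unfiltered alignments"). The following is a minimal sketch of that reading; parse_file_type is a hypothetical helper written for illustration and is not taken from encode_downloader.py, which may parse these values differently.

```python
def parse_file_type(spec):
    """Split a --file-types value into (file_format, output_type).

    output_type is None when the value names only a file format.
    Hypothetical helper for illustration; not part of encode_downloader.py.
    """
    file_format, sep, output_type = spec.partition(":")
    return file_format.strip(), (output_type.strip() if sep else None)


# The three values used in the commands above.
parsed = [
    parse_file_type(spec)
    for spec in ["fastq", "bam:unfiltered alignments", "bigWig:signal p-value"]
]
for item in parsed:
    print(item)
```

Note that values containing spaces must be quoted on the command line so that the shell passes them as a single argument.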

Authentication

To download unpublished files that are visible to submitters only, you need authentication credentials from the ENCODE portal. Get your access key ID and secret key from the portal homepage menu: YourID->Profile->Add Access key.

$ python encode_downloader.py --encode-access-key-id [ENCODE_ACCESS_KEY_ID] --encode-secret-key [ENCODE_SECRET_KEY] ...
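For readers who want to see what the key pair is used for, here is a minimal sketch that attaches it to a request as HTTP Basic credentials, assuming the portal accepts the access key ID as the user name and the secret key as the password. It uses only the standard library, builds the request without sending it, and build_authenticated_request is a hypothetical name; encode_downloader.py itself may handle authentication differently.

```python
import base64
import urllib.request


def build_authenticated_request(url, access_key_id, secret_key):
    """Return a urllib Request carrying HTTP Basic credentials.

    Assumption: the key ID acts as the user name and the secret key
    as the password. Nothing is sent over the network here.
    """
    raw = f"{access_key_id}:{secret_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


req = build_authenticated_request(
    "https://www.encodeproject.org/experiments/ENCSR000ELE/",
    "MY_KEY_ID",      # placeholder, not a real access key ID
    "MY_SECRET_KEY",  # placeholder, not a real secret key
)
print(req.get_header("Authorization"))
```

Keep real keys out of shell history and version control, for example by reading them from environment variables before passing them to the script.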

Usage

$ python encode_downloader.py -h

Generating BDS pipeline script

After downloading the data files, you will typically process them with pipelines. generate_pipeline_run_sh.py generates a shell script, run_pipelines.sh, that runs the Kundaje lab's BDS pipelines.

[EXP_ACC_IDS_TXT] is a text file with one [experiment_accession_id] per line. [EXP_DATA_ROOT_DIR] is the root directory of the experiment data files that you downloaded from the ENCODE portal with encode_downloader.py.
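A quick sanity check of the accession ID file can catch stray whitespace or malformed lines before the pipeline script is generated. The sketch below is an illustration only: read_accession_ids is a hypothetical helper, and the accession pattern is an assumption inferred from the example ID ENCSR000ELE used in this README.

```python
import re
import tempfile
from pathlib import Path

# Assumed pattern, inferred from the example accession ENCSR000ELE.
ACCESSION_RE = re.compile(r"^ENCSR\d{3}[A-Z]{3}$")


def read_accession_ids(path):
    """Return the non-empty lines of the file, rejecting malformed IDs."""
    lines = Path(path).read_text().splitlines()
    ids = [line.strip() for line in lines if line.strip()]
    bad = [acc for acc in ids if not ACCESSION_RE.match(acc)]
    if bad:
        raise ValueError(f"Unexpected accession IDs: {bad}")
    return ids


# Demonstration with a temporary one-line file.
with tempfile.TemporaryDirectory() as tmp_dir:
    ids_path = Path(tmp_dir) / "exp_acc_ids.txt"
    ids_path.write_text("ENCSR000ELE\n")
    ids = read_accession_ids(ids_path)
print(ids)
```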

To get a shell script to run pipelines without controls:

$ python generate_pipeline_run_sh.py --exp-acc-ids-file [EXP_ACC_IDS_TXT] --exp-data-root-dir [EXP_DATA_ROOT_DIR] --pipeline-bds-script [BDS_FILE_PATH; {chipseq.bds,atac.bds}] --file-type-to-run-pipeline [FILE_TYPE; {fastq,bam,filt_bam}]

To get a shell script to run pipelines with controls, also provide [CTL_ACC_IDS_TXT], a TSV file with [experiment_accession_id]\t[control_accession_id] on each line, and [CTL_DATA_ROOT_DIR], the root directory of the control data files that you downloaded from the ENCODE portal with encode_downloader.py:

$ python generate_pipeline_run_sh.py --exp-acc-ids-file [EXP_ACC_IDS_TXT] --exp-data-root-dir [EXP_DATA_ROOT_DIR] --exp-id-to-ctl-id-file [CTL_ACC_IDS_TXT] --ctl-data-root-dir [CTL_DATA_ROOT_DIR] --pipeline-bds-script [BDS_FILE_PATH; {chipseq.bds,atac.bds}] --file-type-to-run-pipeline [FILE_TYPE; {fastq,bam,filt_bam}]
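The experiment-to-control TSV file described above can be written by hand or generated. The following minimal sketch writes the [experiment_accession_id]\t[control_accession_id] layout with Python's csv module; write_exp_to_ctl_tsv is a hypothetical helper, and the control ID in the demonstration is a placeholder that was not looked up on the portal.

```python
import csv
import tempfile
from pathlib import Path


def write_exp_to_ctl_tsv(path, exp_to_ctl):
    """Write one 'experiment<TAB>control' pair per line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for exp_id, ctl_id in exp_to_ctl.items():
            writer.writerow([exp_id, ctl_id])


# Demonstration. ENCSR000AAA is a placeholder control ID,
# not a pairing taken from the ENCODE portal.
with tempfile.TemporaryDirectory() as tmp_dir:
    tsv_path = Path(tmp_dir) / "exp_to_ctl.txt"
    write_exp_to_ctl_tsv(tsv_path, {"ENCSR000ELE": "ENCSR000AAA"})
    tsv_text = tsv_path.read_text()
print(repr(tsv_text))
```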

Requirements

  • Python requests
$ pip install requests

Examples

Using search query URL

$ python encode_downloader.py "https://www.encodeproject.org/search/?type=Experiment&assay_title=ATAC-seq&replicates.library.biosample.life_stage=postnatal&status=released" --ignore-unpublished --file-types fastq "bam:unfiltered alignments"
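Search URLs like the one above are easier to maintain when assembled from named filters. Here is a minimal sketch using the standard library; build_search_url is a hypothetical helper, and the filter names are copied from the example URL rather than from any portal documentation.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.encodeproject.org/search/"


def build_search_url(filters):
    """Join the search endpoint and URL-encoded filters into one URL."""
    return f"{SEARCH_ENDPOINT}?{urlencode(filters)}"


# Filters copied from the example command above.
url = build_search_url(
    {
        "type": "Experiment",
        "assay_title": "ATAC-seq",
        "replicates.library.biosample.life_stage": "postnatal",
        "status": "released",
    }
)
print(url)
```

Quote the resulting URL when passing it to the script, since an unquoted & would be interpreted by the shell.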

Using experiment URL

$ python encode_downloader.py "https://www.encodeproject.org/experiments/ENCSR000ELE" --ignore-unpublished --encode-access-key-id [ENCODE_ACCESS_KEY_ID] --encode-secret-key [ENCODE_SECRET_KEY]

Using accession ids file

$ python encode_downloader.py acc_ids.txt --ignore-unpublished --encode-access-key-id [ENCODE_ACCESS_KEY_ID] --encode-secret-key [ENCODE_SECRET_KEY] --file-types "bigWig:signal p-value"

Using accession id and ids file (mixed)

$ python encode_downloader.py acc_ids.txt ENCSR000ELE --ignore-unpublished --encode-access-key-id [ENCODE_ACCESS_KEY_ID] --encode-secret-key [ENCODE_SECRET_KEY]
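
The examples above pass three kinds of positional input: URLs, accession IDs, and files of accession IDs. The sketch below shows one way such inputs could be told apart. It is an illustration under stated assumptions, not the logic of encode_downloader.py; classify_input is a hypothetical helper, and the accession pattern is inferred from the example ID ENCSR000ELE.

```python
import re

# Assumed pattern, inferred from the example accession ENCSR000ELE.
ACCESSION_RE = re.compile(r"^ENCSR\d{3}[A-Z]{3}$")


def classify_input(arg):
    """Label a positional argument as 'url', 'accession', or 'ids_file'."""
    if arg.startswith(("http://", "https://")):
        return "url"
    if ACCESSION_RE.match(arg):
        return "accession"
    # Anything else is treated as a path to a file of accession IDs.
    return "ids_file"


kinds = [
    classify_input(arg)
    for arg in [
        "https://www.encodeproject.org/experiments/ENCSR000ELE",
        "ENCSR000ELE",
        "acc_ids.txt",
    ]
]
print(kinds)
```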