
pb-jlandolin / PacbioToSRA

Licence: other
Takes a list of PacBio files (.fofn) and creates a spreadsheet for data submission to the Sequence Read Archive (SRA).

Programming languages: Python, Jupyter Notebook, PureBasic

# PacbioToSRA

This repo contains scripts, instructions, and examples for preparing PacBio sequence data for submission to the SRA.

Instructions

  1. Register project and samples
  2. Set up the script's environment
  3. Run the script
  4. Update spreadsheet and email it to NCBI

## Step 1. Register project and samples

Go to https://submit.ncbi.nlm.nih.gov/ and register your BioProject.
Go to https://submit.ncbi.nlm.nih.gov/ and register your BioSample.

## Step 2. Prepare script's environment

#### Set up virtual environment:

(go to the root directory of this repo)
$ virtualenv virtualenv_PacbioToSRA
$ source virtualenv_PacbioToSRA/bin/activate
$ pip install -r requirements.txt

## Step 3. Run the script

#### Usage:

$ bin/pacb_ncbi --help
Usage: pacb_ncbi [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  calc_upload_size              Calculates the total size of the data that...
  create_excel_file             Creates the Excel file that contains the...
  create_excel_file_and_upload  Creates the Excel file that contains the...
  upload                        Uploads the datasets in the input.fofn file...

#### Example:

$ bin/pacb_ncbi create_excel_file_and_upload -i /path/to/input.fofn -p bioproject1 -s biosample1 -x my_sra.xlsx -u ncbi_username -k /path/to/ssh/file
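
Before running an upload, it can help to sanity-check the input file. The sketch below assumes a `.fofn` is a plain-text "file of file names" with one path per line. It is not this repo's implementation; `read_fofn` and `summarize_fofn` are hypothetical helper names that only illustrate the kind of check a size calculation such as `calc_upload_size` implies.

```python
import os


def read_fofn(fofn_path):
    """Return the non-empty lines of a .fofn file as a list of paths."""
    with open(fofn_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def summarize_fofn(fofn_path):
    """Return (total_bytes, missing_paths) for the files listed in a .fofn.

    Assumption: every line is a path to a regular file on the local disk.
    """
    total_bytes = 0
    missing = []
    for path in read_fofn(fofn_path):
        if os.path.isfile(path):
            total_bytes += os.path.getsize(path)
        else:
            # Collect unreadable entries instead of failing on the first one.
            missing.append(path)
    return total_bytes, missing
```

Calling `summarize_fofn("/path/to/input.fofn")` and checking that the second element is empty is a quick way to catch stale paths before starting a long transfer.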

#### Notes:

  • NCBI_USERNAME is the username that NCBI assigns to your institution (get this from [email protected])
  • SSH_KEY_FILE is the SSH key file that you generated for your institution (http://www.ncbi.nlm.nih.gov/books/NBK180157/)
  • To get additional help, type:
    $ bin/pacb_ncbi <subcommand> --help
    For example:
    $ bin/pacb_ncbi create_excel_file_and_upload --help
    $ bin/pacb_ncbi create_excel_file --help
    $ bin/pacb_ncbi upload --help
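
If several datasets need to be submitted, the example command can be scripted. The following is a minimal sketch, assuming the flags shown in the example above (`-i`, `-p`, `-s`, `-x`, `-u`, `-k`) and that it is run from the root directory of this repo with the virtual environment active. `build_command` and `run_command` are hypothetical helpers, not part of the repo.

```python
import subprocess


def build_command(fofn, bioproject, biosample, excel_file, username,
                  ssh_key_file, executable="bin/pacb_ncbi"):
    """Build the argument list for the create_excel_file_and_upload subcommand."""
    return [
        executable, "create_excel_file_and_upload",
        "-i", fofn,
        "-p", bioproject,
        "-s", biosample,
        "-x", excel_file,
        "-u", username,
        "-k", ssh_key_file,
    ]


def run_command(args):
    """Run the command; raises subprocess.CalledProcessError on a non-zero exit."""
    return subprocess.run(args, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.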

## Step 4. Update spreadsheet and email it to NCBI

  • Update the spreadsheet with any additional information. The following columns must be filled in:
    • "SRA_data" sheet
      • title/short description
      • library_strategy
      • library_source
      • library_selection
      • library_layout
  • Email the spreadsheet as an attachment to NCBI ([email protected]).
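
The required columns listed above can be checked before the spreadsheet is sent. This is a rough sketch, assuming the rows of the "SRA_data" sheet have already been loaded into dictionaries keyed by column header (for instance with a spreadsheet library; that step is left out here). The header strings are copied from the list above and may not match the generated file exactly, and `find_missing_fields` is a hypothetical helper.

```python
# Assumption: these strings match the column headers in the "SRA_data" sheet.
REQUIRED_COLUMNS = [
    "title/short description",
    "library_strategy",
    "library_source",
    "library_selection",
    "library_layout",
]


def find_missing_fields(rows, required=REQUIRED_COLUMNS):
    """Return (row_number, column) pairs for required cells that are empty.

    `rows` is a list of dicts keyed by column header, one dict per data row.
    Row numbers start at 1 and count data rows only, not the header row.
    """
    missing = []
    for row_number, row in enumerate(rows, start=1):
        for column in required:
            value = row.get(column)
            # Treat absent keys, None, and whitespace-only cells as empty.
            if value is None or not str(value).strip():
                missing.append((row_number, column))
    return missing
```

An empty result means every required cell has a value; it does not check that the values are ones NCBI accepts.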