mbhall88 / taeper

Licence: MIT license
A small python program to simulate a real-time Nanopore sequencing run based on a previous experiment.

taeper

Simulate repeating a nanopore experiment.

This tool is designed for anyone developing tools and applications for real-time analysis of Oxford Nanopore sequencing data. Usage is simple: given a directory of fast5 files, A, and a destination directory, B, it copies the files from A to B in the same order and with the same timing as they were deposited into the reads folder during the actual experiment. It also preserves the existing directory structure.

I know what you’re thinking: “But who wants to hang around for 30 hours waiting for a simulation to finish?” Luckily there is an optional scaling factor that will speed up the process (--scale).

Installation

This is a Python 3 only package.

To install, run:

pip3 install taeper
taeper --help

Usage

taeper is designed to simulate the order and timing of fast5 files that were produced in a MinION run. You give it an input directory and it gathers the names of all the fast5 files under that directory (including sub-directories). It reads the time at which each read finished sequencing and creates a sorted index of all the files, so the first file in the index is the first one sequenced, and so on. Attached to each file path is a delay time, t, in seconds, meaning that the read completed sequencing t seconds after the one before it. In this way taeper can replay the experiment in terms of the depositing of fast5 files. It then copies those files into the specified output directory and recreates any subdirectory structure (e.g. pass or fail folders).

taeper --input_dir path/to/reads --output some/place

This will copy all fast5 files in path/to/reads to some/place with exactly the same timing as they were produced.
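
The indexing step described above can be pictured with a small sketch. This is not taeper's actual code: the function name `build_index`, the `(path, finish_time)` input pairs, and the choice of a zero delay for the first read are all assumptions made for illustration. How the finish times are extracted from fast5 files is not shown.

```python
from typing import Iterable, List, Tuple


def build_index(finish_times: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Sort reads by finish time and attach the delay since the previous read.

    `finish_times` is a hypothetical iterable of (file path, finish time in
    seconds) pairs. The returned list holds (file path, delay) pairs.
    """
    ordered = sorted(finish_times, key=lambda item: item[1])
    index = []
    previous = None
    for path, finished in ordered:
        # Assumption: the first read has nothing before it, so its delay is 0.
        delay = 0.0 if previous is None else finished - previous
        index.append((path, delay))
        previous = finished
    return index


# Made-up example data: three reads listed out of order.
reads = [
    ("pass/read_b.fast5", 12.5),
    ("fail/read_c.fast5", 20.0),
    ("pass/read_a.fast5", 10.0),
]
index = build_index(reads)
for path, delay in index:
    print(f"{path}: wait {delay} s")
```

Storing delays rather than absolute times means the replay loop only needs to sleep for each delay in turn before copying the next file.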

In reality, though, you probably don't want to wait the full length of time that would take. In that case you can use the scale option.

taeper --input_dir path/to/reads --output some/place --scale 100

This will rerun the experiment 100 times faster.
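
As a rough illustration of what scaling means, the sketch below replays an index by waiting `delay / scale` seconds before each copy and recreating subdirectories on the way. It is a minimal sketch under stated assumptions, not taeper's implementation: the `replay` function, its parameters, and the convention that index paths are relative to the input directory are hypothetical. The `sleep` argument is injectable so the demo can record waits instead of actually sleeping.

```python
import shutil
import tempfile
import time
from pathlib import Path


def replay(index, input_dir, output_dir, scale=1.0, sleep=time.sleep):
    """Copy files in index order, waiting delay / scale seconds before each.

    `index` is a list of (relative path, delay in seconds) pairs. In this
    sketch the paths are assumed to be relative to `input_dir`.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    for rel_path, delay in index:
        sleep(delay / scale)  # scale=100 replays 100 times faster
        destination = output_dir / rel_path
        # Recreate subdirectories such as pass/ or fail/ as needed.
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(input_dir / rel_path, destination)


# Demo with made-up files; waits are recorded rather than slept.
waits = []
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "reads"
    target = Path(tmp) / "out"
    (source / "pass").mkdir(parents=True)
    (source / "fail").mkdir()
    (source / "pass" / "read_a.fast5").write_bytes(b"a")
    (source / "fail" / "read_b.fast5").write_bytes(b"b")
    demo_index = [("pass/read_a.fast5", 0.0), ("fail/read_b.fast5", 20.0)]
    replay(demo_index, source, target, scale=10, sleep=waits.append)
    copied = sorted(p.relative_to(target).as_posix() for p in target.rglob("*.fast5"))
```

With a scale of 10, the 20 second gap between the two demo reads becomes a 2 second wait.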

Indexing is the longest step of the process, so by default the index (the file order with the time delays) is saved to a file called taeper_index.npy. Keep in mind that the file paths in the index are relative to the working directory in which it was generated.
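
To see why the working directory matters, here is a small sketch of how relative paths resolve. It makes no assumption about the layout of the .npy index itself; the helper `resolve_index_paths` and the example directories are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_index_paths(relative_paths, working_dir):
    """Resolve index-style relative paths against a given working directory."""
    base = PurePosixPath(working_dir)
    return [str(base / path) for path in relative_paths]


paths = ["path/to/reads/pass/read_a.fast5"]

# Resolved from the directory where the index was built, the path is valid.
built_here = resolve_index_paths(paths, "/data/run1")

# Resolved from a different directory, the same entry points somewhere else.
run_elsewhere = resolve_index_paths(paths, "/home/user")
```

In practice this means a saved index is safest to reuse from the same working directory it was created in.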

If you would just like to index without copying, run:

taeper --input_dir path/to/reads --dump_index experiment_index.npy

Simply omit the output directory. --dump_index also lets you specify a name other than the default for the index.

If you already have an index file and would like to rerun the experiment, you can provide that index and skip straight to the copying:

taeper --input_dir path/to/reads --output some/place --index experiment_index.npy --scale 100

Full usage

taeper --help
usage: taeper [-h] -i INPUT_DIR [--index INDEX] [-o OUTPUT] [--scale SCALE]
          [-d DUMP_INDEX] [--no_index] [--log_level {0,1,2,3,4,5}]
          [--no_progress_bar]

Simulate the real-time depositing of Nanopore reads into a given folder,
conserving the order they were processed during sequencing. If pass and fail
folders do not exist in output_dir they will be created if detected in the
file path for the fast5 file.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_DIR, --input_dir INPUT_DIR
                        Directory where files are located.
  --index INDEX         Provide a prebuilt index file to skip indexing. Be
                        aware that paths within an index file are relative to
                        the current working directory when they were built.
  -o OUTPUT, --output OUTPUT
                        Directory to copy the files to. If not specified, will
                        generate the index file only.
  --scale SCALE         Amount to scale the timing by. i.e scale of 10 will
                        deposit the reads 10x fatser than they were generated.
                        (Default = 1.0)
  -d DUMP_INDEX, --dump_index DUMP_INDEX
                        Path to save index as. Default is 'taeper_index.npy'
                        in current working directory. Note: Paths in the index
                        are relative to the current working directory.
  --no_index            Dont write the index list to file. This will mean it
                        needs regenerating for this dataset on each run.
  --log_level {0,1,2,3,4,5}
                        Level of logging. 0 is none, 5 is for debugging.
                        Default is 4 which will report info, warnings, errors,
                        and critical information.
  --no_progress_bar     Do not display progress bar.

Disclaimer

The fast5 file structure has changed a bit over time and, as such, not all files will work. That said, I have tested this program with the most recent formats and it works fine. A logging warning will show up on the console if taeper is unable to read a file or determine its finish time.


  • Free software: MIT license

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
