
HadrienG / InSilicoSeq

Licence: MIT license
🚀 A sequencing simulator

Programming Languages

python
139335 projects - #7 most used programming language
Dockerfile
14818 projects

Projects that are alternatives of or similar to InSilicoSeq

catch
A package for designing compact and comprehensive capture probe sets.
Stars: ✭ 55 (-52.59%)
Mutual labels:  sequencing, metagenomics
gargammel
gargammel is an ancient DNA simulator
Stars: ✭ 17 (-85.34%)
Mutual labels:  sequencing, metagenomics
covid-19-signal
Files and methodology pertaining to the sequencing and analysis of SARS-CoV-2, causative agent of COVID-19.
Stars: ✭ 31 (-73.28%)
Mutual labels:  sequencing, metagenomics
ATACseq
Analysis Workflow for Assay for Transposase-Accessible Chromatin using sequencing (ATAC-Seq)
Stars: ✭ 51 (-56.03%)
Mutual labels:  sequencing
FluentDNA
FluentDNA allows you to browse sequence data of any size using a zooming visualization similar to Google Maps. You can use FluentDNA as a standalone program or as a python module for your own bioinformatics projects.
Stars: ✭ 52 (-55.17%)
Mutual labels:  sequencing
MAESTRO
A low Mach number stellar hydrodynamics code
Stars: ✭ 29 (-75%)
Mutual labels:  simulation
GazeboWorldDesigner
A visual tool for laying out Gazebo simulation world files.
Stars: ✭ 12 (-89.66%)
Mutual labels:  simulation
bxtools
Tools for analyzing 10X Genomics data
Stars: ✭ 39 (-66.38%)
Mutual labels:  sequencing
MetacommunityDynamics.jl
A Julia library for simulating the dynamics of ecological communities across space
Stars: ✭ 14 (-87.93%)
Mutual labels:  simulation
js-simulator
General-purpose discrete-event multiagent simulation library for agent-based modelling and simulation
Stars: ✭ 52 (-55.17%)
Mutual labels:  simulation
ad-xolib
C++ library for Parsing OpenScenario (1.1.1) & OpenDrive files (1.7) ASAM Specifications
Stars: ✭ 56 (-51.72%)
Mutual labels:  simulation
Incoherent-Light-Simulation
Simulation of the propagation of incoherent light, aiming to illustrate the concept of spatial coherence.
Stars: ✭ 98 (-15.52%)
Mutual labels:  simulation
awesome-edge-computing
A curated list of awesome edge computing, including Frameworks, Simulators, Tools, etc.
Stars: ✭ 149 (+28.45%)
Mutual labels:  simulation
MG-RAST
The MG-RAST Backend -- the API server
Stars: ✭ 39 (-66.38%)
Mutual labels:  metagenomics
AMBER
AMBER: Assessment of Metagenome BinnERs
Stars: ✭ 18 (-84.48%)
Mutual labels:  metagenomics
astarix
AStarix: Fast and Optimal Sequence-to-Graph Aligner
Stars: ✭ 60 (-48.28%)
Mutual labels:  sequencing
plc-programmable-3d-simulation
Project for students who want to learn PLC programming but don't have access to real-world machines or constructions to learn programming on.
Stars: ✭ 49 (-57.76%)
Mutual labels:  simulation
UCThello
UCThello - a board game demonstrator (Othello variant) with computer AI using Monte Carlo Tree Search (MCTS) with UCB (Upper Confidence Bounds) applied to trees (UCT in short)
Stars: ✭ 26 (-77.59%)
Mutual labels:  simulation
GameOfLife
Conway's Game of Life
Stars: ✭ 18 (-84.48%)
Mutual labels:  simulation
woss-ns3
WOSS is a multi-threaded C++ framework that permits the integration of any existing underwater channel simulator that expects environmental data as input and provides as output a channel realization. Currently, WOSS integrates the Bellhop ray-tracing program. Thanks to its automation the user only has to specify the location in the world and the…
Stars: ✭ 20 (-82.76%)
Mutual labels:  simulation

InSilicoSeq

A sequencing simulator


InSilicoSeq is a sequencing simulator producing realistic Illumina reads. Primarily intended for simulating metagenomic samples, it can also be used to produce sequencing data from a single genome.

InSilicoSeq is written in Python and uses kernel density estimators to model the read quality of real sequencing data.
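To illustrate the idea of modelling read quality with a kernel density estimator, here is a minimal sketch using only the Python standard library. It is not InSilicoSeq's implementation: the observed scores, the bandwidth of 1.5, the Phred range 0 to 41 and the function names are all assumptions chosen for illustration.

```python
import math
import random


def gaussian_kde_pdf(samples, x, bandwidth):
    """Evaluate a Gaussian kernel density estimate at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


def sample_quality(samples, bandwidth, rng, q_min=0, q_max=41):
    """Draw one quality score from the KDE fitted to `samples`.

    Sampling from a Gaussian KDE is equivalent to picking one observed
    value at random and adding Gaussian noise with the kernel bandwidth.
    The draw is rounded and clamped to the assumed Phred range.
    """
    centre = rng.choice(samples)
    draw = rng.gauss(centre, bandwidth)
    return max(q_min, min(q_max, round(draw)))


# Hypothetical Phred scores observed at a single read position.
observed = [38, 37, 38, 36, 39, 20, 38, 37, 35, 38]
rng = random.Random(42)  # fixed seed so the sketch is reproducible
simulated = [sample_quality(observed, 1.5, rng) for _ in range(1000)]
```

In a sketch like this, one estimator would be fitted per read position, so that simulated reads reproduce the typical drop in quality towards the end of a read.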

InSilicoSeq supports substitution, insertion and deletion errors. If you do not need insertion and deletion errors, a basic error model is provided.
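The three error types can be pictured with a short sketch. This is a simplified illustration under stated assumptions, not the way InSilicoSeq applies errors: the function name `introduce_errors` and the fixed per-base probabilities are hypothetical, whereas a real error model would vary them with read position and quality.

```python
import random

BASES = "ACGT"


def introduce_errors(read, p_sub, p_ins, p_del, rng):
    """Return a copy of `read` with random substitutions, insertions
    and deletions applied independently at each base."""
    out = []
    for base in read:
        r = rng.random()
        if r < p_del:
            continue  # deletion: the base is dropped
        if r < p_del + p_sub:
            # substitution: replace with one of the three other bases
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))  # insertion after this base
    return "".join(out)


rng = random.Random(0)
noisy = introduce_errors("ACGTACGTACGT", 0.05, 0.01, 0.01, rng)
```

Setting `p_ins` and `p_del` to zero leaves substitutions as the only error type, which is the spirit of a substitution-only model.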

Installation

InSilicoSeq is available in Bioconda.

To install with conda:

conda install -c bioconda insilicoseq

Or with pip:

pip install InSilicoSeq

Note: InSilicoSeq requires Python >= 3.5

Alternatively, with Docker:

docker pull hadrieng/insilicoseq:latest

For more installation options, please refer to the full documentation.

Usage

InSilicoSeq has two subcommands: iss generate to generate Illumina reads and iss model to create an error model from which the reads will take their characteristics.

InSilicoSeq comes with pre-computed error models that should be sufficient for most use cases.

Generate reads with a pre-computed error model

To generate 1 million reads modelling a MiSeq instrument:

curl -O -J -L https://osf.io/thser/download  # download the example data
iss generate --genomes SRS121011.fasta --model miseq --output miseq_reads

where SRS121011.fasta should be replaced by a (multi-)FASTA file containing the reference genome(s) from which the simulated reads will be generated.

InSilicoSeq comes with 3 error models: MiSeq, HiSeq and NovaSeq.

If you have built your own model, pass the .npz file to the --model argument to simulate reads from your own error model.

For 10 million reads and a custom error model:

curl -O -J -L https://osf.io/thser/download  # download the example data
iss generate -g SRS121011.fasta -n 10m --model my_model.npz --output /path/to/my_reads

provided that you have built my_model.npz with iss model (see below).
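If you drive InSilicoSeq from a script, the commands above can be assembled programmatically. The sketch below only builds the argument list and uses the flags shown in this README (`--genomes`, `--model`, `--output`, `-n`); the helper name `build_generate_command` is hypothetical and not part of InSilicoSeq.

```python
import shlex


def build_generate_command(genomes, model, output, n_reads=None):
    """Build the argument list for `iss generate`.

    `model` is either the name of a pre-computed model (e.g. "miseq")
    or the path to a custom .npz file created with `iss model`.
    """
    cmd = ["iss", "generate", "--genomes", genomes,
           "--model", model, "--output", output]
    if n_reads is not None:
        cmd += ["-n", str(n_reads)]
    return cmd


cmd = build_generate_command(
    "SRS121011.fasta", "my_model.npz", "/path/to/my_reads", n_reads="10m"
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

With InSilicoSeq installed, the list can be passed to `subprocess.run(cmd, check=True)` to run the simulation.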

For more examples and a full list of options, please refer to the full documentation

Generate reads without input genomes

We can download some for you! InSilicoSeq can download random genomes from NCBI using the infamous E-utilities.

The command

iss generate --ncbi bacteria -u 10 --model MiSeq --output ncbi_reads

will generate 1 million reads from 10 random bacterial genomes.
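For readers unfamiliar with the NCBI E-utilities, the sketch below builds an ESearch request URL without sending it. It only illustrates what such a query looks like: the database (`assembly`), the search term and the helper name `esearch_url` are assumptions, and the query InSilicoSeq actually sends may differ.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax):
    """Build an ESearch URL returning up to `retmax` record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Assumed database and search term, for illustration only.
url = esearch_url("assembly", "bacteria[Organism]", 10)
print(url)
```

Fetching the URL returns a list of identifiers, from which a random subset could be drawn and downloaded.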

For more examples and a full list of options, please refer to the full documentation

Create your own error model

If you do not wish to use the pre-computed error models provided with InSilicoSeq, it is possible to create your own.

Say you have a reference metagenome called genomes.fasta, and read pairs reads_R1.fastq.gz and reads_R2.fastq.gz.

Align your reads against the reference:

bowtie2-build genomes.fasta genomes
bowtie2 -x genomes -1 reads_R1.fastq.gz -2 reads_R2.fastq.gz | \
samtools view -bS | samtools sort -o genomes.bam
samtools index genomes.bam

Then build the model:

iss model -b genomes.bam -o genomes

which will create a genomes.npz file containing your newly built model.
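The alignment and model-building steps can also be collected in a small script. The sketch below reproduces the commands from this section as shell strings; the helper name `build_model_pipeline` is hypothetical, and it assumes bowtie2, samtools and iss are on your PATH when the steps are eventually run.

```python
import shlex


def build_model_pipeline(reference, reads_r1, reads_r2, prefix):
    """Return the shell commands that align paired reads to a reference
    and build an InSilicoSeq error model named `prefix`."""
    q = shlex.quote  # protect file names containing spaces
    bam = f"{prefix}.bam"
    return [
        f"bowtie2-build {q(reference)} {q(prefix)}",
        f"bowtie2 -x {q(prefix)} -1 {q(reads_r1)} -2 {q(reads_r2)} | "
        f"samtools view -bS | samtools sort -o {q(bam)}",
        f"samtools index {q(bam)}",
        f"iss model -b {q(bam)} -o {q(prefix)}",
    ]


steps = build_model_pipeline(
    "genomes.fasta", "reads_R1.fastq.gz", "reads_R2.fastq.gz", "genomes"
)
for step in steps:
    print(step)
```

Each string can be executed in order with `subprocess.run(step, shell=True, check=True)`; the shell is needed because the second step uses pipes.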

License

Code is under the MIT license.

Issues

Found a bug or have a question? Please open an issue.

Contributing

We welcome contributions from the community! See our Contributing guidelines.

Citation

If you use our software, please cite us!

Gourlé H, Karlsson-Lindsjö O, Hayer J and Bongcam-Rudloff E, Simulating Illumina data with InSilicoSeq. Bioinformatics (2018) doi:10.1093/bioinformatics/bty630
