Transposon Insertion Finder - Detection of new insertions in NGS data

TIF

Transposon Insertion Finder

Transposon Insertion Finder (TIF) is a search program that detects insertions of transposable elements from short reads produced by next-generation sequencers. The program is written in Perl and runs on Unix (Linux) platforms. TIF requires short sequences from both ends of the target transposable element and the length of the target site duplication (TSD). Basic TIF (tif_basic.pl) does not require a reference genome sequence to select short reads containing target sites, whereas extended TIF (tif_extended.pl) requires the reference sequence and the BLAST program.

TIF is one of the fastest and smallest programs for the analysis of next-generation sequencing (NGS) data. Its distinctive feature is that it directly selects, from the NGS short reads, the reads that contain the end sequences of the target transposable element.
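
The direct selection idea can be illustrated with a short sketch. The Python code below is not the Perl code shipped with TIF. It is a minimal illustration under stated assumptions: end sequences are matched exactly, the TSD is taken to be the last bases of the flank upstream of the head and the first bases of the flank downstream of the tail, and the function names and the `flank_len` parameter are hypothetical.

```python
from collections import defaultdict


def flank_before_head(read, head, flank_len=20):
    """Return the flank_len bases immediately upstream of head, or None."""
    pos = read.find(head)
    if pos < flank_len:  # head absent (-1) or not enough flanking sequence
        return None
    return read[pos - flank_len:pos]


def flank_after_tail(read, tail, flank_len=20):
    """Return the flank_len bases immediately downstream of tail, or None."""
    pos = read.find(tail)
    if pos < 0:
        return None
    start = pos + len(tail)
    flank = read[start:start + flank_len]
    return flank if len(flank) == flank_len else None


def pair_by_tsd(reads, head, tail, tsd_size, flank_len=20):
    """Group upstream and downstream flanks that share the same TSD (assumed pairing rule)."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for read in reads:
        up = flank_before_head(read, head, flank_len)
        if up:
            upstream[up[-tsd_size:]].add(up)      # TSD ends the upstream flank
        down = flank_after_tail(read, tail, flank_len)
        if down:
            downstream[down[:tsd_size]].add(down)  # TSD starts the downstream flank
    return {
        tsd: (sorted(upstream[tsd]), sorted(downstream[tsd]))
        for tsd in upstream
        if tsd in downstream
    }
```

With the Tos17 head and tail sequences and a TSD size of 5, a read ending a flank with `GATCC` before the head and a read starting a flank with `GATCC` after the tail would be reported together under the key `GATCC`.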

Update

  • New script tif_flanking.pl is implemented. (2019-03-21)
    tif_flanking.pl is an update of tif_basic.pl.
    If you do not have a reference genome sequence, try tif_flanking.pl.
    tif_flanking.pl outputs the flanking sequences of transposon insertions in FASTA format.
    When run without arguments, it shows a help message.

  • New script tif.pl is implemented. (2019-03-19)

    e.g. perl tif.pl ref.fasta TGTTAAATATATATACA TTGCAAGTTAGTTAAGA
    The first argument is the path of the reference sequence in multi-FASTA format.
    The second argument is the head sequence of the transposon.
    The third argument is the tail sequence of the transposon.
    All short reads (e.g. name_r1.fastq, name_r2.fastq) in the './read' directory will be analyzed.
    When run without arguments, it shows a help message.

    This version does not depend on BLAST search. The search script is included in tif.pl.
    tif.pl is upward compatible with tif2.pl.

  • TIF is a powerful tool.
    Sensitive detection of pre-integration intermediates of long terminal repeat retrotransposons in crop plants
    Jungnam Cho, Matthias Benoit, Marco Catoni, Hajk-Georg Drost, Anna Brestovitsky, Matthijs Oosterbeek, Jerzy Paszkowski
    Nature Plants, 5:26–33 (2019)
    https://doi.org/10.1038/s41477-018-0320-9

    Mobilization of Pack-CACTA transposons in Arabidopsis suggests the mechanism of gene shuffling
    Marco Catoni, Thomas Jonesman, Elisa Cerruti, Jerzy Paszkowski
    Nucleic Acids Research, 47(3):1311–1320 (2019)
    https://doi.org/10.1093/nar/gky1196

Download TIF

Download the zip file of TIF from https://github.com/akiomiyao/tif and extract it.

or

% git clone https://github.com/akiomiyao/tif.git

If you obtained the scripts from GitHub, updating to the newest version is easy with git's pull command.

% git pull

Static data required by TIF (for demonstration)

For example,
% perl tif.pl IRGSP-1.0_genome.fasta TGTTAAATATATATACA TTGCAAGTTAGTTAAGA

or

% perl tif.pl TAIR10_chr_all.fas GAGGGATCATCTCTTGTGTC GACTGGCCAGACGATTATTC

or

% perl tif.pl dmel-all-chromosome-r6.29.fasta CATGATGAAATAACAT ATGTTATTTCATCATG

Before running tif.pl, download the fastq files into the read directory.

The result will be saved to the tif.result file.

tif.pl is easy to use and more sensitive than the older programs.
However, the older programs described below, tif_basic.pl with blast.pl (Algorithm 1) and tif_extended.pl (Algorithm 2), are faster than tif.pl.
The older programs do not analyze the complementary sequences of the fastq reads.
tif_flanking.pl has the same function as tif_basic.pl, but it also analyzes
the complementary sequences of the fastq reads. It is therefore more sensitive but slower.
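
Analyzing complementary sequences means searching the reverse complement of each read as well as the read itself. A minimal Python sketch of that check follows. It is an illustration, not the code used by the TIF scripts, and the function names are hypothetical.

```python
# Translation table for the four bases plus N, in upper and lower case.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def contains_on_either_strand(read, query):
    """True if query occurs in the read or in its reverse complement."""
    return query in read or query in reverse_complement(read)
```

A read sequenced from the opposite strand carries the reverse complement of the transposon end, so a forward-only search misses it. Searching both orientations roughly doubles the work per read, which is consistent with the sensitivity and speed trade-off described above.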

For Tos17 retrotransposon of rice

  Head of Tos17: TGTTAAATATATATACA
  Tail of Tos17: TTGCAAGTTAGTTAAGA
  Size of TSD: 5
  fastq: SRR556173 SRR556174 SRR556175
  reference: https://rapdb.dna.affrc.go.jp/download/archive/irgsp1/IRGSP-1.0_genome.fasta.gz

For mPing transposon of rice (DNA type transposon)

  Head of mPing: GGCCAGTCACAATGGGG
  Tail of mPing: AGCCATTGTGACTGGCC
  Size of TSD: 3
  Because the TSD of mPing is short, extended TIF is recommended.

For nDart transposon of rice (DNA type transposon)

  Head of nDart: TAGAGGTGGCCAAACGGGC
  Tail of nDart: GCCCGTTTGGCCACCTCTA
  Size of TSD: 8

For P-element of Drosophila melanogaster

  Head of P-element: CATGATGAAATAACAT
  Tail of P-element: ATGTTATTTCATCATG
  Size of TSD: 8
  fastq: SRR823377 SRR823382
  reference: ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r6.29_FB2019_04/fasta/dmel-all-chromosome-r6.29.fasta.gz

For Hi of Arabidopsis thaliana

  Head of Hi: GAGGGATCATCTCTTGTGTC
  Tail of Hi: GACTGGCCAGACGATTATTC
  Size of TSD: 9
  doi: 10.1038/emboj.2013.169
  fastq: DRR001193 (ddm1 mutant)
  reference: https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_chromosome_files/TAIR10_chr_all.fas

To obtain short read data

Download the SRA Toolkit from

  https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software

In your home directory,

  tar xvfz sratoolkit.2.9.6-centos_linux64.tar.gz
  Copy fastq-dump from the bin directory to a directory on your executable path.

For ttm2 (Rice mutant)

    cd tif/read
    fastq-dump --split-files -A SRR556173

For ttm5 (Rice mutant)

    cd tif/read
    fastq-dump --split-files -A SRR556174
    fastq-dump --split-files -A SRR556175

For D. melanogaster

    cd tif/read
    fastq-dump --split-files -A SRR823377
    fastq-dump --split-files -A SRR823382
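
Once the fastq files are in the read directory, the sequences can be read record by record. The Python sketch below assumes that every record occupies exactly four lines (header, sequence, `+` line, qualities) and that sequences are not wrapped across lines. The function name is hypothetical and the in-memory example stands in for a real file.

```python
import io


def read_fastq_sequences(handle):
    """Yield the sequence line of each four-line FASTQ record."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:  # second line of every record is the sequence
            yield line.strip()


# Two made-up records used only to show the call.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
sequences = list(read_fastq_sequences(example))
```

For a real file, pass an open file handle instead of the `io.StringIO` object.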

BLAST programs

Download BLAST programs

  ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.9.0+-x64-linux.tar.gz

Newer versions of BLAST can be downloaded from https://ftp.ncbi.nlm.nih.gov/blast/executables/

Copy blastn and makeblastdb to a directory on your executable path.
cp ncbi-blast-2.9.0+/bin/blastn ~/bin
cp ncbi-blast-2.9.0+/bin/makeblastdb ~/bin

To make a BLAST database

  makeblastdb -in reference_genome.fasta -dbtype nucl

For Rice

  cd tif
  wget http://rapdb.dna.affrc.go.jp/download/archive/irgsp1/IRGSP-1.0_genome.fasta.gz
  gzip -d IRGSP-1.0_genome.fasta.gz
  makeblastdb -in IRGSP-1.0_genome.fasta -dbtype nucl

For Drosophila melanogaster

  cd tif
  wget ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r6.29_FB2019_04/fasta/dmel-all-chromosome-r6.29.fasta.gz
  gzip -d dmel-all-chromosome-r6.29.fasta.gz
  makeblastdb -in dmel-all-chromosome-r6.29.fasta -dbtype nucl 

Search targets of transposon

When run without any arguments, each script shows a help message.

Save the fastq files in the read directory.

 cd tif
 cp somewhere/foo.fastq read

To test TIF algorithm 1

  cd tif
  perl tif_basic.pl head_sequence tail_sequence TSD_size
  perl blast.pl blastdb_name

  For example,
  cd tif
  perl tif_basic.pl TGTTAAATATATATACA TTGCAAGTTAGTTAAGA 5
  perl blast.pl IRGSP-1.0_genome.fasta

Output of tif_basic.pl is tif.fasta, a multiple FASTA file.

blast.pl reads tif.fasta and writes tif.position, which contains the location and direction of the TE insertion sites.

To test TIF algorithm 2

  cd tif
  perl tif_extended.pl reference_fasta_file head_sequence tail_sequence

  For example,
  cd tif
  perl tif_extended.pl IRGSP-1.0_genome.fasta TGTTAAATATATATACA TTGCAAGTTAGTTAAGA

tif_extended.pl returns both the tif.fasta and tif.position files.

For new extended tif

  cd tif
  perl tif.pl IRGSP-1.0_genome.fasta TGTTAAATATATATACA TTGCAAGTTAGTTAAGA

tif.pl reads the nucleotide sequence of the rice genome saved in the chr directory and detects the position of each junction by text search against the genome sequence. This enables detection of insertions even at repetitive loci.
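
Locating a junction by text search can be sketched as an exact substring search that reports every hit. The Python code below is an illustration only, not the search code in tif.pl. The function names are hypothetical, and the genome is assumed to be held in memory as a mapping from chromosome name to sequence.

```python
def find_all(genome, query):
    """Return every 1-based start position of query in genome (overlaps allowed)."""
    positions = []
    start = genome.find(query)
    while start != -1:
        positions.append(start + 1)
        start = genome.find(query, start + 1)
    return positions


def locate_junction(chromosomes, flank):
    """Return (chromosome, position) for every exact occurrence of flank."""
    hits = []
    for name, seq in chromosomes.items():
        hits.extend((name, pos) for pos in find_all(seq, flank))
    return hits
```

Because all occurrences are returned, a flanking sequence that matches several places in the genome shows up as several candidate positions rather than being discarded. How tif.pl chooses among such candidates is not covered by this sketch.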

Citing TIF

Update

  • 1.6 tif_flanking.pl is implemented 2019-03-21.
  • 1.5 tif.pl is implemented. 2019-03-19
  • 1.4 tif2.pl is improved. 2016-10-22
  • 1.3 Add new extended version tif2.pl 2015-03-02
  • 1.2 Update README.md 2014-10-09
  • 1.1 Update link of SRA-toolkit in README.md 2014-08-01
  • 1.0 Initial version 2014-02-05