oushujun / LTR_retriever

License: GPL-3.0
LTR_retriever is a highly accurate and sensitive program for identifying LTR retrotransposons. The LTR Assembly Index (LAI) is also included in this package.


Introduction

LTR_retriever is a command line program (in Perl) that accurately identifies LTR retrotransposons (LTR-RTs) from the outputs of LTRharvest, LTR_FINDER, MGEScan 3.0.0, LTR_STRUC, and LtrDetector, and generates a non-redundant LTR-RT library for genome annotation.

By default, the program generates a whole-genome LTR-RT annotation and the LTR Assembly Index (LAI), which evaluates the assembly continuity of the input genome. Users can also run LAI separately (see Usage).

Installation

LTR_retriever is installation-free but requires the following dependencies: TRF, BLAST+, BLAST or CD-HIT, HMMER, and RepeatMasker. You may specify the paths to these programs on the command line (run LTR_retriever -h for details) or install them in one of the following ways:

Quick installation using conda

conda install -c bioconda ltr_retriever

Step by step using conda

You may use conda to install all dependencies, after which LTR_retriever is ready to run:

conda create -n LTR_retriever
conda activate LTR_retriever
conda install -y -c conda-forge perl perl-text-soundex
conda install -y -c bioconda cd-hit repeatmasker
git clone https://github.com/oushujun/LTR_retriever.git
./LTR_retriever/LTR_retriever -h

Standard installation

Alternatively, you can provide fixed paths to the following dependencies:

  1. makeblastdb, blastn, and blastx in the BLAST+ package,
  2. cd-hit-est in the CDHIT package OR blastclust in the BLAST package,
  3. hmmsearch in the HMMER package (v3.1b2 or higher), and
  4. RepeatMasker.

To do so, edit the 'paths' file in the LTR_retriever directory:

vi /your_path_to/LTR_retriever/paths
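
Before editing the paths file, it may help to see which dependencies are already reachable on your PATH. The following is a minimal sketch, not part of LTR_retriever: the function name missing_tools is hypothetical, and the executable names are the ones listed above. TRF is left out of the example call because its executable name varies between installations; pass whatever name applies on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with LTR_retriever): print every tool
# name given as an argument that cannot be found on the current PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example call with the executables named in the list above.
# Only one of cd-hit-est and blastclust is needed, so seeing one of
# the two reported as missing is not necessarily a problem.
# missing_tools makeblastdb blastn blastx cd-hit-est blastclust hmmsearch RepeatMasker
```

Any tool reported as missing either needs to be installed or have its location recorded in the paths file.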

Inputs

Two types of inputs are required for LTR_retriever:

  1. Genomic sequence
  2. LTR-RT candidates

LTR_retriever accepts multiple LTR-RT candidate inputs, including the screen output of LTRharvest and the screen output of LTR_FINDER. For outputs of other LTR identification programs, you may convert them to an LTRharvest-like format and pass them to LTR_retriever with -inharvest. Users need to obtain the input file(s) from the aforementioned programs before running LTR_retriever. Either a single input source or a combination of multiple inputs is acceptable. For more details and examples, please see the manual.

Using LTRharvest and LTR_FINDER results is sufficient and recommended for LTR_retriever. However, if you want to analyze results from LTR_STRUC, MGEScan 3.0.0, or LtrDetector, you can use the following scripts to convert their outputs to the LTRharvest format, then pass them to LTR_retriever with -inharvest. You may concatenate multiple LTRharvest-format inputs into one file. For instructions, run:

perl /your_path_to/LTR_retriever/bin/convert_ltr_struc.pl
perl /your_path_to/LTR_retriever/bin/convert_MGEScan3.0.pl
perl /your_path_to/LTR_retriever/bin/convert_ltrdetector.pl

Executables for LTR_FINDER_parallel and LTRharvest are available for download.

Outputs

The output of LTR_retriever includes:

  1. Intact LTR-RTs with coordinate and structural information
    • Summary tables (.pass.list)
    • GFF3 format output (.pass.list.gff3)
  2. LTR-RT library
    • All non-redundant LTR-RTs (.LTRlib.fa)
    • All non-TGCA LTR-RTs (.nmtf.LTRlib.fa)
    • All LTR-RTs with redundancy (.LTRlib.redundant.fa)
  3. Whole-genome LTR-RT annotation by the non-redundant library
    • GFF format output (.out.gff)
    • LTR family summary (.out.fam.size.list)
    • LTR superfamily summary (.out.superfam.size.list)
    • LTR distribution on each chromosome (.out.LTR.distribution.txt)
  4. LTR Assembly Index (.out.LAI)
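
After a run, a quick way to see whether all of the files listed above were produced is to loop over the suffixes. This is only a sketch under one assumption: each output is named by appending the suffix to the genome file name, as in the genome.fa.pass.list and genome.fa.out names used in the Usage commands. The function name list_missing_outputs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the expected output files that do not exist.
# Assumption: outputs are named "<genome file name><suffix>".
list_missing_outputs() {
    local genome="$1" suffix
    for suffix in .pass.list .pass.list.gff3 \
                  .LTRlib.fa .nmtf.LTRlib.fa .LTRlib.redundant.fa \
                  .out.gff .out.fam.size.list .out.superfam.size.list \
                  .out.LTR.distribution.txt .out.LAI; do
        [ -e "${genome}${suffix}" ] || echo "${genome}${suffix}"
    done
}

# Example call:
# list_missing_outputs genome.fa
```

An empty result means every listed file is present; it says nothing about whether the contents are complete.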

Usage

Best practice: Use short, simple sequence names. For example, use only letters, numbers, and _ to create unique names shorter than 15 characters. If sequence names are long, LTR_retriever will try to convert them for you, but the conversion does not always succeed.
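
The naming advice above can be checked before a run. The sketch below is not part of LTR_retriever; the function name check_seq_names is hypothetical. It makes two assumptions: a sequence name is the first whitespace-delimited token after ">" in a FASTA header, and the length limit of 15 is counted in characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report FASTA sequence names that are 15 or more
# characters long, contain anything other than letters, digits, and "_",
# or appear more than once. Output is "<name><TAB><reason>", one per line.
# Reads the files given as arguments, or standard input if none are given.
check_seq_names() {
    awk '/^>/ {
        name = substr($1, 2)                      # drop the leading ">"
        if (length(name) >= 15)        print name "\ttoo_long"
        if (name !~ /^[A-Za-z0-9_]+$/) print name "\tbad_characters"
        if (seen[name]++)              print name "\tduplicate"
    }' "$@"
}

# Example call:
# check_seq_names genome.fa
```

No output means every name follows the advice. Names that are reported can be renamed before running LTRharvest and LTR_FINDER so that all downstream files use the same identifiers.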

To obtain raw input files with LTRharvest and LTR_FINDER_parallel:

/your_path_to/gt suffixerator -db genome.fa -indexname genome.fa -tis -suf -lcp -des -ssp -sds -dna
/your_path_to/gt ltrharvest -index genome.fa -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6 -motif TGCA -motifmis 1 -similar 85 -vic 10 -seed 20 -seqids yes > genome.fa.harvest.scn
/your_path_to/LTR_FINDER_parallel -seq genome.fa -threads 10 -harvest_out -size 1000000 -time 300
cat genome.fa.harvest.scn genome.fa.finder.combine.scn > genome.fa.rawLTR.scn
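
Before passing the combined file to LTR_retriever, it can be worth confirming that each input actually contains candidates. The sketch below assumes that header and comment lines in LTRharvest-format files start with "#" and that every other non-blank line is one candidate; the function name count_candidates is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count candidate lines in an LTRharvest-format file.
# Assumption: lines starting with "#" are headers; blank lines are ignored.
count_candidates() {
    # grep -c exits non-zero when the count is 0, so "|| true" keeps
    # scripts that run under "set -e" from stopping on an empty file.
    grep -vc -e '^#' -e '^[[:space:]]*$' "$1" || true
}

# Example calls:
# count_candidates genome.fa.harvest.scn
# count_candidates genome.fa.finder.combine.scn
# count_candidates genome.fa.rawLTR.scn
```

If the assumption holds, the count for the combined file should equal the sum of the counts for the two source files.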

To run LTR_retriever:

/your_path_to/LTR_retriever -genome genome.fa -inharvest genome.fa.rawLTR.scn -threads 10 [options]

To run LAI:

/your_path_to/LAI -genome genome.fa -intact genome.fa.pass.list -all genome.fa.out [options]

For more details about the usage and parameter settings, please see the help pages by running:

/your_path_to/LTR_retriever -h

/your_path_to/LAI -h

Alternatively, refer to the manual.

For questions and issues, please see: https://github.com/oushujun/LTR_retriever/issues

Citations

If you find LTR_retriever useful, please cite:

Ou S. and Jiang N. (2018). LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 176(2): 1410-1422. open access

If you find LAI useful, please cite:

Ou S., Chen J. and Jiang N. (2018). Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. gky730. open access
