sanger-pathogens / companion

Licence: ISC License
This repository has been archived; the currently maintained version is at https://github.com/iii-companion/companion

Companion

A portable, scalable eukaryotic genome annotation pipeline implemented in Nextflow.

This software is a comprehensive computational pipeline for the annotation of eukaryotic genomes (such as those of protozoan parasites). It performs the following tasks:

  • Fast generation of pseudomolecules from scaffolds by ordering and orientating against a reference
  • Accurate transfer of highly conserved gene models from the reference
  • De novo gene finding as a complement to the gene transfer
  • Non-coding RNA detection (tRNA, rRNA, sn(o)RNA, ...)
  • Pseudogene detection
  • Functional annotation (GO, products, ...)
    • ...by transferring reference annotations to the target genome
    • ...by inferring GO terms and products from Pfam pHMM matches
  • Consistent gene ID assignment
  • Preparation of validated GFF3, GAF and EMBL output files for jump-starting manual curation and quick turnaround time to submission

It supports parallelized execution on a single machine as well as on large cluster platforms (LSF, SGE, ...).

Quick start

This should get you up and running on an Ubuntu system, but please read the full documentation before doing any work "for real".

1. Install dependencies

Execute these commands as root, e.g. using sudo

apt-get install default-jre
curl -fsSL get.nextflow.io | bash && \
   mv nextflow /usr/local/bin
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - && \
   add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" && \
   apt-get update && \
   apt-cache policy docker-ce && \
   apt-get install --yes docker-ce && \
   systemctl enable docker

To enable you to use docker with your normal user account (i.e. without being root or needing to use sudo), run the following command, with your username in place of <username>.

usermod -aG docker <username>

Log out and log back in again for this to take effect.

Checks

  • java -version should say you have Java 1.8 or greater
  • nextflow info will print system information if nextflow has been installed successfully
  • systemctl status docker will tell you if docker is active (running)
  • docker info will print information if docker has been installed successfully, and you have permission to use it
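The checks above can be partly scripted. The following is a minimal sketch, assuming a POSIX-compatible shell; check_tools is a hypothetical helper name and is not part of Companion. It only tests whether each command is on the PATH; it does not check versions or whether the Docker daemon is running, so the commands listed above are still worth running by hand.

```shell
# check_tools TOOL...: report whether each named command is on the PATH.
# Hypothetical helper for illustration; not shipped with Companion.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok:      $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Exit status is 0 only if every tool was found.
    return "$missing"
}

# Usage:
#   check_tools java nextflow docker
```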

2. Install Companion

Execute these commands in the directory you want to keep your Companion work in. Do this as a normal user, i.e. not as root or using sudo. Use a name that is meaningful to you in place of <my-companion-project>

curl -L -o companion-master.zip https://github.com/sanger-pathogens/companion/archive/master.zip && \
   unzip companion-master.zip && \
   mv companion-master <my-companion-project>
docker pull sangerpathogens/companion

3. Run Companion test job

Companion is distributed with configuration and data (including a few pregenerated reference annotations) for a small test run. Run the following command (using the name you chose for your project directory in place of my-companion-project).

nextflow run my-companion-project -profile docker

This will create a directory my-companion-project/example-output with the results of the run.

4. Configure Companion for your annotation run

The file params_default.config configures the pipeline, and will need to be edited for your annotation run. You will probably need to change at least the following parameters:

  • inseq: your input FASTA file (${baseDir}/example-data/L_donovani.1.fasta in the example parameter file included with the distribution)
  • ref_dir: the directory containing your reference genomes (${baseDir}/example-data/references in the example file)
  • ref_species: the "short name" for your reference species (LmjF.1 in the example file)
  • dist_dir: the directory that will contain the newly created output files (${baseDir}/example-data-output in the example file)
  • GENOME_PREFIX: text pattern matching your genome prefix (LDON in the example file)
  • CHR_PATTERN: pattern matching your chromosome names (LDON_(%w+) in the example file, where %w+ matches one or more letters or numbers)
  • ABACAS_BIN_CHR: Abacas bin chromosome (LDON_0 in the example file)
  • EMBL_AUTHORS etc.: please provide suitable EMBL metadata (dummy values in the example file)
  • TAXON_ID: please provide a suitable value for the GAF output (4711 in the example file)
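To get a feel for what a pattern such as LDON_(%w+) captures, it can help to try it on a few names before starting a run. The sketch below is an approximation only: it mimics the pattern with a POSIX regular expression in sed ([[:alnum:]] standing in for %w), it is not how Companion itself evaluates the pattern, and chr_id and the sample names are hypothetical. The prefix is inserted into the regular expression as-is, so it should not contain regex metacharacters.

```shell
# chr_id PREFIX NAME: print the part of NAME that PREFIX_(%w+) would capture,
# or nothing if NAME does not match. Hypothetical helper for illustration.
chr_id() {
    # \{1,\} means "one or more", like + in the original pattern.
    printf '%s\n' "$2" | sed -n "s/.*${1}_\([[:alnum:]]\{1,\}\).*/\1/p"
}

# Usage (made-up sequence names):
#   chr_id LDON LDON_36      # prints 36
#   chr_id LDON contig.7     # prints nothing: the name does not match
```

Names that print nothing here are worth a second look, since they do not match the chromosome pattern.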

5. Prepare reference annotations

The reference annotations used in the pipeline need to be pre-processed before they can be used. To add a reference organism, you will need:

  • a descriptive name of the organism
  • a short abbreviation for the organism
  • the genome sequence in a single FASTA file
  • a structural gene annotation in GFF3 format (see below for details)
  • functional GO annotation in GAF 1.0 format, on the gene level
  • a pattern matching chromosome headers, describing how to extract chromosome numbers from them
  • an AUGUSTUS model, trained on reference genes

Insert these file names, etc., where <placeholders> appear in the steps below:

  1. Create a new data directory (i.e. the equivalent of the example-data directory included in the distribution)
  2. Edit nextflow.config (and any config files that are referenced) and change parameters such as inseq and ref_dir to your new data directory.
  3. Copy the new reference genome (FASTA) into <new_data_dir>/genomes
  4. Copy GFF3 and GAF files into <new_data_dir>/genomes
  5. Copy Augustus model files into data/augustus/species/<species_name>/
  6. Create new directory <new_data_dir>/references/<short_name>/
  7. Add a new section to <new_data_dir>/references/references-in.json, using the short name (same as the directory name in the previous step); in this section add the names/paths of the files copied (above), a descriptive name, and a pattern for matching chromosomes in the FASTA files (in this example, <short_name>_<n>, where n is any integer).
"<short_name>" : {   "gff"                : "../genomes/<gff3_filename>.gff3",
                     "genome"             : "../genomes/<ref_genome_name>.fasta",
                     "gaf"                : "../genomes/<ref_annot_filename>.gaf",
                     "name"               : "<Descriptive Name of Reference Genome>",
                     "augustus_model"     : "../../data/augustus/species/<species_name>/",
                     "chromosome_pattern" : "<short_name>_(%d+)"
                  }
  8. Finally, change directory to <new_data_dir>/references (you must execute the following command in this directory) and run ../../bin/update_references.lua. This writes the file <new_data_dir>/references/references.json.
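The directory-creation part of the steps above can be sketched in shell. This is an illustration under stated assumptions, not a script shipped with Companion: make_reference_layout is a hypothetical helper, and it only creates the genomes and references/<short_name> directories. Copying the FASTA, GFF3, GAF and AUGUSTUS files, and editing references-in.json, still has to be done as described in the steps.

```shell
# make_reference_layout DATA_DIR SHORT_NAME
# Create the directories used in the steps above:
#   DATA_DIR/genomes                 for the FASTA, GFF3 and GAF files
#   DATA_DIR/references/SHORT_NAME   for the per-reference data
# Hypothetical helper for illustration; not part of Companion.
make_reference_layout() {
    data_dir=$1
    short_name=$2
    mkdir -p "$data_dir/genomes" "$data_dir/references/$short_name"
}

# Usage (placeholders as in the steps above):
#   make_reference_layout <new_data_dir> <short_name>
#   cp <ref_genome_name>.fasta <gff3_filename>.gff3 <ref_annot_filename>.gaf <new_data_dir>/genomes/
```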

6. Run it!

The following command (using the name you chose for your project directory in place of my-companion-project) will start your annotation run:

nextflow run my-companion-project -profile docker

Further technical information

Dependencies

Companion has the following dependencies:

Java

To check whether Java is installed, and which version, use the command java -version. Note that Java 8 and earlier report a version number of the form 1.x (1.8 for Java 8), whereas Java 9 and later report the major version directly (9, 10, 11, etc.).
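Because the version string comes in two forms (for example 1.8.0_292 for Java 8 and 11.0.2 for Java 11), a script that needs the major version has to handle both. The following is a small sketch, assuming a POSIX-compatible shell; java_major is a hypothetical helper name, and the sample version strings are illustrative.

```shell
# java_major VERSION: print the major Java version for a version string.
# Handles the legacy "1.x" form (Java 8 and earlier) and the newer form.
# Hypothetical helper for illustration; not part of Companion.
java_major() {
    case $1 in
        1.*)
            rest=${1#1.}            # "1.8.0_292" -> "8.0_292"
            echo "${rest%%.*}"      # "8.0_292"   -> "8"
            ;;
        *)
            echo "${1%%[.+-]*}"     # "11.0.2" -> "11", "9-ea" -> "9"
            ;;
    esac
}

# Usage: extract the quoted version from the first line of java -version
# (which is written to stderr) and pass it to java_major:
#   v=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p')
#   java_major "$v"
```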

To install Java 8 on an Ubuntu or Debian system, run:

apt-get install openjdk-8-jre

On Fedora, CentOS or Red Hat (etc.) systems:

yum install java-1.8.0-openjdk

Nextflow

To install Nextflow, run:

curl -fsSL get.nextflow.io | bash

This will create an executable called 'nextflow', which should be moved to a suitable directory, for example:

mv nextflow /usr/local/bin/

Use the command which nextflow to check that it is found in your path.

Docker

Docker is required if you intend to use the Docker image, as recommended below, to satisfy the dependencies.

To install Docker, see the installation guide for Ubuntu, CentOS, Debian or Fedora.

Users running Companion with Docker will need to be added to the docker group (Unix users can belong to one or more groups, which determine whether they can perform certain actions; adding a user to the docker group allows them to execute docker commands). To add the user with username <username> to the docker group, run:

usermod -aG docker <username>

Some Linux systems may not have usermod installed, as there are different programs that can be used to change user settings; please consult your Linux distribution documentation if necessary.

Installation

There are a number of ways to install Companion; details for an installation using Docker are described below. If you encounter an issue when installing Companion please contact your local system administrator. If you encounter a bug please log it here or email us at [email protected].

The easiest way to use the pipeline is to use the prepared Docker image which contains all external dependencies.

docker pull sangerpathogens/companion

Usage

Local copy of Companion

To create a local copy of Companion, you can download this repo from GitHub (if you are familiar with GitHub, you may of course prefer to clone or fork it).

curl -L -o companion-master.zip https://github.com/sanger-pathogens/companion/archive/master.zip  # or click the green button on the GitHub web page
unzip companion-master.zip
mv companion-master my-companion-project # renaming it to something meaningful to you is a good idea

Now you can run Companion. There is an example dataset and parameterization included in the distribution, so to get started just run:

nextflow run my-companion-project -profile docker

The argument -profile docker instructs nextflow to run the sangerpathogens/companion docker image for the dependencies; the nextflow.config file (and files referenced within it) define the docker profile and the docker image to be used.

Running Companion direct from a repository

If you run nextflow with the name of a GitHub repository, it will pull (download) the contents of the repository and run with those. For example, the following command will do the same as the "local copy" example above:

nextflow run sanger-pathogens/companion -profile docker

It is best to use this with some caution. After the command above is run, nextflow will have stored a local copy of the repository under .nextflow/assets/sanger-pathogens in your home directory (note that .nextflow is a hidden directory, and will not usually be visible; use the command ls -la ~/.nextflow to see it).

If you run the same command again, nextflow will use this local copy instead of pulling a fresh copy from the repository. You can edit the files in your local copy, and nextflow will work from your (now different) version of sanger-pathogens/companion.
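To see where that local copy lives on a given machine, the path can be worked out from Nextflow's environment variables. The sketch below assumes Nextflow's documented defaults (assets under $HOME/.nextflow/assets unless NXF_HOME or NXF_ASSETS is set); nxf_asset_dir is a hypothetical helper, not a Nextflow command.

```shell
# nxf_asset_dir REPO: print the directory where Nextflow is expected to cache
# the given repository (e.g. sanger-pathogens/companion).
# Assumes the defaults: NXF_ASSETS, else $NXF_HOME/assets, else ~/.nextflow/assets.
# Hypothetical helper for illustration; not part of Nextflow or Companion.
nxf_asset_dir() {
    assets=${NXF_ASSETS:-${NXF_HOME:-$HOME/.nextflow}/assets}
    printf '%s/%s\n' "$assets" "$1"
}

# Usage:
#   ls -la "$(nxf_asset_dir sanger-pathogens/companion)"
```

Nextflow's own nextflow list command shows which repositories have been cached in this way.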

If you are familiar with repositories, and the workflow appropriate to using them, this can be a very convenient way of working. You can create your own github repository to store and share your work, and track versions.

If you are not familiar with git repositories, it can become quite confusing, and you should probably work with a simple local copy.

Preparing reference annotations

Further documentation on preparing reference data can be found in the GitHub wiki.

License

Companion is free software, licensed under ISC.

Feedback/Issues

Please report any issues to the issues page

Citation

If you use this software please cite: Companion: a web server for annotation and analysis of parasite genomes. Steinbiss S, Silva-Franco F, Brunk B, Foth B, Hertz-Fowler C et al. Nucleic Acids Research, 44:W29-W34, 2016.
DOI: 10.1093/nar/gkw292
