wyp1125 / MCScanX

MCScanX: Multiple Collinearity Scan toolkit X version. The most popular synteny analysis tool in the world!

MCScanX

License: BSD

Overview

The MCScanX package has two major components: a modified version of the MCScan algorithm, which makes MCScan more convenient to use and displays multiple alignments of syntenic blocks more clearly, and a variety of downstream analysis tools for conducting different biological analyses based on the synteny data generated by the modified MCScan algorithm.

All programs are run from the command line on Linux or macOS. Usage/help information is built into each program; to display it, run the program without any options:

$ ./program_name

[Figure: MCScanX flow chart]

All code is copiable, distributable, modifiable, and usable without any restrictions. Contact: Yupeng Wang, [email protected]; Xu Tan, [email protected]

Installation

Make

Simply put MCScanX.zip into a directory and run:

$ unzip MCScanX.zip
$ cd MCScanX
$ make

The package includes the following programs:

Main programs (in the main folder)

  • MCScanX
  • MCScanX_h
  • Duplicate_gene_classifier

Downstream analysis programs (in the downstream_analyses folder)

  • Tool 1. detect_syntenic_tandem_arrays
  • Tool 2. dissect_multiple_alignment
  • Tool 3. dot_plotter.java
  • Tool 4. dual_synteny_plotter.java
  • Tool 5. circle_plotter.java
  • Tool 6. bar_plotter.java
  • Tool 7. add_ka_and_ks_to_synteny.pl
  • Tool 8. group_collinear_genes.pl
  • Tool 9. detect_collinearity_within_gene_families.pl
  • Tool 10. family_circle_plotter.java
  • Tool 11. family_tree_plotter.java
  • Tool 12. origin_enrichment_analysis.pl

Main programs

MCScanX

This program, implementing a modified MCScan algorithm, detects syntenic blocks and progressively aligns multiple syntenic blocks against reference genomes (PIVOT).

  • Usage

MCScanX reads in two data files: xyz.blast and xyz.gff. The xyz.blast file is simply the raw BLASTP output in m8 (tabular) format, for example:

AT1G50920   AT1G50920    100.00  671     0       0       1       671     1       671     0.0     1316

Here is a typical parameter setting for generating the xyz.blast file:

$ blastall -i query_file -d database -p blastp -e 1e-10 -b 5 -v 5 -m 8 -o xyz.blast
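
The blastall options above (-b 5 -v 5) limit how many hits are reported per query. If xyz.blast was produced without such limits, the top-5 recommendation given below can also be applied afterwards. The following Python sketch is not part of MCScanX, and `top_hits` is a hypothetical helper; it assumes the standard 12-column m8 layout and ranks each query's hits by bit score (column 12), which is one reasonable choice rather than a documented MCScanX rule.

```python
from collections import defaultdict
from typing import Dict, List


# Hypothetical helper, not part of MCScanX.
def top_hits(m8_lines: List[str], max_hits: int = 5) -> List[str]:
    """Keep at most `max_hits` hits per query from BLAST m8 (tabular) lines.

    Column 12 of the m8 layout is the bit score; it is used here (an
    assumption) to rank the hits of each query, best first.
    """
    by_query: Dict[str, List[List[str]]] = defaultdict(list)
    for line in m8_lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        by_query[fields[0]].append(fields)

    kept: List[str] = []
    for hits in by_query.values():
        hits.sort(key=lambda f: float(f[11]), reverse=True)  # best bit score first
        kept.extend("\t".join(f) for f in hits[:max_hits])
    return kept
```

The filtered lines can be written back to xyz.blast with `"\n".join(...)`.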

The xyz.bed file holds gene positions, following a tab-delimited format:

chr#    starting_position       ending_position gene

Note: for chr#, a two-letter short name for the species is used as the prefix, and # is the chromosome number. (For example, the second chromosome of Arabidopsis thaliana should be denoted as at2.) The BED format is especially useful because many tools can handle BED files, most notably BEDTools. The xyz.bed file can be generated by parsing the .gff3 file released by the sequencing project. Each gene may appear only once in the .bed file. When comparing multiple genomes, simply concatenate all inter- and intra-species m8 BLAST output into the xyz.blast file, and concatenate the gene positions of all species into the xyz.bed file.
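
As an illustration of the parsing step mentioned above, here is a minimal Python sketch (the helper name `gff3_to_bed` is hypothetical) that turns GFF3 "gene" records into xyz.bed lines. It assumes that gene IDs are stored in the GFF3 ID attribute and that you supply the mapping from GFF3 sequence names to MCScanX chromosome names (such as Chr2 to at2); both are assumptions about your annotation, not requirements of MCScanX. Coordinates are copied unchanged from the GFF3 file.

```python
from typing import Dict, List


# Hypothetical helper, not part of MCScanX.
def gff3_to_bed(gff3_lines: List[str], chrom_names: Dict[str, str],
                id_key: str = "ID") -> List[str]:
    """Convert GFF3 'gene' features to 'chr#<TAB>start<TAB>end<TAB>gene' lines.

    chrom_names maps a GFF3 seqid (e.g. 'Chr2') to an MCScanX-style name
    (e.g. 'at2'); features on unmapped seqids are skipped.
    Each gene is written at most once, since repeats are not allowed.
    """
    seen = set()
    bed: List[str] = []
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene" or cols[0] not in chrom_names:
            continue
        # column 9 holds 'key=value' attributes separated by ';'
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        gene = attrs.get(id_key)
        if gene is None or gene in seen:
            continue
        seen.add(gene)
        bed.append(f"{chrom_names[cols[0]]}\t{cols[3]}\t{cols[4]}\t{gene}")
    return bed
```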

To make MCScanX generate more reasonable results, it is advised to restrict the BLASTP hits for each gene to around the top 5. When xyz.blast and xyz.bed are ready, put them in the same folder and run:

$ ./MCScanX dir/xyz

  • Output

MCScanX outputs one text file, xyz.synteny, containing the pairwise synteny blocks, for example:

## Alignment 0: score=9171.0 e_value=0 N=187 at1&at1 plus
  0-  0:        AT1G17240       AT1G72300       0
  0-  1:        AT1G17290       AT1G72330       0
  ...
  0-185:        AT1G22330       AT1G78260       1e-63
  0-186:        AT1G22340       AT1G78270       3e-174
## Alignment 1: score=5084.0 e_value=5.6e-251 N=106 at1&at1 plus

and a directory, xyz.html, containing HTML files that display the multiple alignment of syntenic blocks against each chromosome. These files should be viewed in a web browser. In each HTML file, the first column shows the number of syntenic blocks at each gene locus; the second column shows the genes of the PIVOT (reference chromosome), with tandem genes marked in red; and the remaining columns show the aligned syntenic blocks, in which only matched genes are displayed.

  • MCScanX parameters (for advanced users)

[Usage]:

        ./MCScanX prefix_fn [options]

-k  MATCH_SCORE, final score=MATCH_SCORE+NUM_GAPS*GAP_PENALTY
    (default: 50)
-g  GAP_PENALTY, gap penalty (default: -1)
-s  MATCH_SIZE, number of genes required to call synteny
    (default: 5)
-e  E_VALUE, alignment significance (default: 1e-05)
-u  UNIT_DIST, average intergenic distance (default: 10000)
-m  MAX_GAPS, maximum gaps(one gap=UNIT_DIST) allowed (default: 20)
-a  only builds the pairwise blocks (.synteny file)
-b  patterns of syntenic blocks. 0:intra- and inter-species (default); 1:intra-species; 2:inter-species
-h  print this help page
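
The -k, -g, -m, and -s options describe a dynamic-programming chaining of homologous gene pairs. The sketch below is a simplified illustration of that idea, not MCScanX's own code, and its function names are hypothetical. It assumes that anchors are given as gene-rank positions (x, y) on two chromosomes, that only chains increasing in both coordinates are considered, that each added anchor contributes MATCH_SCORE + NUM_GAPS*GAP_PENALTY, and that NUM_GAPS between consecutive anchors is max(dx, dy) - 1.

```python
from typing import List, Tuple

MATCH_SCORE = 50   # -k default
GAP_PENALTY = -1   # -g default
MAX_GAPS = 20      # -m default
MATCH_SIZE = 5     # -s default

Anchor = Tuple[int, int]  # (gene rank on chromosome A, gene rank on chromosome B)


# Hypothetical helper, not part of MCScanX.
def best_chain(anchors: List[Anchor]) -> Tuple[float, List[Anchor]]:
    """Return (score, anchors) of the highest-scoring chain increasing in both ranks."""
    pts = sorted(anchors)
    n = len(pts)
    if n == 0:
        return 0.0, []
    score = [float(MATCH_SCORE)] * n  # a chain may start at any anchor
    prev = [-1] * n                   # back-pointers for reconstructing the chain
    for j, (xj, yj) in enumerate(pts):
        for i in range(j):
            xi, yi = pts[i]
            if xi < xj and yi < yj:
                gaps = max(xj - xi, yj - yi) - 1  # genes skipped between the anchors
                if gaps <= MAX_GAPS:
                    cand = score[i] + MATCH_SCORE + gaps * GAP_PENALTY
                    if cand > score[j]:
                        score[j], prev[j] = cand, i
    best = max(range(n), key=lambda k: score[k])
    chain: List[Anchor] = []
    k = best
    while k != -1:  # follow back-pointers from the best end point
        chain.append(pts[k])
        k = prev[k]
    chain.reverse()
    return score[best], chain


def is_syntenic(chain: List[Anchor]) -> bool:
    """Call a chain a syntenic block only if it has at least MATCH_SIZE anchors."""
    return len(chain) >= MATCH_SIZE
```

A real run also has to handle the minus orientation and the E-value threshold (-e), which this sketch omits.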

MCScanX_h

The BLASTP input of MCScanX can be replaced by a tab-delimited file containing more reliable pairwise homologous relationships. In this case, use MCScanX_h instead. Running MCScanX_h is very similar to running MCScanX, except that the xyz.blast file is replaced by an xyz.homology file. At the bottom of the screen output, statistics on the numbers and percentages of collinear homolog pairs are shown.

Duplicate_gene_classifier

Users may use this program, which incorporates the MCScanX algorithm, to classify the origins of the duplicate genes of ONE genome into whole-genome/segmental (matched genes in syntenic blocks), tandem (continuous repeats), proximal (in nearby chromosomal regions but not adjacent), or dispersed (modes other than segmental, tandem, and proximal) duplications.

  • Usage:

    $ ./duplicate_gene_classifier  dir/xyz
    

The input of duplicate_gene_classifier is the same as that of MCScanX, except for an additional option defining the maximum distance (in number of genes) between two proximal duplicates.

  • Output

The output is a text file named xyz.gene_type, written to the same directory as the input files. It contains the origin of every gene in the xyz.gff file, in tab-delimited format:

Gene    gene_type(0/1/2/3/4)

Note: 0, 1, 2, 3, and 4 stand for singleton, dispersed, proximal, tandem, and segmental, respectively. This program is not meant to be applied to data from multiple genomes.
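
To get a quick overview of an xyz.gene_type file, the numeric codes can be tallied. This Python sketch (the name `summarize_gene_types` is hypothetical) uses the code-to-class mapping stated above; it skips any line whose second column is not one of those codes, since whether the file has a header line is not specified here.

```python
from collections import Counter
from typing import Dict, Iterable

# Codes as documented above for duplicate_gene_classifier output
GENE_TYPES = {0: "singleton", 1: "dispersed", 2: "proximal", 3: "tandem", 4: "segmental"}


# Hypothetical helper, not part of MCScanX.
def summarize_gene_types(lines: Iterable[str]) -> Dict[str, int]:
    """Count genes per duplication class in xyz.gene_type lines ('gene<TAB>code')."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        # skip blank lines, headers, or anything that is not 'gene code'
        if len(fields) != 2 or not fields[1].isdigit() or int(fields[1]) not in GENE_TYPES:
            continue
        counts[GENE_TYPES[int(fields[1])]] += 1
    return {name: counts[name] for name in GENE_TYPES.values()}
```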

Downstream analyses

1) Detect_syntenic_tandem_arrays

Tandem duplications often complicate synteny detection. To improve the power of synteny detection, MCScan algorithms use the gene with the best BLASTP hit to represent a tandem array. This program replaces the matched genes in syntenic blocks with their tandem arrays wherever tandem duplications exist.
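
The notion of a tandem array ("continuous repeat" in the classifier description above) can be sketched as runs of neighboring genes that are homologous to each other. This Python sketch (the name `tandem_arrays` is hypothetical) assumes that genes are already sorted by position on each chromosome and that homology is given as a set of unordered gene pairs, for example from BLASTP hits. It does not reproduce MCScanX's choice of a representative gene, nor necessarily the exact tandem definition used by detect_syntenic_tandem_arrays.

```python
from typing import Dict, FrozenSet, List, Set


# Hypothetical helper, not part of MCScanX.
def tandem_arrays(chrom_order: Dict[str, List[str]],
                  homolog_pairs: Set[FrozenSet[str]]) -> List[List[str]]:
    """Return runs of >= 2 adjacent genes where each neighbor pair is homologous."""
    arrays: List[List[str]] = []
    for genes in chrom_order.values():
        run = genes[:1]  # copy, so the input list is not modified
        for prev, gene in zip(genes, genes[1:]):
            if frozenset((prev, gene)) in homolog_pairs:
                run.append(gene)  # extend the current tandem run
            else:
                if len(run) > 1:
                    arrays.append(run)
                run = [gene]
        if len(run) > 1:
            arrays.append(run)
    return arrays
```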

  • Usage:

    $ ./detect_syntenic_tandem_arrays -g gff_file -b blast_file -s synteny_file -o output_file
    
  • Output

The path of output_file should be specified by the user. If any gene of a syntenic pair is located in a tandem array, the syntenic pair will be written into the output_file.

2) Dissect_multiple_alignment

This program dissects the number of syntenic blocks at each gene locus of the reference genome(s) into the number of intra-species syntenic blocks and the number of inter-species syntenic blocks.

  • Usage:

    $ ./dissect_multiple_alignment -g gff_file -s synteny_file -o output_file
    
  • Output

The path of output_file should be specified by the user. The first and second columns of output_file show the chromosomes and genes in reference genome(s). The 3rd, 4th and 5th columns show the numbers of intra-species syntenic blocks, inter-species syntenic blocks and outgroup species respectively.
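
The intra-/inter-species split can be derived from the chromosome names, because each name carries a two-letter species prefix (see the naming rule for chr# above). This Python sketch (the name `dissect_counts` is hypothetical) counts, for every gene, the distinct blocks it belongs to. It takes blocks as (chr1, chr2, gene pairs) tuples, which is an assumed in-memory form, and it omits the outgroup column of the real output.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Block = Tuple[str, str, List[Tuple[str, str]]]  # (chr1, chr2, [(gene1, gene2), ...])


# Hypothetical helper, not part of MCScanX.
def dissect_counts(blocks: List[Block]) -> Dict[str, Tuple[int, int]]:
    """Map each gene to (number of intra-species blocks, number of inter-species blocks)."""
    intra: Dict[str, Set[int]] = defaultdict(set)
    inter: Dict[str, Set[int]] = defaultdict(set)
    for idx, (chr1, chr2, pairs) in enumerate(blocks):
        # same two-letter prefix means both chromosomes come from the same species
        target = intra if chr1[:2] == chr2[:2] else inter
        for g1, g2 in pairs:
            target[g1].add(idx)
            target[g2].add(idx)
    genes = set(intra) | set(inter)
    return {g: (len(intra.get(g, ())), len(inter.get(g, ()))) for g in sorted(genes)}
```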

3) dot_plotter

This Java program generates a dot plot of all the syntenic blocks between two sets of chromosomes given by the user. Note that a JDK is needed to compile and run the Java programs.

  • Usage:

    $ java dot_plotter -g gff_file -s synteny_file -c control_file -o output_PNG_file
    

The input files include a gff file containing all gene positions, a synteny file generated by MCScanX, and a control file (.ctl) containing plot size and chromosome IDs. The control file can be easily made by modifying the dot.ctl file:

800     //dimension (in pixels) of x axis
800     //dimension (in pixels) of y axis
sb1,sb2,sb3,sb4,sb5,sb6,sb7,sb8,sb9,sb10        //chromosomes in x axis
os1,os2,os3,os4,os5,os6,os7,os8,os9,os10,os11,os12      //chromosomes in y axis

Note that no space is allowed between adjacent chromosome IDs.
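
All plotters read a small control file like the one above. The following Python sketch reads the dot.ctl layout; it assumes that text after // on a line is an annotation to be ignored (as in the example), and it enforces the no-space rule for the chromosome lists. It is not the Java parser used by dot_plotter, and the function names are hypothetical.

```python
from typing import List, Tuple


# Hypothetical helpers, not part of MCScanX.
def parse_ctl(lines: List[str]) -> List[str]:
    """Return the value part of each non-empty line, with any '//' annotation removed."""
    values: List[str] = []
    for line in lines:
        value = line.split("//", 1)[0].strip()
        if value:
            values.append(value)
    return values


def parse_dot_ctl(lines: List[str]) -> Tuple[int, int, List[str], List[str]]:
    """Return (x pixels, y pixels, x-axis chromosomes, y-axis chromosomes)."""
    width, height, x_chrs, y_chrs = parse_ctl(lines)[:4]
    for chrs in (x_chrs, y_chrs):
        if " " in chrs:
            raise ValueError("no space is allowed between adjacent chromosome IDs")
    return int(width), int(height), x_chrs.split(","), y_chrs.split(",")
```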

  • Output

Output is an image file (PNG format) which can be viewed with an image viewer. Each dot is a syntenic gene pair between the two sets of chromosomes. Dots of different colors, generated randomly, represent different syntenic blocks.
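
To place a dot, each gene position must be converted to a pixel coordinate along an axis. One simple scheme, assumed here and not taken from the dot_plotter source, lays the chromosomes from the control file end to end and gives each a share of the axis proportional to its length (for example, the largest gene end coordinate on that chromosome). The helper name `to_pixel` is hypothetical.

```python
from typing import Dict, List


# Hypothetical helper, not part of MCScanX.
def to_pixel(chrom: str, pos: int, chrs: List[str],
             lengths: Dict[str, int], pixels: int) -> float:
    """Map a position on `chrom` to a pixel offset along an axis of `pixels` pixels.

    Chromosomes in `chrs` are laid end to end in control-file order; each one
    occupies a share of the axis proportional to lengths[chrom].
    """
    scale = pixels / sum(lengths[c] for c in chrs)
    offset = 0.0
    for c in chrs:
        if c == chrom:
            return offset + pos * scale
        offset += lengths[c] * scale
    raise KeyError(f"{chrom} is not listed in the control file")
```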

4) dual_synteny_plotter

This Java program generates a dual synteny plot, which uses straight lines to link all the syntenic blocks between two sets of chromosomes.

  • Usage:

    $ java dual_synteny_plotter -g gff_file -s synteny_file -c control_file -o output_PNG_file
    

The input files include a gff file containing all gene positions, a synteny file generated by MCScanX, and a control file (.ctl) containing plot size and chromosome IDs. The control file can be easily made by modifying the column.ctl file:

200     //plot width (in pixels)
800     //plot height (in pixels)
sb1,sb2 //chromosomes in the left column
os1,os2,os3     //chromosomes in the right column

Note that no space is allowed between adjacent chromosome IDs.

  • Output

Output is an image file (PNG format) which can be viewed with an image viewer. Each line links a pair of syntenic genes between the two sets of chromosomes. Different colors of lines, generated randomly, represent different syntenic blocks.

5) Circle_plotter

This Java program generates a circular plot, which uses curved lines to link all the syntenic blocks between and within the set of chromosomes given by the user.

  • Usage:

    $ java circle_plotter -g gff_file -s synteny_file -c control_file -o output_PNG_file
    

The input files include a gff file containing all gene positions, a synteny file generated by MCScanX, and a control file (.ctl) containing plot size and chromosome IDs. The control file can be easily made by modifying the circle.ctl file:

800     //plot width and height (in pixels)
sb1,sb2,os1,os2,os3     //chromosomes in the circle

Note that no space is allowed between adjacent chromosome IDs.

  • Output

Output is an image file (PNG format) which can be viewed with an image viewer. Each curved line links a pair of syntenic genes between or within the given set of chromosomes. Different colors of lines, generated randomly, represent different syntenic blocks.

6) Bar_plotter

This Java program generates a bar plot displaying chromosome rearrangements between the reference and target chromosome sets given by the user.

  • Usage:

    $ java bar_plotter -g gff_file -s synteny_file -c control_file -o output_PNG_file
    

The input files include a gff file containing all gene positions, a synteny file generated by MCScanX, and a control file (.ctl) containing plot size and chromosome IDs. The control file can be easily made by modifying the bar.ctl file:

800     //dimension (in pixels) of x axis
800     //dimension (in pixels) of y axis
sb1,sb2,sb3,sb4,sb5,sb6,sb7,sb8,sb9,sb10        //reference chromosomes
os1,os2,os3,os4,os5,os6,os7,os8,os9,os10,os11,os12      //target chromosomes

Note that no space is allowed between adjacent chromosome IDs.

  • Output

Output is an image file (PNG format) which can be viewed with an image viewer. Each curved line links a pair of syntenic genes between or within the given set of chromosomes. Different colors of lines, generated randomly, represent different syntenic blocks.

7) add_kaks_to_synteny.pl

This program calculates the Ka and Ks values of each syntenic gene pair in the MCScanX output (.synteny file). BioPerl is needed to run this program.

  • Usage:

    $ perl add_kaks_to_synteny.pl -i synteny_file -d cds_file -o output_file
    

The inputs are an xyz.synteny file generated by MCScanX and a FASTA file of coding sequences for the corresponding gene set.
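
add_kaks_to_synteny.pl computes Ka and Ks itself (via BioPerl); before running it, it can help to check that every gene in the syntenic pairs has a coding sequence. This Python sketch (helper names hypothetical) reads a FASTA file and reports genes whose CDS is missing or whose length is not a multiple of 3. The multiple-of-3 check is an assumption about what codon-based Ka/Ks estimation needs, not a documented requirement of the script.

```python
from typing import Dict, Iterable, List, Optional, Tuple


# Hypothetical helpers, not part of MCScanX.
def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA records into {id: sequence}; the id is the first word after '>'."""
    chunks: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {k: "".join(v) for k, v in chunks.items()}


def check_pairs(pairs: Iterable[Tuple[str, str]], cds: Dict[str, str]) -> List[str]:
    """List genes (once each) whose CDS is missing or not a whole number of codons."""
    problems: List[str] = []
    for pair in pairs:
        for gene in pair:
            seq = cds.get(gene)
            if (seq is None or len(seq) % 3 != 0) and gene not in problems:
                problems.append(gene)
    return problems
```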

  • Output

Users should specify the path of the output file. The output file is a modified version of the xyz.synteny file in which each line contains a syntenic gene pair and its Ka and Ks values.

8) group_collinear_genes.pl

This program groups genes by repeatedly connecting collinear genes until no gene in a group has a collinear gene outside that group. This analysis can be used to construct gene families based on syntenic relationships.
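
Grouping genes until no gene has a collinear partner outside its group amounts to finding connected components in the graph whose edges are collinear gene pairs. A union-find (disjoint-set) structure is one standard way to compute them. This Python sketch illustrates the idea and is not the Perl script's code; the name `group_collinear` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


# Hypothetical helper, not part of MCScanX.
def group_collinear(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Return connected components of the collinearity graph, largest first."""
    parent: Dict[str, str] = {}

    def find(g: str) -> str:
        parent.setdefault(g, g)
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for g in parent:
        groups.setdefault(find(g), []).append(g)
    # largest group first, like the ordering described for the output file
    return sorted((sorted(m) for m in groups.values()), key=len, reverse=True)
```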

  • Usage:

    $ perl group_collinear_genes.pl -i synteny_file -o output_file
    

The input is an xyz.synteny file generated by MCScanX.

  • Output

The output file lists each group on one line in tab-delimited format. Note that the first group (the largest) usually contains many more genes than the other groups and should be regarded as non-informative.

9) detect_collinearity_within_gene_families.pl

This program detects collinear gene pairs within gene families.

  • Usage

The inputs are an xyz.synteny file generated by MCScanX and a tab-delimited gene family file with the gene family name in the first column:

Gene_family_1   gene1   gene2   gene3   ...     genex
Gene_family_2   gene1   gene2   gene3   ...     genex

  • Output

The output file gives the syntenic pairs of the given gene families in tab-delimited format:

Gene_family_1   gene_pair1      gene_pair2      ...     gene_pairx
Gene_family_2   gene_pair1      gene_pair2
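
The core of this step is a membership test: a syntenic pair is reported for a family when both of its genes belong to that family. This Python sketch (the name `family_syntenic_pairs` is hypothetical) assumes the family file layout above and syntenic pairs given as (gene1, gene2) tuples, for example taken from a parsed .synteny file. It returns pairs as tuples because the exact text format of a gene_pair in the output is not specified here.

```python
from typing import Dict, List, Tuple


# Hypothetical helper, not part of MCScanX.
def family_syntenic_pairs(family_lines: List[str],
                          pairs: List[Tuple[str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Return, for each gene family, the syntenic pairs whose two genes are both members."""
    result: Dict[str, List[Tuple[str, str]]] = {}
    for line in family_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines or families without genes
        family, members = fields[0], set(fields[1:])
        result[family] = [(a, b) for a, b in pairs if a in members and b in members]
    return result
```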

10) family_circle_plotter

This Java program generates a circular plot that links all syntenic genes within a gene family with red curved lines and places the gene family synteny against a genomic synteny background.

  • Usage:

    $ java family_circle_plotter -g gff_file -s synteny_file -c control_file -f gene_family_file -o output_jpeg_file
    

The input files include a .gff file containing all gene positions, a .synteny file generated by MCScanX, a control file (.ctl) containing the plot size and chromosome IDs and a gene family file containing only one gene family with the aforementioned format. The control file can be easily made by modifying the family.ctl file:

800     //plot width and height (in pixels)
at1,at2,at3,at4,at5     //chromosomes in the circle

Note: users can input just the chromosomes of interest into the family.ctl file. This will generate a circular plot within the given chromosomes set.

  • Output

Output is an image file which can be viewed with any image viewer. Each red curved line links a pair of syntenic genes within the given gene family. The grey lines represent the genomic synteny background.

11) family_tree_plotter.java

This Java program generates a gene family tree on which syntenic gene pairs and tandem gene groups are linked with red and blue curves, respectively.

  • Usage:

    $ javac family_tree_plotter.java   # compile before first use
    $ java family_tree_plotter -t tree_file -s synteny_file -o output_PNG_file   # show syntenic gene pairs only
    $ java family_tree_plotter -t tree_file -s synteny_file -d tandem_pair_file -o output_PNG_file   # show both tandem and syntenic gene pairs
    

The input files include a .synteny file generated by MCScanX and a tree file for the gene family in Newick format (bracket notation).

Users can set the plot width, plot height, and font size with the following options: -x plot_width -y plot_height -f font_size

  • Output

The output is an image file (PNG format) which can be viewed with an image viewer.

Note: this program aims to give an overview of synteny and tandem duplication within a gene family. Branch lengths are disregarded, so the drawn branches do not reflect the true branch lengths.
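
Because branch lengths are ignored, what matters for the drawing is the tree topology and the left-to-right order of the leaves. This Python sketch (the name `newick_leaves` is hypothetical) extracts that leaf order from a Newick string with a regular expression; it assumes unquoted labels and skips internal-node labels and branch lengths. It only illustrates the input format and is not part of family_tree_plotter.

```python
import re
from typing import List

# A leaf label directly follows '(' or ',' and ends at ':', ',', '(', ')', ';' or whitespace.
LEAF = re.compile(r"[(,]\s*([^:,();\s]+)")


# Hypothetical helper, not part of MCScanX.
def newick_leaves(newick: str) -> List[str]:
    """Return the leaf names of a Newick tree in left-to-right order."""
    return LEAF.findall(newick)
```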

12) origin_enrichment_analysis.pl

This program identifies potential enrichment of duplicate gene origins for input gene families according to the result of Duplicate_gene_classifier.

  • Usage:

    $ perl origin_enrichment_analysis.pl -i gene_family_file -d gene_origin_file  -o output_file
    

This Perl program takes as input a gene family file in the same format as above and the gene origin file generated by Duplicate_gene_classifier.

  • Output

The output is the p-values of the different origins for the given gene families.
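
The description above does not say which statistical test origin_enrichment_analysis.pl applies. A common choice for this kind of question is a one-sided hypergeometric test: given N genes in the genome, K of which have a given origin, how surprising is it that k or more of the n genes in a family have that origin? The Python sketch below (Python 3.8+ for math.comb; the name `enrichment_p_value` is hypothetical) computes that tail probability. Treat it as an illustration of the idea, not as the script's method.

```python
from math import comb


# Hypothetical helper, not part of MCScanX.
def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric tail P(X >= k).

    k: family genes with the origin, n: family size,
    K: genome genes with the origin, N: genome size.
    P(X >= k) = sum_{i=k}^{min(n, K)} C(K, i) * C(N - K, n - i) / C(N, n)
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

When many families and origins are tested at once, the resulting p-values would normally also need a multiple-testing correction.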
