Licence: MIT License

ngstools

My own tools code for NGS data analysis (Next Generation Sequencing)

The repository currently contains two R functions and one Python tool:

Python tools:
	parse-mpileup.py

R functions:
	plot.TAD
	plot.matrix

How to cite?

If you use code from this repository, please cite it as:

MENG Haowei, https://github.com/menghaowei/ngstools

R Function: plot.matrix()

Download ngs_R_function.R and the test matrix ./test_data/hic-chr_1_100000.matrix.txt, then load the functions into your R session with source():

# load the R functions
source(file = "ngs_R_function.R")

# load the test matrix
# This test matrix is from a real Hi-C data set
test_mat = read.table(file = "./test_data/hic-chr_1_100000.matrix.txt",header = FALSE,sep = ",")
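The test file is read with sep = "," and no header, i.e. it is a comma-separated numeric matrix. For readers who want to inspect the same file from Python, a minimal standard-library sketch of the load follows. The helper name read_hic_matrix is hypothetical, and the squareness check is an assumption about Hi-C contact matrices (bins x bins), not something stated in this README.

```python
import csv
import io
from typing import List, TextIO


def read_hic_matrix(handle: TextIO) -> List[List[float]]:
    """Read a headerless, comma-separated numeric matrix (hypothetical helper)."""
    rows = [[float(x) for x in row] for row in csv.reader(handle) if row]
    # Assumption: a Hi-C contact matrix is square (one row and one column per bin).
    n = len(rows)
    if any(len(r) != n for r in rows):
        raise ValueError("expected a square matrix")
    return rows


# Usage with an in-memory example instead of the real test file:
demo = read_hic_matrix(io.StringIO("1,2,3\n2,5,1\n3,1,9\n"))
```

For the real file, pass an open handle, e.g. open("./test_data/hic-chr_1_100000.matrix.txt").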

1. Plot the matrix as a heatmap

plot.matrix(test_mat)
title(main="plot heatmap from Hi-C data")

[Figure: plot.matrix.001]

2. Plot the matrix and filter overly large signals, using the 95% quantile as the cutoff

plot.matrix(test_mat,bound.max = 0.95)
title(main="filter too large signal")

[Figure: plot.matrix.002]
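With bound.max = 0.95, the 95% quantile of the signal is used as the cutoff. A minimal Python sketch of that idea follows, assuming cells above the cutoff are clipped to it (the README does not say whether plot.matrix clips or drops them) and using R's default quantile definition (type 7, linear interpolation). The helper names quantile_type7 and cap_matrix are hypothetical.

```python
import math
from typing import List


def quantile_type7(values: List[float], p: float) -> float:
    """Quantile with linear interpolation, matching R's default quantile(type = 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def cap_matrix(mat: List[List[float]], bound_max: float = 0.95) -> List[List[float]]:
    """Clip every cell above the bound_max quantile down to that quantile (assumed behavior)."""
    cutoff = quantile_type7([v for row in mat for v in row], bound_max)
    return [[min(v, cutoff) for v in row] for row in mat]
```

Clipping keeps the matrix shape intact, so a handful of very large near-diagonal counts no longer stretch the color scale for the rest of the map.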

3. Set both ends of the color range to blue (a single-color heatmap)

plot.matrix(test_mat,bound.max = 0.95,col.min = "blue",col.max = "blue")
title(main="set heatmap as blue")

[Figure: plot.matrix.003]

4. Set the color range from blue to red (bins with values below 10 are drawn in blue)

plot.matrix(test_mat,bound.max = 0.95,col.min = "blue",col.max = "red",col.boundary = 10)
title(main="set color range from blue to red")

[Figure: plot.matrix.004]
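One way to read col.boundary = 10 is: every bin below the boundary gets col.min, and values from the boundary up to the maximum are spread along the col.min to col.max ramp. The Python sketch below computes that ramp position for each value. It is an assumption about the behavior, not the code in ngs_R_function.R, and color_positions is a hypothetical name.

```python
from typing import List


def color_positions(values: List[float], boundary: float) -> List[float]:
    """Map each value to a ramp position: 0.0 means col.min, 1.0 means col.max.

    Assumed reading of col.boundary: values below the boundary are pinned to
    col.min; values between the boundary and the maximum are spread linearly.
    """
    vmax = max(values)
    span = vmax - boundary
    positions = []
    for v in values:
        if v < boundary or span <= 0:
            positions.append(0.0)
        else:
            positions.append((v - boundary) / span)
    return positions
```

Pinning low values to one color keeps the many sparse, low-count bins of a Hi-C map from using up most of the gradient.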

5. The function also supports hex RGB colors, e.g. #000000 for black and #FFFFFF for white

plot.matrix(test_mat,bound.max = 0.95,col.min = "#000000",col.max = "#FFFFFF",col.boundary = 10)
title(main="support rgb color format")

[Figure: plot.matrix.005]
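A hex color such as #FFFFFF is three 8-bit channels (red, green, blue). Building a gradient between col.min and col.max amounts to interpolating each channel; in R, grDevices::colorRampPalette() does this, though the README does not say whether plot.matrix uses it. A small Python sketch with hypothetical helper names:

```python
from typing import Tuple


def hex_to_rgb(color: str) -> Tuple[int, int, int]:
    """Parse '#RRGGBB' into an (r, g, b) tuple of 0-255 integers."""
    color = color.lstrip("#")
    if len(color) != 6:
        raise ValueError(f"expected #RRGGBB, got {color!r}")
    r, g, b = (int(color[i:i + 2], 16) for i in (0, 2, 4))
    return r, g, b


def blend(col_min: str, col_max: str, t: float) -> str:
    """Linearly interpolate two hex colors: t = 0 gives col_min, t = 1 gives col_max."""
    lo, hi = hex_to_rgb(col_min), hex_to_rgb(col_max)
    mixed = [round(a + (b - a) * t) for a, b in zip(lo, hi)]
    return "#{:02X}{:02X}{:02X}".format(*mixed)
```

For the black-to-white example, the midpoint blend("#000000", "#FFFFFF", 0.5) is a mid gray.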


R Function: plot.TAD()

As for plot.matrix(), download ngs_R_function.R and the test matrix ./test_data/hic-chr_1_100000.matrix.txt, then load the functions into your R session with source():

# load the R functions
source(file = "ngs_R_function.R")

# load the test matrix
# This test matrix is from a real Hi-C data set
test_mat = read.table(file = "./test_data/hic-chr_1_100000.matrix.txt",header = FALSE,sep = ",")

1. Plot the upper triangle of the matrix (useful for Hi-C TAD plots)

plot.TAD(test_mat,maxBound = 0.95)
title(main="plot TAD with the same Hi-C matrix")

[Figure: plot.TAD.001]

2. Plot the lower triangle of the matrix

plot.TAD(test_mat,maxBound = 0.95,mat.upper = FALSE)
title(main="plot lower triangle of matrix")

[Figure: plot.TAD.002]

3. Plot the whole matrix

plot.TAD(test_mat,maxBound = 0.95,mat.part = FALSE)
title(main="plot whole part of matrix")

[Figure: plot.TAD.003]
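plot.TAD() draws the upper triangle by default, the lower triangle with mat.upper = FALSE, and the whole matrix with mat.part = FALSE. Which cells each option keeps can be sketched as a mask in Python. This assumes "upper" means column index >= row index in the matrix as stored and that the diagonal is kept in both halves; how plot.TAD orients or rotates the triangle on screen is not described here, and select_part is a hypothetical name.

```python
from typing import List, Optional


def select_part(mat: List[List[float]], mat_part: bool = True,
                mat_upper: bool = True) -> List[List[Optional[float]]]:
    """Blank out half of a square matrix, mimicking the mat.part / mat.upper options.

    Cells that would not be drawn are replaced by None; the diagonal is kept
    in both the upper and the lower half (an assumption).
    """
    n = len(mat)
    out = []
    for i in range(n):
        row = []
        for j in range(n):
            if not mat_part:
                keep = True          # whole matrix
            elif mat_upper:
                keep = j >= i        # upper triangle incl. diagonal
            else:
                keep = j <= i        # lower triangle incl. diagonal
            row.append(mat[i][j] if keep else None)
        out.append(row)
    return out
```

Because a Hi-C contact matrix is symmetric, either triangle carries the full information, which is why plotting only one half is enough for a TAD view.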

4. Set the color range from blue to red (bins with values below 5 are drawn in blue)

plot.TAD(test_mat,maxBound = 0.95,col.min = "blue",col.max = "red",col.boundary = 5)
title(main="set color range from blue to red")

[Figure: plot.TAD.004]


parse-mpileup.py

parse-mpileup.py parses the output of the samtools mpileup command (a .pileup file), which looks like this:

chr1	10030	c	1	^6.	A
chr1	10031	t	1	.	A
chr1	10032	a	1	.	F
chr1	10033	a	1	.	F
chr1	10034	c	1	.	<
chr1	10035	c	1	.	F
chr1	10036	c	0	*	*
chr1	10037	t	1	.-1A	A
chr1	10038	a	0	*	*
chr1	10039	a	0	*	*
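For a single sample, each pileup line has six tab-separated columns: chromosome, 1-based position, reference base, read depth, the read-base string, and the base qualities. A minimal Python sketch of splitting one line follows; PileupSite and parse_pileup_line are hypothetical names, not functions taken from parse-mpileup.py.

```python
from typing import NamedTuple


class PileupSite(NamedTuple):
    chrom: str
    pos: int      # 1-based coordinate
    ref: str      # reference base, upper-cased here
    depth: int    # read depth reported by samtools
    bases: str    # read-base string: . , ACGTN ^ $ +N -N * ...
    quals: str    # base qualities (Phred+33 characters)


def parse_pileup_line(line: str) -> PileupSite:
    """Split one tab-separated mpileup line; only the first sample's columns are kept."""
    chrom, pos, ref, depth, bases, quals = line.rstrip("\n").split("\t")[:6]
    return PileupSite(chrom, int(pos), ref.upper(), int(depth), bases, quals)
```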

For a full description of the .pileup format, see the samtools documentation on the pileup format.
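Most of the work lies in the read-base column. In the pileup format, '.' and ',' are matches to the reference on the forward and reverse strand, ACGTN/acgtn are mismatches, '^' marks a read start and is followed by one mapping-quality character, '$' marks a read end, '+N<seq>' and '-N<seq>' are insertions and deletions of N bases, '*' is a deleted base ('#' with samtools --reverse-del), and '>'/'<' are reference skips. The sketch below tallies these tokens; it illustrates the format and is not the parsing code in parse-mpileup.py (parse_read_bases is a hypothetical name).

```python
import re
from collections import Counter
from typing import Dict, List

INDEL = re.compile(r"[+-](\d+)")


def parse_read_bases(ref: str, bases: str) -> Dict[str, object]:
    """Tally one mpileup read-base string into base counts and indel sequences."""
    counts: Counter = Counter()
    deletions: List[str] = []
    insertions: List[str] = []
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":              # read start; skip it and its mapping-quality char
            i += 2
            continue
        if c in "+-":             # indel: +N<seq> or -N<seq>
            m = INDEL.match(bases, i)
            n = int(m.group(1))
            seq = bases[m.end():m.end() + n].upper()
            (insertions if c == "+" else deletions).append(seq)
            i = m.end() + n
            continue
        if c in ".,":             # match to the reference base
            counts[ref.upper()] += 1
        elif c.upper() in "ACGTN":
            counts[c.upper()] += 1
        elif c in "*#":           # deleted reference base
            deletions.append("*")
        # '$' (read end) and '>' / '<' (reference skips) are ignored
        i += 1
    return {"counts": counts, "deletions": deletions, "insertions": insertions}
```

Applied to the example line at 10037 (ref t, bases ".-1A"), this gives one T and one deleted "A", which matches the T and deletion columns of the corresponding .bmat row in the example.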

It then converts the .pileup file into .bmat format, which looks like this:

chr_name	chr_index	ref_base	A	G	C	T	del_count	insert_count	ambiguous_count	deletion	insertion	ambiguous	mut_num
chr1	10030	C	0	0	1	0	0	0	0	.	.	.	0
chr1	10031	T	0	0	0	1	0	0	0	.	.	.	0
chr1	10032	A	1	0	0	0	0	0	0	.	.	.	0
chr1	10033	A	1	0	0	0	0	0	0	.	.	.	0
chr1	10034	C	0	0	1	0	0	0	0	.	.	.	0
chr1	10035	C	0	0	1	0	0	0	0	.	.	.	0
chr1	10036	C	0	0	0	0	1	0	0	*	.	.	0
chr1	10037	T	0	0	0	1	1	0	0	A	.	.	0
chr1	10038	A	0	0	0	0	1	0	0	*	.	.	0
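Judging from the example rows, each .bmat line holds the upper-cased reference base, per-base counts in the order A, G, C, T, the deletion/insertion/ambiguous counts, the observed deletion/insertion/ambiguous sequences ('.' when there are none), and mut_num. The sketch below formats such a line from already tallied counts. format_bmat_row is hypothetical, and two details are assumptions inferred from the example rather than stated in the README: multiple sequences are comma-joined, and mut_num counts only A/G/C/T calls that differ from the reference.

```python
from typing import Dict, Sequence

# Column order as shown in the README's .bmat example
BMAT_HEADER = [
    "chr_name", "chr_index", "ref_base", "A", "G", "C", "T",
    "del_count", "insert_count", "ambiguous_count",
    "deletion", "insertion", "ambiguous", "mut_num",
]


def _join_or_dot(items: Sequence[str]) -> str:
    """Comma-join observed sequences, or '.' when there are none (separator assumed)."""
    return ",".join(items) if items else "."


def format_bmat_row(chrom: str, pos: int, ref: str, counts: Dict[str, int],
                    deletions: Sequence[str] = (), insertions: Sequence[str] = (),
                    ambiguous: Sequence[str] = ()) -> str:
    """Format one site as a tab-separated .bmat line (hypothetical helper)."""
    ref = ref.upper()
    base_counts = [counts.get(b, 0) for b in "AGCT"]
    # Assumption: mut_num counts A/G/C/T calls that differ from the reference;
    # indels are not included, consistent with the rows at 10036-10038.
    mut_num = sum(n for b, n in zip("AGCT", base_counts) if b != ref)
    fields = [chrom, pos, ref, *base_counts,
              len(deletions), len(insertions), len(ambiguous),
              _join_or_dot(deletions), _join_or_dot(insertions),
              _join_or_dot(ambiguous), mut_num]
    return "\t".join(str(f) for f in fields)
```

For example, format_bmat_row("chr1", 10037, "t", {"T": 1}, deletions=["A"]) reproduces the 10037 row shown above.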

To show the help message, run python parse-mpileup.py -h:

python parse-mpileup.py  -h
usage: parse-mpileup.py [-h] -i INPUT [-o OUTPUT] [-p THREADS] [-n MUTNUM]
                        [--TempDir TEMPDIR]

convert mpileup file to info file

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --Input INPUT
                        samtools mpileup format file
  -o OUTPUT, --Output OUTPUT
                        Output parsed file
  -p THREADS, --Threads THREADS
                        Multiple threads number, default=1
  -n MUTNUM, --MutNum MUTNUM
                        Only contain mutation info go to the output, set 0
                        mean output all site, default=0
  --TempDir TEMPDIR     Where to keep temp files, default is the same dir with
                        --Input
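The help text above has the layout produced by Python's argparse module. A sketch of a parser with the same options is below, together with one possible reading of --MutNum: keep only sites whose mut_num column is at least N, with 0 keeping every site. The filter semantics and the helper names build_parser and filter_by_mut_num are assumptions; check the script itself for its exact behavior.

```python
import argparse
from typing import Iterable, Iterator


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the options listed in the help text (sketch)."""
    p = argparse.ArgumentParser(prog="parse-mpileup.py",
                                description="convert mpileup file to info file")
    p.add_argument("-i", "--Input", required=True, help="samtools mpileup format file")
    p.add_argument("-o", "--Output", help="Output parsed file")
    p.add_argument("-p", "--Threads", type=int, default=1, help="number of threads")
    p.add_argument("-n", "--MutNum", type=int, default=0, help="0 outputs all sites")
    p.add_argument("--TempDir", help="directory for temporary files")
    return p


def filter_by_mut_num(rows: Iterable[str], mut_num: int) -> Iterator[str]:
    """Yield .bmat data rows (no header) whose last column is >= mut_num; 0 keeps all."""
    for row in rows:
        if mut_num == 0 or int(row.rstrip("\n").split("\t")[-1]) >= mut_num:
            yield row
```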