al2na / Methylkit

R package for DNA methylation analysis

methylKit

Introduction

methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also from target-capture methods such as Agilent SureSelect methyl-seq. In addition, methylKit can deal with base-pair resolution data for 5hmC obtained from TAB-seq or oxBS-seq. It can also handle whole-genome bisulfite sequencing data if the proper input format is provided.

Current Features

  • Coverage statistics
  • Methylation statistics
  • Sample correlation and clustering
  • Differential methylation analysis
  • Feature annotation and accessor/coercion functions
  • Multiple visualization options
  • Regional and tiling windows analysis
  • (Almost) proper documentation
  • Reading methylation calls directly from Bismark (Bowtie/Bowtie2) alignment files
  • Batch effect control
  • Multithreading support (for faster differential methylation calculations)
  • Coercion to objects from Bioconductor package GenomicRanges
  • Reading methylation percentage data from generic text files

Staying up-to-date

You can subscribe to our Google Groups page to get the latest information about new releases and features (low-frequency; only updates are posted).

To ask questions, please use the methylKit_discussion forum.

You can also check out the blog posts we write on using methylKit.


Installation

In the R console:

library(devtools)
install_github("al2na/methylKit", build_vignettes=FALSE, 
  repos=BiocManager::repositories(),
  dependencies=TRUE)

If this doesn't work, you might need to add the type="source" argument.

Install the development version

library(devtools)
install_github("al2na/methylKit", build_vignettes=FALSE, 
  repos=BiocManager::repositories(),ref="development",
  dependencies=TRUE)

If this doesn't work, you might need to add the type="source" argument.


How to Use

Typically, bisulfite-converted reads are aligned to the genome, and the % methylation value per base is calculated by processing the alignments. methylKit takes that per-base % methylation information as input. Such an input file may be obtained from the AMP pipeline for aligning RRBS reads. A typical input file looks like this:

chrBase	chr	base	strand	coverage	freqC	freqT
chr21.9764539	chr21	9764539	R	12	25.00	75.00
chr21.9764513	chr21	9764513	R	12	0.00	100.00
chr21.9820622	chr21	9820622	F	13	0.00	100.00
chr21.9837545	chr21	9837545	F	11	0.00	100.00
chr21.9849022	chr21	9849022	F	124	72.58	27.42
chr21.9853326	chr21	9853326	F	17	70.59	29.41
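
As a rough illustration of what these columns carry, the sketch below parses the example rows above with base R only (it does not use methylKit itself). It assumes that freqC and freqT are the percentages of reads showing C and T at that base, so that they sum to 100, and it derives approximate read counts from coverage. The names `input_text`, `calls`, `numCs` and `numTs` are chosen here for illustration.

```r
# Example rows copied from the input file shown above (whitespace-separated).
input_text <- "chrBase chr base strand coverage freqC freqT
chr21.9764539 chr21 9764539 R 12 25.00 75.00
chr21.9764513 chr21 9764513 R 12 0.00 100.00
chr21.9820622 chr21 9820622 F 13 0.00 100.00
chr21.9837545 chr21 9837545 F 11 0.00 100.00
chr21.9849022 chr21 9849022 F 124 72.58 27.42
chr21.9853326 chr21 9853326 F 17 70.59 29.41"

# Fix the column types explicitly so that the strand letters stay character.
calls <- read.table(
  text = input_text, header = TRUE,
  colClasses = c("character", "character", "integer", "character",
                 "integer", "numeric", "numeric")
)

# Assumption: freqC and freqT are percentages that sum to 100 per base.
stopifnot(all(abs(calls$freqC + calls$freqT - 100) < 1e-6))

# Approximate counts of C and T reads implied by coverage and freqC.
calls$numCs <- round(calls$coverage * calls$freqC / 100)
calls$numTs <- calls$coverage - calls$numCs

print(calls[, c("chr", "base", "strand", "coverage", "numCs", "numTs")])
```

A check like this can catch a mis-specified column order before the file is handed to the package.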

methylKit reads in those files and performs basic statistical analysis and annotation for differentially methylated regions/bases. A tab-separated text file with a generic format, such as a methylation ratio file from BSMAP, can also be read in; see here for an example. Alternatively, the read.bismark function can read SAM file(s) output by the Bismark aligner (using Bowtie/Bowtie2); the SAM file must be sorted by chromosome and read start. The sorting must be done with Unix sort or samtools, because sorting with other tools may change the column order of the SAM file, which will cause an error.

Below, there are several options showing how to do basic analysis with methylKit.

Documentation

  • You can look at the vignette here. This is the primary source of documentation. It includes detailed examples.
  • You can check out the slides for a tutorial at EpiWorkshop 2013. This works with older versions of methylKit; you may need to update the function names.
  • You can check out the tutorial prepared for EpiWorkshop 2012. This works with older versions of methylKit; you may need to update the function names.
  • You can check out the slides prepared for EuroBioc 2018. This also includes more recent features of methylKit and is meant to give you a quick overview about what you can do with the package.

Downloading Annotation Files

Annotation files in BED format are needed for annotating your differentially methylated regions. You can download annotation files from the UCSC Table Browser for your genome of interest. Go to http://genome.ucsc.edu/cgi-bin/hgGateway. On the top menu, click on "tools" then "table browser". Select your "genome" and "assembly" of interest from the drop-down menus. Make sure you select the correct genome and assembly; selecting the wrong genome and/or assembly will return unintelligible results in downstream analysis.

From here on you can either download gene annotation or CpG island annotation.

  1. For gene annotation, select "Genes and Gene prediction tracks" from the "group" drop-down menu. Following that, select "Refseq Genes" from the "track" drop-down menu. Select "BED- browser extensible data" for the "output format". Click "get output" and on the following page click "get BED" without changing any options. Save the output as a text file.
  2. For CpG island annotation, select "Regulation" from the "group" drop-down menu. Following that, select "CpG islands" from the "track" drop-down menu. Select "BED- browser extensible data" for the "output format". Click "get output" and on the following page click "get BED" without changing any options. Save the output as a text file.

In addition, you can check this tutorial to learn how to download any track from UCSC in BED format (http://www.openhelix.com/cgi/tutorialInfo.cgi?id=28).
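
Before using a downloaded file, it may be worth a quick sanity check in base R. The sketch below is only an illustration: the two BED lines are made up, the six column names follow the standard BED field names, and a real UCSC export may contain more columns (gene tracks are typically exported with 12). For a saved file, you would pass its path to `read.table()` instead of `text =`.

```r
# Made-up BED lines for illustration only (not real annotation data).
bed_text <- "chr21 9825000 9826500 featureA 0 +
chr21 9909000 9910000 featureB 0 -"

# First six standard BED fields; adjust if your file has more columns.
bed <- read.table(
  text = bed_text, header = FALSE,
  col.names = c("chrom", "chromStart", "chromEnd", "name", "score", "strand"),
  colClasses = c("character", "integer", "integer",
                 "character", "numeric", "character")
)

# Every feature should start before it ends.
stopifnot(all(bed$chromStart < bed$chromEnd))

# BED starts are 0-based and ends are exclusive, so width is end minus start.
bed$width <- bed$chromEnd - bed$chromStart
print(bed)
```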


R script for Genome Biology publication

The most recent version of the R script in the Genome Biology manuscript is here.


Citing methylKit

If you used methylKit please cite:

If you used flat-file objects or over-dispersion corrected tests please consider citing:

and also consider citing the following publication as a use-case with specific cutoffs:


Contact & Questions

E-mail [email protected] or post a question using the web interface.

If you are going to submit bug reports or ask questions, please send the sessionInfo() output from the R console as well.

Questions are very welcome, although we suggest you read the paper, the documentation (function help pages and the vignette) and the blog entries first. The answer to your question might be there already.


Contribute to the development

See the Trello board for methylKit development. You can contribute to methylKit development via GitHub (http://github.com/al2na/methylKit/) by opening an issue and discussing what you want to contribute; we will guide you from there. In addition, you should:

  • Bump the third number of the version in the DESCRIPTION file. For example, if the master branch is at version "X.Y.1" and you make a change to it, you should bump the version in the DESCRIPTION file to "X.Y.2".

  • Add your changes to the NEWS file as well, under the correct version and appropriate section. Attribute the changes to yourself, such as "Contributed by X".
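
The version bump described above can be sketched with a small helper. This is only an illustration: `bump_patch` is a hypothetical function name, not part of methylKit, and it assumes a plain three-part "X.Y.Z" version string.

```r
# Hypothetical helper: increment the third number of an "X.Y.Z" version.
bump_patch <- function(version) {
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  # Expect exactly three numeric components.
  stopifnot(length(parts) == 3, !anyNA(parts))
  parts[3] <- parts[3] + 1L
  paste(parts, collapse = ".")
}

print(bump_patch("1.2.1"))
```

The resulting string is what would go in the Version field of the DESCRIPTION file.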

License

Artistic License/GPL
