nellore / rail

Licence: other
Scalable RNA-seq analysis

Rail-RNA is software for RNA-seq analysis. It was used to generate outputs for the recount2 project and is deprecated as of January 2021. Please use monorail instead to generate RNA-seq outputs that can be compared with recount3, the successor to recount2.

Visit the website. Download the latest stable release. Read the docs, especially the tutorial. Ask questions in the repo's Gitter chat at https://gitter.im/nellore/rail.

Get interested

Rail-RNA's distinguishing features are

  • Scalability. Built on MapReduce, the software scales to analyze hundreds of RNA-seq samples at the same time.
  • Reduced redundancy. The software identifies and eliminates redundant alignment work, so for a fixed cluster size the end-to-end analysis time per sample decreases as the number of samples increases.
  • Integrative analysis. The software borrows strength across replicates to achieve more accurate splice junction detection, especially in genomic regions with low coverage.
  • Mode agnosticism. The software integrates its own parallel abstraction layer, which allows it to run in various distributed computing environments. These include the Amazon Web Services (AWS) Elastic MapReduce (EMR) service and any distributed environment supported by IPython: clusters using batch schedulers like PBS or SGE, Message Passing Interface (MPI), or any cluster with a shared filesystem and mutual SSH access. Alternatively, Rail-RNA can be run on a single multi-core computer, without the aid of a batch system or MapReduce implementation.
  • Inexpensive cloud implementation. An EMR run on more than about 100 samples costs about $1 per sample with spot instances.
  • Secure analysis of dbGaP-protected data on EMR. See this guide for information on setup.

Outputs currently include

  • Alignment BAMs with only primary alignments by default (for more, use --bowtie2-args "-k <N>", where <N> is the maximum number of alignments to report per read)
  • Genome coverage bigWigs
  • TopHat-like indel and splice junction BEDs

and will likely expand in future versions.

Read our paper for more details. Methods explained there correspond to Rail-RNA 0.1.9.

Get set up

Start with a recent (>= 2009) OS X or Linux box. For a no-fuss install, enter

(INSTALLER=/var/tmp/$(cat /dev/urandom | env LC_CTYPE=C tr -cd 'a-f0-9' | head -c 32);
curl http://verve.webfactional.com/rail -o $INSTALLER; python2 $INSTALLER -m || true;
rm -f $INSTALLER)

at a Bash prompt. For a more customizable install, download install_rail-rna-0.2.4b, change to the directory containing it, and make the installer executable with

chmod +x install_rail-rna-0.2.4b

Now run

sudo ./install_rail-rna-0.2.4b

to install for all users or

./install_rail-rna-0.2.4b

to install for just you. Refer to these detailed installation instructions from the docs for more information. If the executable doesn't work, you may need Python.

You'll also need Bowtie 1 and 2 indexes of the appropriate genome assembly if you will be running Rail-RNA in either its single-computer (local) or IPython Parallel (parallel) modes. The easiest way to get these is by downloading an Illumina iGenome. If running Rail-RNA on EMR (elastic mode) and aligning to hg19, the assembly can be specified at the command line with the -a parameter.

Get started

Rail-RNA takes as input a Myrna-style manifest file, which describes a set of input FASTQs. The FASTQs may be on the local filesystem (local and parallel modes only), or on the web or Amazon Simple Storage Service (S3) (local, parallel, and elastic modes). Each line takes one of the following two forms.

  1. (for a set of unpaired input reads) <FASTQ URL>(tab)<optional MD5>(tab)<sample label>
  2. (for a set of paired input reads) <FASTQ URL 1>(tab)<optional MD5 1>(tab)<FASTQ URL 2>(tab)<optional MD5 2>(tab)<sample label>
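
The two line forms above can be sketched in Python. This is only an illustration, not part of Rail-RNA: the helper names manifest_line and parse_manifest_line are hypothetical, the example URLs are made up, and the use of "0" as a placeholder when no MD5 is supplied is an assumption based on the Myrna manifest convention.

```python
def manifest_line(label, url_1, url_2=None, md5_1="0", md5_2="0"):
    """Build one tab-separated manifest line.

    Hypothetical helper. "0" stands in for an omitted MD5, which is an
    assumption based on the Myrna manifest convention.
    """
    fields = [url_1, md5_1]
    if url_2 is not None:
        # Paired-end form: second FASTQ URL and its MD5 come before the label.
        fields += [url_2, md5_2]
    fields.append(label)
    for field in fields:
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return "\t".join(fields)


def parse_manifest_line(line):
    """Split a manifest line into its parts; 3 fields = unpaired, 5 = paired."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 3:
        url_1, md5_1, label = fields
        return {"paired": False, "urls": [url_1], "md5s": [md5_1], "label": label}
    if len(fields) == 5:
        url_1, md5_1, url_2, md5_2, label = fields
        return {"paired": True, "urls": [url_1, url_2],
                "md5s": [md5_1, md5_2], "label": label}
    raise ValueError("expected 3 or 5 tab-separated fields, got %d" % len(fields))


if __name__ == "__main__":
    # Made-up sample names and URLs, for illustration only.
    lines = [
        manifest_line("sample_A", "s3://example-bucket/a.fastq.gz"),
        manifest_line("sample_B",
                      "s3://example-bucket/b_1.fastq.gz",
                      "s3://example-bucket/b_2.fastq.gz"),
    ]
    with open("example.manifest", "w") as handle:
        handle.write("\n".join(lines) + "\n")
```

Checking the field count when parsing catches the most common manifest mistake, which is separating fields with spaces instead of tabs.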

Find some RNA-seq data, create a manifest file, run

rail-rna

and follow the instructions; or check the docs for help getting started.

To use Rail-RNA in elastic mode, you'll need an account with AWS. For an introduction to cloud computing with AWS, refer to this excellent tutorial by the Griffith Lab at Wash U.

Disclaimer

Renting AWS resources costs money, regardless of whether your run ultimately succeeds or fails. In some cases, Rail-RNA or its documentation may be partially to blame for a failed run. While we are happy to review bug reports, we do not accept responsibility for financial damage caused by these errors. Rail-RNA is provided "as is" with no warranty.

Licenses

MIT except for the directory src/hadoop/relevant-elephant, which contains Apache-licensed code adapted from Twitter's Elephant Bird project.

Contributors

This product was developed primarily at Johns Hopkins University.
