const-ae / tidygenomics

Licence: other
Tidy Verbs for Dealing with Genomic Data Frames https://const-ae.github.io/tidygenomics/

tidygenomics

Tidy Verbs for Dealing with Genomic Data Frames

Description

Handle genomic data within data frames just as you would with GRanges. This package provides methods to deal with genomic intervals the "tidy way", which makes them simpler to integrate into the general data munging process. The API is inspired by the popular bedtools and the genome_join() method from the fuzzyjoin package.

Installation

install.packages("tidygenomics")

Or, to get the latest development version:

devtools::install_github("const-ae/tidygenomics")

Documentation

genome_intersect

Joins two data frames based on their genomic overlap. Unlike the genome_join function, it updates the boundaries to reflect the overlap of the regions.

library(tidygenomics)

x1 <- data.frame(id = 1:4,
                 chromosome = c("chr1", "chr1", "chr2", "chr2"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))

x2 <- data.frame(id = 1:4,
                 chromosome = c("chr1", "chr2", "chr2", "chr1"),
                 start = c(140, 210, 400, 300),
                 end = c(160, 240, 415, 320))

genome_intersect(x1, x2, by=c("chromosome", "start", "end"), mode="both")
id.x chromosome id.y start end
1 chr1 1 140 150
4 chr2 3 400 415
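
The clipping rule behind this result can be sketched in base R. This is only an illustration of the idea, assuming closed intervals as in the output above; it is not the package's implementation, and the names pairs and overlap are made up for this example.

```r
# Base R sketch of the intersection rule (illustrative, not tidygenomics code).
x1 <- data.frame(id = 1:4,
                 chromosome = c("chr1", "chr1", "chr2", "chr2"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))

x2 <- data.frame(id = 1:4,
                 chromosome = c("chr1", "chr2", "chr2", "chr1"),
                 start = c(140, 210, 400, 300),
                 end = c(160, 240, 415, 320))

# Pair every x1 row with every x2 row on the same chromosome.
pairs <- merge(x1, x2, by = "chromosome", suffixes = c(".x", ".y"))

# Clip each pair to the region shared by both intervals.
pairs$start <- pmax(pairs$start.x, pairs$start.y)
pairs$end   <- pmin(pairs$end.x, pairs$end.y)

# Keep only pairs whose clipped region is non-empty.
overlap <- pairs[pairs$start <= pairs$end,
                 c("id.x", "chromosome", "id.y", "start", "end")]
overlap <- overlap[order(overlap$id.x, overlap$id.y), ]
rownames(overlap) <- NULL
print(overlap)
```

The two surviving rows are the same pairs and boundaries that genome_intersect reports above.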

genome_subtract

Subtracts the intervals of one data frame from those of the other. This can be used to split the intervals in the x data frame into smaller regions.

x1 <- data.frame(id = 1:4,
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))

x2 <- data.frame(id = 1:4,
                 chromosome = c("chr1", "chr2", "chr1", "chr1"),
                 start = c(120, 210, 300, 400),
                 end = c(125, 240, 320, 415))

genome_subtract(x1, x2, by=c("chromosome", "start", "end"))
id chromosome start end
1 chr1 100 119
1 chr1 126 150
2 chr1 200 250
3 chr2 300 350
4 chr1 416 450
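
How a single interval gets split can be sketched in base R. The helper subtract_one below is hypothetical and handles one x interval against the cuts on the same chromosome; it assumes closed integer coordinates, as the output above suggests, and is not the package's implementation.

```r
# Subtract closed integer intervals (cut_start, cut_end) from one interval.
# Hypothetical helper for illustration only.
subtract_one <- function(start, end, cut_start, cut_end) {
  pieces <- list()
  cursor <- start  # left edge of the part not yet cut away
  for (i in order(cut_start)) {
    # Skip cuts that do not touch what remains of the interval.
    if (cut_end[i] < cursor || cut_start[i] > end) next
    # Anything between the cursor and the cut survives as a piece.
    if (cut_start[i] > cursor) {
      pieces[[length(pieces) + 1]] <- c(start = cursor, end = cut_start[i] - 1)
    }
    cursor <- max(cursor, cut_end[i] + 1)
  }
  # Whatever lies to the right of the last cut also survives.
  if (cursor <= end) {
    pieces[[length(pieces) + 1]] <- c(start = cursor, end = end)
  }
  do.call(rbind, pieces)  # NULL when nothing is left
}

# The chr1 cuts from x2 in the example above.
cut_start <- c(120, 300, 400)
cut_end   <- c(125, 320, 415)

first  <- subtract_one(100, 150, cut_start, cut_end)
fourth <- subtract_one(400, 450, cut_start, cut_end)
print(first)
print(fourth)
```

The first call splits 100-150 into 100-119 and 126-150, and the second trims 400-450 to 416-450, matching the rows for id 1 and id 4 above.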

genome_join_closest

Joins two data frames based on their genomic location. If no exact overlap is found, the next closest interval is used.

# tibble() replaces the deprecated data_frame()
x1 <- tibble::tibble(id = 1:4,
                     chr = c("chr1", "chr1", "chr2", "chr3"),
                     start = c(100, 200, 300, 400),
                     end = c(150, 250, 350, 450))

x2 <- tibble::tibble(id = 1:4,
                     chr = c("chr1", "chr1", "chr1", "chr2"),
                     start = c(220, 210, 300, 400),
                     end = c(225, 240, 320, 415))
genome_join_closest(x1, x2, by=c("chr", "start", "end"), distance_column_name="distance", mode="left")
id.x chr.x start.x end.x id.y chr.y start.y end.y distance
1 chr1 100 150 2 chr1 210 240 59
2 chr1 200 250 1 chr1 220 225 0
2 chr1 200 250 2 chr1 210 240 0
3 chr2 300 350 4 chr2 400 415 49
4 chr3 400 450 NA NA NA NA NA
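
The distance values in this output are consistent with counting the positions that lie strictly between two intervals, with 0 for overlapping ones. The base R sketch below reproduces them under that assumption; the convention is inferred from the example output rather than taken from the package source, and interval_distance is a hypothetical helper.

```r
# Number of positions strictly between two closed intervals; 0 if they overlap.
# Convention inferred from the example output (an assumption).
interval_distance <- function(start_x, end_x, start_y, end_y) {
  gap <- pmax(start_x, start_y) - pmin(end_x, end_y) - 1
  pmax(gap, 0)
}

# x1 row 1 (chr1, 100-150) against the three chr1 intervals in x2.
y_start <- c(220, 210, 300)
y_end   <- c(225, 240, 320)
d <- interval_distance(100, 150, y_start, y_end)
closest <- which(d == min(d))

print(d)        # distances to x2 rows 1, 2 and 3
print(closest)  # index of the closest x2 row
```

Here the second x2 interval is closest at distance 59, which is the row joined to id.x 1 above. Using which(d == min(d)) keeps ties, in the same way that both overlapping x2 rows are returned for id.x 2.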

genome_cluster

Adds a new cluster_id column. Intervals that overlap, or that lie within max_distance of each other, are assigned to the same cluster.

x1 <- data.frame(id = 1:4, bla=letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 120, 300, 260),
                 end = c(150, 250, 350, 450))
genome_cluster(x1, by=c("chromosome", "start", "end"))
id bla chromosome start end cluster_id
1 a chr1 100 150 0
2 b chr1 120 250 0
3 c chr2 300 350 2
4 d chr1 260 450 1
genome_cluster(x1, by=c("chromosome", "start", "end"), max_distance=10)
id bla chromosome start end cluster_id
1 a chr1 100 150 0
2 b chr1 120 250 0
3 c chr2 300 350 1
4 d chr1 260 450 0
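
One common way to compute such clusters is to sort the intervals and start a new cluster whenever an interval begins beyond the furthest end seen so far plus max_distance. The base R sketch below applies that idea to a single chromosome. It is an assumption about the approach, not the package's code: cluster_one_chromosome is a hypothetical helper, its ids restart at 0 for each chromosome instead of being global, and the boundary handling was chosen to match the example output.

```r
# Cluster closed intervals on one chromosome (illustrative helper).
cluster_one_chromosome <- function(start, end, max_distance = 0) {
  if (length(start) == 0) return(integer(0))
  ord <- order(start, end)
  s <- start[ord]
  e <- end[ord]
  # Furthest end reached by each interval and all intervals sorted before it.
  reach <- cummax(e)
  # An interval opens a new cluster if it starts beyond that reach
  # by more than max_distance.
  new_cluster <- c(TRUE, s[-1] > reach[-length(reach)] + max_distance)
  ids <- cumsum(new_cluster) - 1
  ids[order(ord)]  # back to the input order
}

# The three chr1 rows (a, b, d) from the example above.
start <- c(100, 120, 260)
end   <- c(150, 250, 450)

ids_default <- cluster_one_chromosome(start, end)
ids_relaxed <- cluster_one_chromosome(start, end, max_distance = 10)
print(ids_default)
print(ids_relaxed)
```

With the default, rows a and b share a cluster and d gets its own; with max_distance = 10, all three chr1 rows fall into one cluster, as in the second call above.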

genome_complement

Calculates the complement of a set of genomic regions, i.e. the stretches of each chromosome that are not covered by any interval.

x1 <- data.frame(id = 1:4,
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))

genome_complement(x1, by=c("chromosome", "start", "end"))
chromosome start end
chr1 1 99
chr1 151 199
chr1 251 399
chr2 1 299
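
The gaps in this output can be reproduced with a short base R sketch for a single chromosome. It assumes closed integer coordinates starting at position 1 and, like the output above, reports no gap after the last interval because no chromosome length is available. complement_one_chromosome is a hypothetical helper, not the package's implementation.

```r
# Gaps between closed intervals on one chromosome (illustrative helper).
complement_one_chromosome <- function(start, end) {
  ord <- order(start)
  s <- start[ord]
  e <- end[ord]
  # Each gap opens at position 1, or one past the furthest end seen so far.
  gap_start <- c(1, cummax(e) + 1)[seq_along(s)]
  # Each gap closes one position before the next interval starts.
  gap_end <- s - 1
  keep <- gap_start <= gap_end  # drop empty gaps caused by overlaps
  data.frame(start = gap_start[keep], end = gap_end[keep])
}

# The chr1 and chr2 rows from the example above.
chr1_gaps <- complement_one_chromosome(c(100, 200, 400), c(150, 250, 450))
chr2_gaps <- complement_one_chromosome(300, 350)
print(chr1_gaps)
print(chr2_gaps)
```

The chr1 call yields 1-99, 151-199 and 251-399, and the chr2 call yields 1-299, matching the rows above.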

genome_join

Classical join function based on the overlap of the intervals. It is implemented and maintained in the fuzzyjoin package and documented here only for completeness.

# tibble() replaces the deprecated data_frame()
x1 <- tibble::tibble(id = 1:4,
                     chr = c("chr1", "chr1", "chr2", "chr3"),
                     start = c(100, 200, 300, 400),
                     end = c(150, 250, 350, 450))

x2 <- tibble::tibble(id = 1:4,
                     chr = c("chr1", "chr1", "chr1", "chr2"),
                     start = c(220, 210, 300, 400),
                     end = c(225, 240, 320, 415))
fuzzyjoin::genome_join(x1, x2, by=c("chr", "start", "end"), mode="inner")
id.x chr.x start.x end.x id.y chr.y start.y end.y
2 chr1 200 250 1 chr1 220 225
2 chr1 200 250 2 chr1 210 240
fuzzyjoin::genome_join(x1, x2, by=c("chr", "start", "end"), mode="left")
id.x chr.x start.x end.x id.y chr.y start.y end.y
1 chr1 100 150 NA NA NA NA
2 chr1 200 250 1 chr1 220 225
2 chr1 200 250 2 chr1 210 240
3 chr2 300 350 NA NA NA NA
4 chr3 400 450 NA NA NA NA
fuzzyjoin::genome_join(x1, x2, by=c("chr", "start", "end"), mode="anti")
id chr start end
1 chr1 100 150
3 chr2 300 350
4 chr3 400 450

Inspiration

If you have any additional questions or encounter issues, please raise them on the GitHub page.
