
thierrygosselin / assigner

License: GPL-3.0
Population assignment analysis using R

assigner

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

The name assigner |əˈsʌɪn| is rooted in the Latin word assignare. Its first use in French dates back to the 13th century.

For the logo, I was inspired by the Northern Atlantic octopus. I was fortunate to spend a lot of time with one during my PhD. These incredible creatures have 8 arms and thousands of suckers that they can control independently. Octopuses are really the best multitaskers. The logo was designed by the artist Claude Thivierge.

Genomic datasets produced by next-generation sequencing techniques that reduce the size of the genome (e.g. genotyping-by-sequencing (GBS) and restriction-site-associated DNA sequencing (RADseq)) have a huge number of markers that hold great promise for assignment analysis. After hitting the bioinformatic wall with the different workflows, you’ll likely end up with several folders containing whitelists and blacklists of markers and individuals, data sets with various de novo and/or filtering parameters and … missing data. This reality of GBS/RADseq data is quite hard on the GUI software traditionally used for population assignment analysis. The end results are usually poor data exploration, constrained by time, and poor reproducibility.

assigner was tailored to make it easy to conduct population assignment analysis using GBS/RADseq data within R. Additionally, combining tools like R Notebook, RStudio and GitHub makes documenting your workflows and pipelines effortless.

The keywords to remember:

  • 3 different algorithms implemented: frequentist, likelihood and machine learning
  • cross-validation techniques: classic Leave-One-Out (LOO) and Training, Holdout, Leave-one-out (THL) with marker selection
  • resampling/bootstrap/subsampling
  • fast Fst implementation (WC84: Weir and Cockerham, 1984)
  • ggplot2-based plotting!
  • https://thierrygosselin.github.io/assigner/
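
To make the leave-one-out idea in the list above concrete, here is a minimal base R sketch on simulated data. It is an illustration only and does not use or reproduce assigner's code: the simulated allele frequencies, the helper names (simulate_pop, loo_assign), the Hardy-Weinberg likelihood and the pseudo-count are all assumptions made for this example, and missing data is not handled.

```r
# Toy illustration of leave-one-out (LOO) assignment; NOT assigner's implementation.
# Genotypes are counts of the alternate allele (0, 1 or 2) at biallelic SNPs.
set.seed(42)
n_per_pop <- 20
n_markers <- 50

# Hypothetical allele frequencies for two well-differentiated populations
freq_a <- runif(n_markers, 0.05, 0.45)
freq_b <- runif(n_markers, 0.55, 0.95)

simulate_pop <- function(n, freq) {
  # One row per individual, one column per marker
  t(replicate(n, rbinom(length(freq), size = 2, prob = freq)))
}

geno <- rbind(simulate_pop(n_per_pop, freq_a), simulate_pop(n_per_pop, freq_b))
pop <- rep(c("A", "B"), each = n_per_pop)

loo_assign <- function(geno, pop) {
  pops <- unique(pop)
  vapply(seq_len(nrow(geno)), function(i) {
    log_lik <- vapply(pops, function(p) {
      # Reference samples: everyone from population p except individual i
      ref <- geno[-i, , drop = FALSE][pop[-i] == p, , drop = FALSE]
      # Allele frequency with a pseudo-count so it is never exactly 0 or 1
      freq <- (colSums(ref) + 1) / (2 * nrow(ref) + 2)
      # Log-likelihood of individual i's genotypes under Hardy-Weinberg
      sum(dbinom(geno[i, ], size = 2, prob = freq, log = TRUE))
    }, numeric(1))
    pops[which.max(log_lik)]
  }, character(1))
}

assigned <- loo_assign(geno, pop)
accuracy <- mean(assigned == pop)
print(table(observed = pop, assigned = assigned))
```

Removing the individual before estimating allele frequencies is the point of the exercise: an individual that contributes to its own population's frequencies biases the assignment in favour of that population.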

Installation

To try out the dev version of assigner:

if (!require("devtools")) install.packages("devtools")
devtools::install_github("thierrygosselin/assigner")
library(assigner)
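
In a script, it can help to confirm that the package is available before calling it. The sketch below uses only base R and utils functions; the variable names are arbitrary.

```r
# Check that assigner is available before using it in a script.
pkg <- "assigner"
installed <- requireNamespace(pkg, quietly = TRUE)
if (installed) {
  message(pkg, " version: ", as.character(utils::packageVersion(pkg)))
} else {
  message(pkg, " is not installed; see the installation commands above.")
}
```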

If you plan on using gsi_sim inside assigner, you need an additional step:

On UNIX (installs gsi_sim from source)

assigner::install_gsi_sim(fromSource = TRUE)

On a PC

assigner::install_gsi_sim()

Life cycle

assigner is maturing, but in order to make the package better, changes are inevitable. Experimental functions will change, and so will argument names. Your code and workflows might break from time to time until assigner is stable. Consequently, depending on your tolerance for change, assigner might not be for you.
