thierrygosselin / assigner

License: GPL-3.0

Population assignment analysis using R

Projects that are alternatives of or similar to assigner

Population Assignment using Genetic, Non-genetic or Integrated Data in a Machine-learning Framework. Methods in Ecology and Evolution. 2018;9:439–446.

Stars: ✭ 16 (-5.88%)

Mutual labels: gbs, radseq

cerebra

A tool for fast and accurate summarizing of variant calling format (VCF) files

Stars: ✭ 55 (+223.53%)

Mutual labels: genomics

Canvasxpress

JavaScript Visualization Tools

Stars: ✭ 247 (+1352.94%)

Mutual labels: genomics

Mitty

Seven Bridges Genomics aligner/caller debugging and analysis tools

Stars: ✭ 13 (-23.53%)

Mutual labels: genomics

cljam

A DNA Sequence Alignment/Map (SAM) library for Clojure

Stars: ✭ 85 (+400%)

Mutual labels: genomics

berokka

🍊 💫 Trim, circularise and orient long read bacterial genome assemblies

Stars: ✭ 23 (+35.29%)

Mutual labels: genomics

Cyvcf2

cython + htslib == fast VCF and BCF processing

Stars: ✭ 243 (+1329.41%)

Mutual labels: genomics

aws-genomics-workflows

Genomics Workflows on AWS

Stars: ✭ 131 (+670.59%)

Mutual labels: genomics

GenomicsDB

Highly performant data storage in C++ for importing, querying and transforming variant data with C/C++/Java/Spark bindings. Used in gatk4.

Stars: ✭ 77 (+352.94%)

Mutual labels: genomics

R-Learning-Journey

Some of the projects I made when starting to learn R for Data Science at university

Stars: ✭ 19 (+11.76%)

Mutual labels: datascience

HLA

xHLA: Fast and accurate HLA typing from short read sequence data

Stars: ✭ 84 (+394.12%)

Mutual labels: genomics

ml-book

Source code and errata for my book "A tu per tu col Machine Learning"

Stars: ✭ 16 (-5.88%)

Mutual labels: datascience

MGSE

Mapping-based Genome Size Estimation (MGSE) performs an estimation of a genome size based on a read mapping to an existing genome sequence assembly.

Stars: ✭ 22 (+29.41%)

Mutual labels: genomics

Hap.py

Haplotype VCF comparison tools

Stars: ✭ 249 (+1364.71%)

Mutual labels: genomics

ML-DS-Guide

Compiled resources for learning Machine Learning & Data Science

Stars: ✭ 42 (+147.06%)

Mutual labels: datascience

Biopython

Official git repository for Biopython (originally converted from CVS)

Stars: ✭ 2,936 (+17170.59%)

Mutual labels: genomics

Data-Science

Using Kaggle Data and Real World Data for Data Science and prediction in Python, R, Excel, Power BI, and Tableau.

Stars: ✭ 15 (-11.76%)

Mutual labels: datascience

kmer-db

Kmer-db is a fast and memory-efficient tool for large-scale k-mer analyses (indexing, querying, estimating evolutionary relationships, etc.).

Stars: ✭ 68 (+300%)

Mutual labels: genomics

sequencework

programs and scripts, mainly python, for analyses related to nucleic or protein sequences

Stars: ✭ 22 (+29.41%)

Mutual labels: genomics

metaRNA

Find target sites for the miRNAs in genomic sequences

Stars: ✭ 19 (+11.76%)

Mutual labels: genomics


assigner

The name assigner |əˈsʌɪn| is rooted in the Latin word assignare. Its first use in French dates back to the 13th century.

For the logo, I was inspired by the Northern Atlantic Octopus. I was fortunate to spend a lot of time with one during my PhD. These incredible creatures have 8 arms and thousands of suckers that they can control independently. Octopuses are really the best multitaskers. The logo was designed by the artist Claude Thivierge.

Genomic datasets produced by next-generation sequencing techniques that reduce the size of the genome (e.g. genotyping-by-sequencing (GBS) and restriction-site-associated DNA sequencing (RADseq)) have a huge number of markers that hold great potential and promise for assignment analysis. After hitting the bioinformatic wall with the different workflows, you’ll likely end up with several folders containing whitelists and blacklists of markers and individuals, data sets with various de novo and/or filtering parameters and … missing data. This reality of GBS/RADseq data is quite hard on the GUI software traditionally used for population assignment analysis. The end results are usually poor data exploration, constrained by time, and poor reproducibility.

assigner was tailored to make it easy to conduct population assignment analysis using GBS/RADseq data within R. Additionally, combining tools like R Notebook, RStudio and GitHub makes documenting your workflows and pipelines effortless.

Keywords to remember:

3 different algorithms implemented: frequentist, likelihood and machine learning
cross-validation techniques: classic Leave-One-Out (LOO) and Training, Holdout, Leave-one-out (THL) with marker selection
resampling/bootstrap/subsampling
fast Fst (WC84) implementation
ggplot2-based plotting!
https://thierrygosselin.github.io/assigner/
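
To make the leave-one-out idea in the list above concrete, here is a minimal base R sketch on simulated data. It is not assigner's implementation and uses none of its functions: the toy genotypes, the helper names (simulate_pop, log_lik, assign_loo) and the pseudo-count are all assumptions made for illustration. Each individual is removed in turn, allele frequencies are re-estimated from the remaining individuals, and the removed individual is assigned to the population with the highest genotype likelihood.

```r
set.seed(42)

# Toy data (assumption): 2 populations, 20 individuals each, 50 biallelic markers.
# Genotypes are coded as the count (0, 1, 2) of the alternate allele.
n_ind <- 20
n_loci <- 50
freq_a <- runif(n_loci, 0.1, 0.4)  # alternate-allele frequencies in population A
freq_b <- runif(n_loci, 0.6, 0.9)  # alternate-allele frequencies in population B

# Hypothetical helper: one row per individual, one column per marker.
simulate_pop <- function(n, freq) {
  t(replicate(n, rbinom(length(freq), size = 2, prob = freq)))
}

geno <- rbind(simulate_pop(n_ind, freq_a), simulate_pop(n_ind, freq_b))
pop <- rep(c("A", "B"), each = n_ind)

# Log-likelihood of one genotype vector given allele frequencies,
# assuming Hardy-Weinberg proportions and independent markers.
log_lik <- function(g, freq) {
  sum(dbinom(g, size = 2, prob = freq, log = TRUE))
}

# Leave-one-out: drop individual i, re-estimate frequencies, then assign i.
assign_loo <- function(geno, pop) {
  pops <- unique(pop)
  idx <- seq_len(nrow(geno))
  vapply(idx, function(i) {
    scores <- vapply(pops, function(p) {
      ref <- geno[pop == p & idx != i, , drop = FALSE]
      # Pseudo-count keeps every frequency strictly between 0 and 1.
      freq <- (colSums(ref) + 1) / (2 * nrow(ref) + 2)
      log_lik(geno[i, ], freq)
    }, numeric(1))
    pops[which.max(scores)]
  }, character(1))
}

assigned <- assign_loo(geno, pop)
accuracy <- mean(assigned == pop)
print(table(truth = pop, assigned = assigned))
```

Leaving the tested individual out of the frequency estimates is what keeps the assignment success from being inflated by the individual's own genotype.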

Installation

To try out the dev version of assigner:

if (!require("devtools")) install.packages("devtools")
devtools::install_github("thierrygosselin/assigner")
library(assigner)

If you plan on using gsi_sim inside assigner, you need an additional step:

With UNIX

assigner::install_gsi_sim(fromSource = TRUE)

With PC

assigner::install_gsi_sim()

Website and additional information: https://thierrygosselin.github.io/assigner/
Computer setup, installation and troubleshooting
assigner’s assumptions
assigner’s features
Function documentation
Vignettes
How to cite assigner: inside R, type citation("assigner")

Life cycle

assigner is maturing, but in order to make the package better, changes are inevitable. Experimental functions will change and argument names will change. Your code and workflows might break from time to time until assigner is stable. Consequently, depending on your tolerance to change, assigner might not be for you.
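
One possible way to shield a workflow from such changes is to install a fixed tag or commit rather than the moving development branch. The sketch below relies on the "user/repo@ref" form accepted by devtools::install_github; the value of pinned_ref is a hypothetical placeholder, not a real tag of the repository, and must be replaced before the install line is uncommented.

```r
# Hypothetical placeholder: replace with a real tag or commit SHA from the repository.
pinned_ref <- "<tag-or-commit-sha>"

# Build the "user/repo@ref" specification understood by install_github().
repo_spec <- paste0("thierrygosselin/assigner@", pinned_ref)
print(repo_spec)

# Uncomment once pinned_ref points to a real reference:
# if (!require("devtools")) install.packages("devtools")
# devtools::install_github(repo_spec)
```

Recording the installed version in your notebook, for example with packageVersion("assigner"), makes it easier to trace which release produced a given result.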

Philosophy, major changes and deprecated functions/arguments are documented in the life cycle section of each function.
The latest changes are documented (here) and in the changelog, versions, new features and bug history.
Issues and contributions

