alexkychen / assignPOP

Licence: GPL-3.0 license
Population Assignment using Genetic, Non-genetic or Integrated Data in a Machine-learning Framework. Methods in Ecology and Evolution. 2018;9:439–446.

assignPOP

Population Assignment using Genetic, Non-Genetic or Integrated Data in a Machine-learning Framework

Description

This R package helps perform population assignment and infer population structure using a machine-learning framework. It employs supervised machine-learning methods to evaluate the discriminatory power of data collected from source populations, and it can analyze large genetic, non-genetic, or integrated (genetic plus non-genetic) data sets. The framework is designed to address the upward-bias issue discussed in previous studies. The main features are listed below.

  • Use principal component analysis (PCA) for dimensionality reduction (or data transformation)
  • Use Monte-Carlo cross-validation to estimate mean and variance of assignment accuracy
  • Use K-fold cross-validation to estimate membership probability
  • Resample training datasets of various sizes (proportions or fixed numbers of individuals, and proportions of loci)
  • Choose among various proportions of training loci, either randomly or based on locus Fst values
  • Provide several machine-learning classification algorithms, including LDA, SVM, naive Bayes, decision tree, and random forest, to build tunable predictive models.
  • Output results in publication-quality plots that can be modified using ggplot2 functions
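The two resampling schemes in the feature list can be illustrated with a minimal sketch. It does not use assignPOP's own functions: it assumes the built-in iris data as a stand-in for population data and MASS::lda as the classifier, and the helper names mc_accuracy and kfold_membership are hypothetical.

```r
library(MASS)  # provides lda(); MASS ships with standard R installations

set.seed(1)

# Monte-Carlo cross-validation: repeatedly draw a random training set,
# fit the classifier, and record assignment accuracy on the held-out rest.
mc_accuracy <- function(data, label, train_prop = 0.7, iterations = 30) {
  pops <- split(seq_len(nrow(data)), data[[label]])
  replicate(iterations, {
    # Sample the same proportion of individuals from every population
    # (assumes each population has more than one individual)
    train_idx <- unlist(lapply(pops, function(i) {
      sample(i, floor(length(i) * train_prop))
    }))
    fit <- lda(reformulate(".", response = label), data = data[train_idx, ])
    test <- data[-train_idx, ]
    mean(predict(fit, test)$class == test[[label]])
  })
}

# K-fold cross-validation: every individual is tested exactly once, so each
# one receives a membership probability for every population.
kfold_membership <- function(data, label, k = 5) {
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  post <- lapply(seq_len(k), function(f) {
    fit <- lda(reformulate(".", response = label), data = data[fold != f, ])
    predict(fit, data[fold == f, ])$posterior
  })
  do.call(rbind, post)  # one row per individual, one column per population
}

acc <- mc_accuracy(iris, "Species")
c(mean = mean(acc), var = var(acc))  # mean and variance of assignment accuracy

membership <- kfold_membership(iris, "Species")
head(round(membership, 3))
```

In this sketch the folds are assigned at random without stratifying by population; the package's own fold assignment may differ.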

Install assignPOP

You can install the released version from CRAN or the up-to-date version from this GitHub repository.

  • To install from CRAN

    • Simply enter install.packages("assignPOP") in your R console
  • To install from Github

    • step 1. Install devtools package by entering install.packages("devtools")
    • step 2. Load the package by entering library(devtools)
    • step 3. Then enter install_github("alexkychen/assignPOP")

Note: When you install the package from GitHub, you may need to install additional packages before assignPOP can be installed successfully. Follow the hints that R provides and then re-run install_github("alexkychen/assignPOP").
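For convenience, the two installation routes can be wrapped in one small function. This is only a sketch: install_assignPOP is a hypothetical helper, not part of the package, and it simply calls the same commands listed above.

```r
# Hypothetical helper (not part of assignPOP): wraps the two install routes.
install_assignPOP <- function(from = c("cran", "github")) {
  from <- match.arg(from)
  if (from == "cran") {
    # Released version
    install.packages("assignPOP")
  } else {
    # Up-to-date version; devtools is installed first if it is missing
    if (!requireNamespace("devtools", quietly = TRUE)) {
      install.packages("devtools")
    }
    devtools::install_github("alexkychen/assignPOP")
  }
}

# Usage (requires an internet connection):
# install_assignPOP("cran")
# install_assignPOP("github")
```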

Package tutorial

Please visit our tutorial website for more information.

What's new

Changes in ver. 1.2.4 (2021.10.27)

  • Update membership.plot - add arguments 'plot.k' and 'plot.loci' to skip the related question prompts.

History

Changes in ver. 1.2.3 (2021.8.17)

  • Update assign.X - (1) Add argument 'common' to specify whether to stop the analysis when inconsistent features are found between data sets. (2) Add argument 'skipQ' to skip data type checking on non-genetic data. (3) Modify argument 'mplot' to handle membership probability plot output.

Changes in ver. 1.2.2 (2020.11.6)

  • Update read.Genepop and read.Structure - loci that have only one allele across samples are now kept. Use reduce.allele to remove single-allele or low-variance loci.
  • In ver. 1.2.1, running assign.MC (and other assignment test functions) could generate errors when single-allele loci were present (fixed in ver. 1.2.2).

Changes in ver. 1.2.1 (2020.8.24)

  • Update read.Genepop to increase file reading speed (~40 times faster)
  • Update read.Structure to increase file reading speed (~90 times faster)
  • read.Structure can now also handle triploid and tetraploid organisms (see arg. ploidy)
  • Fix bug in reduce.allele to handle a small p threshold across all loci

Changes in ver. 1.2.0 (2020.7.24)

  • Add code to check the model name in assign.MC, assign.kfold, assign.X
  • Add text to SVM description
  • Fix cbind/stringsAsFactors issues in several places for R 4.0
  • Allow model-specific arguments to be passed to the models (e.g., gamma in SVM)

Changes in ver. 1.1.9 (2020.3.16)

  • Fix input non-genetic data (x1) error in assign.X

Changes in ver. 1.1.8 (2020.2.28)

  • Update the following functions to work with R 4.0.0: accuracy.MC, accuracy.kfold, assign.matrix, compile.data, membership.plot
  • Add stringsAsFactors=T to read.table and read.csv
  • Temporarily turn off testthat because it currently fails to pass tests on Debian systems
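The stringsAsFactors changes matter because R 4.0.0 switched the default of stringsAsFactors in data.frame() and read.table() from TRUE to FALSE, so code that expects population labels to be factors has to request them explicitly. A minimal illustration on made-up data (the column names and values here are assumptions, not package data):

```r
# Made-up example data: two individuals from two populations
pops <- c("pop_A", "pop_B")

# Default behaviour depends on the R version (character columns in R >= 4.0.0)
df_default <- data.frame(pop = pops, size = c(10.2, 11.5))

# Requesting factors explicitly behaves the same in every R version
df_factor <- data.frame(pop = pops, size = c(10.2, 11.5),
                        stringsAsFactors = TRUE)

class(df_default$pop)  # "character" in R >= 4.0.0, "factor" in older versions
class(df_factor$pop)
levels(df_factor$pop)
```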

Changes in ver. 1.1.7 (2019.8.26)

  • add broken-stick method for principal component selection in assign.MC, assign.kfold, and assign.X functions
  • update accuracy.MC, accuracy.kfold, assign.matrix to handle missing levels of predicted population in test results
  • update the assign.* and accuracy.* functions to handle numeric population names
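As background on the broken-stick method mentioned above, here is a minimal sketch of the general technique. It is not taken from the package source, so assignPOP's internal implementation may differ. It assumes the built-in USArrests data and a hypothetical helper named broken_stick. A component is retained while its share of variance exceeds the share expected if the total variance were split at random into as many pieces.

```r
# Expected proportion of variance for component k out of p under the
# broken-stick model: b_k = (1/p) * sum_{i=k}^{p} 1/i
broken_stick <- function(p) {
  rev(cumsum(rev(1 / seq_len(p)))) / p
}

pca <- prcomp(USArrests, scale. = TRUE)
prop_var <- pca$sdev^2 / sum(pca$sdev^2)   # observed share per component
expected <- broken_stick(length(prop_var)) # broken-stick expectation

# Keep the leading components until one first falls below its expectation
keep <- which(cumprod(prop_var > expected) == 1)

data.frame(component = seq_along(prop_var),
           observed = round(prop_var, 4),
           expected = round(expected, 4))
keep
```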

Changes in ver. 1.1.6 (2019.6.8)

  • fix multiprocess issue in assign.kfold function

Changes in ver. 1.1.5 (2018.3.23)

  • Update assign.MC & assign.kfold to detect pop size and train.inds/k.fold setting
  • Update accuracy.MC & assign.matrix to handle test individuals not from every pop
  • Slightly modify levels method in accuracy.kfold
  • fix bugs in accuracy.plot for K-fold results
  • fix membership.plot title positioning and set text size to default

Changes in ver. 1.1.4 (2018.3.8)

  • Fix missing assign.matrix function

Changes in ver. 1.1.3 (2017.6.15)

  • Add unit tests (using package testthat)

Changes in ver. 1.1.2 (2017.5.13)

  • Change function name read.genpop to read.Genepop; Add function read.Structure.
  • Update read.genpop function, now can read haploid data

Cite this package

Chen, K. Y., Marschall, E. A., Sovic, M. G., Fries, A. C., Gibbs, H. L., & Ludsin, S. A. (2018). assignPOP: An R package for population assignment using genetic, non-genetic, or integrated data in a machine-learning framework. Methods in Ecology and Evolution, 9(2), 439–446. https://doi.org/10.1111/2041-210X.12897

Papers citing our package

Previous version

Previous packages can be found and downloaded at the releases page

Version compatibility (2020.7.24)

assignPOP versions 1.1.9 and earlier are not fully compatible with R 4.0.0. If you're using R 4.0.0 (or newer), please update assignPOP to 1.2.0 or later.
