terrytangyuan / dml

License: MIT
R package for Distance Metric Learning


dml (Distance Metric Learning in R)

R package for a collection of Distance Metric Learning algorithms, including global and local methods such as Relevant Component Analysis, Discriminative Component Analysis, Local Fisher Discriminant Analysis, etc. These distance metric learning methods are widely applied in feature extraction, dimensionality reduction, clustering, classification, information retrieval, and computer vision problems.

Installation

Install the current release from CRAN:

install.packages("dml")

Or, try the latest development version from GitHub:

devtools::install_github("terrytangyuan/dml")

Examples

Relevant Component Analysis

library("dml")
library("MASS")

# generate synthetic multivariate normal data
set.seed(42)

k <- 100L # sample size of each class
n <- 3L # specify how many classes
N <- k * n # total sample size

x1 <- mvrnorm(k, mu = c(-16, 8), matrix(c(15, 1, 2, 10), ncol = 2))
x2 <- mvrnorm(k, mu = c(0, 0), matrix(c(15, 1, 2, 10), ncol = 2))
x3 <- mvrnorm(k, mu = c(16, -8), matrix(c(15, 1, 2, 10), ncol = 2))
x <- as.data.frame(rbind(x1, x2, x3)) # predictors
y <- gl(n, k) # response

# fully labeled data set with 3 classes
# need to use a line in 2D to classify
plot(x[, 1L], x[, 2L],
  bg = c("#E41A1C", "#377EB8", "#4DAF4A")[y],
  pch = rep(c(22, 21, 25), each = k)
)
abline(a = -10, b = 1, lty = 2)
abline(a = 12, b = 1, lty = 2)

# generate synthetic chunklets
chunks <- vector("list", 300)
for (i in 1:100) chunks[[i]] <- sample(1L:100L, 10L)
for (i in 101:200) chunks[[i]] <- sample(101L:200L, 10L)
for (i in 201:300) chunks[[i]] <- sample(201L:300L, 10L)

chks <- x[unlist(chunks), ]

# make "chunklet" vector to feed the chunks argument
chunksvec <- rep(-1L, nrow(x))
for (i in seq_along(chunks)) {
  for (j in seq_along(chunks[[i]])) {
    chunksvec[chunks[[i]][j]] <- i
  }
}

# relevant component analysis
rcs <- rca(x, chunksvec)

# learned transformation of the data
rcs$A
#>           [,1]       [,2]
#> [1,] -3.181484 -0.8812647
#> [2,] -1.196200  2.3438640

# learned Mahalanobis distance metric
rcs$B
#>           [,1]     [,2]
#> [1,] 10.898467 1.740125
#> [2,]  1.740125 6.924592

# whitening transformation applied to the chunklets
chkTransformed <- as.matrix(chks) %*% rcs$A

# original data after applying RCA transformation
# easier to classify - using only horizontal lines
xnew <- rcs$newX
plot(xnew[, 1L], xnew[, 2L],
  bg = c("#E41A1C", "#377EB8", "#4DAF4A")[gl(n, k)],
  pch = c(rep(22, k), rep(21, k), rep(25, k))
)
abline(a = -15, b = 0, lty = 2)
abline(a = 16, b = 0, lty = 2)
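
Discriminative Component Analysis

Discriminative Component Analysis (DCA) is also implemented. The sketch below assumes dca()'s documented interface: a data matrix, a chunklet vector (as above), and a neglinks matrix whose (i, j) entry marks chunklets i and j as known to belong to different classes. The toy data and chunklet assignments here are illustrative, not from the package documentation.

```r
library("dml")

set.seed(1)

# toy data: 9 points in 2 dimensions, grouped into 3 chunklets
toy <- matrix(rnorm(9 * 2), ncol = 2)
chks <- rep(1:3, each = 3)

# chunklets 1 and 2 are each negatively linked to chunklet 3
neglinks <- matrix(c(
  0, 0, 1,
  0, 0, 1,
  1, 1, 0
), ncol = 3, byrow = TRUE)

dcaFit <- dca(data = toy, chunks = chks, neglinks = neglinks)
dcaFit$B       # learned Mahalanobis distance metric
dcaFit$newData # data after the learned DCA transformation
```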

Other Examples

For examples of Local Fisher Discriminant Analysis, please take a look at the separate lfda package. For examples of all other implemented algorithms, please take a look at the dml package reference manual.
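
As a quick sketch of what the lfda package provides (assuming it is installed; the call follows its documented lfda(x, y, r, metric) interface), Local Fisher Discriminant Analysis can be run on the built-in iris data:

```r
library("lfda")

# reduce the 4 iris measurements to 3 dimensions,
# using the species labels for supervision
model <- lfda(iris[, -5], iris[, 5], r = 3, metric = "plain")

# transformed data: 150 rows, 3 columns
dim(model$Z)
```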

Brief Introduction

Distance metrics are used throughout the machine learning literature. Traditionally, a metric is either chosen a priori (e.g., Euclidean distance or L1 distance) or selected by cross-validation within a small class of functions (e.g., choosing the order of a polynomial kernel). With prior knowledge about the data, however, a more suitable distance metric can be learned using (semi-)supervised distance metric learning techniques. dml is an R package that aims to implement a collection of such algorithms. These methods are widely applied in feature extraction, dimensionality reduction, clustering, classification, information retrieval, and computer vision problems.
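
As a concrete illustration (in base R, not tied to any particular dml function), the squared distance under a learned Mahalanobis metric B is (x - y)' B (x - y); when B is the identity matrix it reduces to the squared Euclidean distance:

```r
# squared Mahalanobis distance between two points under a metric B;
# with B = identity, this is just the squared Euclidean distance
mahalanobis_sq <- function(x, y, B) {
  d <- x - y
  drop(t(d) %*% B %*% d)
}

x1 <- c(1, 2)
x2 <- c(3, 1)

mahalanobis_sq(x1, x2, diag(2))  # squared Euclidean distance: 5

# a non-trivial metric reweights and correlates the coordinates
B <- matrix(c(
  2, 0.5,
  0.5, 1
), ncol = 2)
mahalanobis_sq(x1, x2, B)  # 7
```

Distance metric learning replaces a hand-picked B with one estimated from (semi-)supervised side information such as chunklets or pairwise constraints.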

Algorithms

Algorithms planned in the first development stage:

  • Supervised Global Distance Metric Learning:

    • Relevant Component Analysis (RCA) - implemented
    • Kernel Relevant Component Analysis (KRCA)
    • Discriminative Component Analysis (DCA) - implemented
    • Kernel Discriminative Component Analysis (KDCA)
    • Global Distance Metric Learning by Convex Programming - implemented
  • Supervised Local Distance Metric Learning:

    • Local Fisher Discriminant Analysis - implemented
    • Kernel Local Fisher Discriminant Analysis - implemented
    • Information-Theoretic Metric Learning (ITML)
    • Large Margin Nearest Neighbor Classifier (LMNN)
    • Neighbourhood Components Analysis (NCA)
    • Localized Distance Metric Learning (LDM)

The algorithms and routines may be adjusted during development.

Contribute & Code of Conduct

To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Contact

Contact the maintainer of this package: Yuan Tang [email protected]