dillondaudert / UMAP.jl

Licence: MIT License

Uniform Manifold Approximation and Projection (UMAP) implementation in Julia

UMAP.jl

A pure Julia implementation of the Uniform Manifold Approximation and Projection dimension reduction algorithm

McInnes, L., Healy, J., Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426, 2018.

Usage

embedding = umap(X, n_components; n_neighbors, metric, min_dist, ...)

The umap function takes two positional arguments, X (a column-major matrix of shape (n_features, n_samples), so each column is one sample) and n_components (the number of dimensions in the output embedding), plus various keyword arguments. Several important ones are:

n_neighbors::Int=15: Controls how many neighbors around each point are considered part of its local neighborhood. Larger values produce embeddings that capture more global structure, while smaller values preserve more local structure.
metric::SemiMetric=Euclidean(): The (semi)metric used to calculate distances between points. This can be any subtype of the SemiMetric type from the Distances.jl package, including user-defined types.
min_dist::Real=0.1: Controls the minimum spacing of points in the embedding. Larger values spread points more evenly, while smaller values preserve more local structure.

The returned embedding will be a matrix of shape (n_components, n_samples).
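
As a minimal sketch of the call described above, the following embeds random data into two dimensions. It assumes the UMAP.jl package is installed; the data, the seed, and the keyword values are made up for illustration.

```julia
using UMAP    # assumes the UMAP.jl package is installed
using Random

Random.seed!(42)

# Hypothetical data: 200 samples with 10 features each, one sample per column.
X = rand(Float64, 10, 200)

# Embed into 2 dimensions, overriding two of the keyword arguments listed above.
embedding = umap(X, 2; n_neighbors=10, min_dist=0.05)

# One row per output dimension, one column per sample.
println(size(embedding))
```

Because the optimization is stochastic, the coordinates differ between runs unless the random seed is fixed; only the shape of the result is determined by the inputs.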

Using precomputed distances

UMAP can use a precomputed distance matrix instead of finding the nearest neighbors itself. In this case, the distance matrix is passed as X and the metric keyword argument should be :precomputed. Example:

embedding = umap(distances, n_components; metric=:precomputed)
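
One way to obtain such a matrix is sketched below in plain Julia. The helper pairwise_euclidean is hypothetical (it is not part of UMAP.jl), and the sketch assumes that the expected input is a square, symmetric matrix whose (i, j) entry is the distance between samples i and j.

```julia
using LinearAlgebra

# Pairwise Euclidean distances between the columns of X (one sample per column).
# Hypothetical helper, written out only to show the shape of the result.
function pairwise_euclidean(X::AbstractMatrix{<:Real})
    n = size(X, 2)
    D = zeros(Float64, n, n)      # the diagonal stays zero
    for j in 1:n, i in (j + 1):n
        d = norm(view(X, :, i) - view(X, :, j))
        D[i, j] = d
        D[j, i] = d               # distances are symmetric
    end
    return D
end

# Three 2-dimensional points stored as columns.
X = [0.0 3.0 0.0;
     0.0 4.0 1.0]
distances = pairwise_euclidean(X)
```

A matrix built this way for a full dataset could then be passed as the first argument with metric=:precomputed. In practice the pairwise function from Distances.jl computes the same kind of matrix for any supported metric.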

Fitting a UMAP model to a dataset and transforming new data

Constructing a model

To construct a model to use for embedding new data, use the constructor:

model = UMAP_(X, n_components; <kwargs>)

where the constructor takes the same keyword arguments (kwargs) as umap. The returned object has the following fields:

model.graph     # The graph of fuzzy simplicial set membership strengths of each point in the dataset
model.embedding # The embedding of the dataset
model.data      # A reference to the original dataset
model.knns      # A matrix of indices of nearest neighbors of points in the dataset,
                # as determined on the original manifold (may be approximate)
model.dists     # The distances of the neighbors indicated by model.knns

Embedding new data

To transform new data into the existing embedding of a UMAP model, use the transform function:

Q_embedding = transform(model, Q; <kwargs>)

where Q is a matrix of new query data to embed into the existing embedding, and model is the object returned by the UMAP_ call above. Q must come from a space of the same dimensionality as model.data (i.e., X in the UMAP_ call above).

The remaining keyword arguments (kwargs) are the same as for the functions above.
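
The fit-then-transform workflow can be sketched as follows. This assumes the UMAP.jl package is installed; the data sizes, the seed, and the keyword values are illustrative only.

```julia
using UMAP    # assumes the UMAP.jl package is installed
using Random

Random.seed!(0)

# Hypothetical data: 300 training samples and 20 query samples, 8 features each.
X = rand(8, 300)
Q = rand(8, 20)

# Fit once and keep the model so that new points can be embedded later.
model = UMAP_(X, 2; n_neighbors=10)

# Embed the query points into the existing embedding stored in model.embedding.
Q_embedding = transform(model, Q; n_neighbors=10)

println(size(model.embedding))
println(size(Q_embedding))
```

Q has the same number of rows (features) as X, which is the dimensionality requirement stated above.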

Implementation Details

There are two main steps involved in UMAP: building a weighted graph with edges connecting points to their nearest neighbors, and optimizing the low-dimensional embedding of that graph. The first step is accomplished either by an exact kNN search (for datasets with fewer than 4096 points) or by the approximate kNN search algorithm NNDescent. This step is usually also the most costly.
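
To make the exact search concrete, here is a brute-force sketch in plain Julia. It is not the package's own code: the function exact_knn and the (k, n_samples) layout of its outputs are assumptions made for this illustration.

```julia
using LinearAlgebra

# Exact k-nearest-neighbor search by brute force over the columns of X.
# Returns (knns, dists), each of size (k, n_samples). A point is never
# listed as its own neighbor.
function exact_knn(X::AbstractMatrix{<:Real}, k::Integer)
    n = size(X, 2)
    0 < k < n || throw(ArgumentError("k must satisfy 0 < k < number of samples"))
    knns = Matrix{Int}(undef, k, n)
    dists = Matrix{Float64}(undef, k, n)
    for j in 1:n
        # Distance from point j to every point; Inf excludes j itself.
        d = [i == j ? Inf : norm(view(X, :, i) - view(X, :, j)) for i in 1:n]
        nearest = partialsortperm(d, 1:k)   # indices of the k smallest distances
        knns[:, j] = nearest
        dists[:, j] = d[nearest]
    end
    return knns, dists
end

# Four 1-dimensional points stored as columns.
X = [0.0 1.0 3.0 10.0]
knns, dists = exact_knn(X, 2)
```

Each query costs work proportional to the number of samples, so the whole search grows quadratically with dataset size, which is why an approximate method becomes attractive for larger datasets.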

The low-dimensional embedding is initialized (by default) with the eigenvectors of the normalized Laplacian of the kNN graph. These are found using ARPACK (via Arpack.jl).
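
The idea behind the spectral initialization can be sketched on a small dense graph. This sketch swaps in the dense eigen solver from Julia's LinearAlgebra standard library instead of ARPACK, and it assumes the symmetric normalized Laplacian L = I - D^(-1/2) W D^(-1/2); the function spectral_layout and the toy weight matrix are hypothetical.

```julia
using LinearAlgebra

# Spectral coordinates from a symmetric, nonnegative weight matrix W:
# the eigenvectors of L = I - D^(-1/2) W D^(-1/2) with the smallest
# eigenvalues, skipping the first (trivial) one.
function spectral_layout(W::AbstractMatrix{<:Real}, n_components::Integer)
    degrees = vec(sum(W, dims=2))
    Dinvsqrt = Diagonal(1 ./ sqrt.(degrees))
    L = Symmetric(Matrix(I - Dinvsqrt * W * Dinvsqrt))
    decomposition = eigen(L)               # eigenvalues in ascending order
    vecs = decomposition.vectors[:, 2:(n_components + 1)]
    return permutedims(vecs)               # shape (n_components, n_points)
end

# Toy graph: two triangles (nodes 1-3 and 4-6) joined by one weak edge.
W = [0.0 1.0 1.0 0.0 0.0 0.0;
     1.0 0.0 1.0 0.0 0.0 0.0;
     1.0 1.0 0.0 0.1 0.0 0.0;
     0.0 0.0 0.1 0.0 1.0 1.0;
     0.0 0.0 0.0 1.0 0.0 1.0;
     0.0 0.0 0.0 1.0 1.0 0.0]
layout = spectral_layout(W, 2)
```

In this toy graph the first coordinate takes opposite signs on the two triangles, so tightly connected points start out close together before any optimization runs. A dense solver is only practical for small graphs; for a large sparse kNN graph an iterative solver is the usual choice.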

Current Limitations

Input data types: Only data points represented by vectors of numbers (passed in as a matrix) are valid inputs. This is mostly due to a lack of support for other formats in NNDescent. Support for other kinds of data, such as string datasets, may be added in the future.
Sequential: This implementation does not take advantage of any parallelism.

External Resources

Understanding UMAP
For a great description of how UMAP works, see this page from the Python UMAP documentation
If you're familiar with t-SNE, then this page describes UMAP with similar vocabulary to that dimension reduction algorithm

Examples

The full MNIST and FMNIST datasets are plotted below using both this implementation and the Python implementation for comparison. These were generated by this notebook.

Note that the memory allocation for the Python UMAP is unreliable, as Julia's benchmarking doesn't count memory allocated within Python itself.

MNIST

FMNIST

Disclaimer

This implementation is a work-in-progress. If you encounter any issues, please create an issue or make a pull request.
