tag-bio / umap-java

Licence: BSD-3-Clause license
A Uniform Manifold Approximation and Projection (UMAP) library for Java, developed by Tag.bio in collaboration with Real Time Genomics.

Projects that are alternatives to, or similar to, umap-java

  • AnnA Anki neuronal Appendix: Using machine learning on your anki collection to enhance the scheduling via semantic clustering and semantic similarity
  • playing with vae: Comparing FC VAE / FCN VAE / PCA / UMAP on MNIST / FMNIST
  • BEER: Batch EffEct Remover for single-cell data
  • ParametricUMAP paper: Parametric UMAP embeddings for representation and semisupervised learning. From the paper "Parametric UMAP: learning embeddings with deep neural networks for representation and semi-supervised learning" (Sainburg, McInnes, Gentner, 2020).
  • scarf: Toolkit for highly memory efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas scale datasets with millions of cells on laptop.
  • Umap: Uniform Manifold Approximation and Projection
  • Unsupervised-Learning-in-R: Workshop (6 hours): Clustering (Hdbscan, LCA, Hopach), dimension reduction (UMAP, GLRM), and anomaly detection (isolation forests).
  • dbMAP: A fast, accurate, and modularized dimensionality reduction approach based on diffusion harmonics and graph layouts. Escalates to millions of samples on a personal laptop. Adds high-dimensional big data intrinsic structure to your clustering and data visualization workflow.
  • UMAP.jl: Uniform Manifold Approximation and Projection (UMAP) implementation in Julia
  • Embeddings2Image: create "Karpathy's style" 2d images out of your image embeddings
  • ReductionWrappers: R wrappers to connect Python dimensional reduction tools and single cell data objects (Seurat, SingleCellExperiment, etc...)
  • Interactive-3D-Plotting-in-Seurat-3.0.0: This repository contains R code, with which you can create 3D UMAP and tSNE plots of Seurat analyzed scRNAseq data

Java UMAP

A self-contained native Java implementation of UMAP based on the reference Python implementation.

This implementation has been designed and developed by Tag.bio in collaboration with Real Time Genomics.

Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualization similarly to t-SNE, but also for general non-linear dimension reduction. The algorithm is founded on three assumptions about the data:

  1. The data is uniformly distributed on a Riemannian manifold;
  2. The Riemannian metric is locally constant (or can be approximated as such);
  3. The manifold is locally connected.

From these assumptions it is possible to model the manifold with a fuzzy topological structure. The embedding is found by searching for a low-dimensional projection of the data that has the closest possible equivalent fuzzy topological structure.
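
As a condensed summary of the construction described in the UMAP paper (not of any code in this library), the fuzzy structure and the objective can be written as follows. Here k is the number of nearest neighbours, ρ_i is the distance from x_i to its nearest neighbour, σ_i is a per-point scale, and a, b are curve parameters derived from the minimum-distance setting:

```latex
% Membership of neighbour x_j in the local fuzzy set of x_i (x_j among the k nearest neighbours of x_i)
w_{i \to j} = \exp\!\left(-\frac{\max\left(0,\; d(x_i, x_j) - \rho_i\right)}{\sigma_i}\right),
\qquad \rho_i = \min\{\, d(x_i, x_j) : 1 \le j \le k,\; d(x_i, x_j) > 0 \,\}

% sigma_i is chosen so that the k memberships sum to log2(k)
\sum_{j=1}^{k} w_{i \to j} = \log_2 k

% Symmetrise the directed memberships with the fuzzy (probabilistic) union
w_{ij} = w_{i \to j} + w_{j \to i} - w_{i \to j}\, w_{j \to i}

% Membership between embedded points y_i and y_j in the low-dimensional space
v_{ij} = \left(1 + a\,\lVert y_i - y_j \rVert_2^{2b}\right)^{-1}

% The embedding minimises the cross entropy between the two fuzzy sets
C = \sum_{i \ne j} \left[ w_{ij} \log\frac{w_{ij}}{v_{ij}} + \left(1 - w_{ij}\right) \log\frac{1 - w_{ij}}{1 - v_{ij}} \right]
```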

The details of the underlying mathematics and algorithms can be found in L. McInnes, J. Healy and J. Melville, UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.
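
The per-point membership computation from the paper can be sketched in plain Java as below. This is an illustrative sketch, not umap-java's internal code: the class FuzzyMembershipSketch and its methods are hypothetical, the distances to a point's k nearest neighbours are taken as given, and σ is found by simple bisection.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of UMAP's local fuzzy membership computation for one
 * point, following the formulas in the UMAP paper. Not part of umap-java.
 */
final class FuzzyMembershipSketch {

    /**
     * Given the distances from one point to its k nearest neighbours, returns
     * memberships exp(-max(0, d - rho) / sigma), where rho is the smallest
     * non-zero distance and sigma is chosen so the memberships sum to log2(k).
     */
    public static double[] memberships(double[] knnDistances) {
        int k = knnDistances.length;
        double target = Math.log(k) / Math.log(2); // log2(k)

        // rho: distance to the nearest neighbour with a non-zero distance
        double rho = 0.0;
        for (double d : knnDistances) {
            if (d > 0.0 && (rho == 0.0 || d < rho)) {
                rho = d;
            }
        }

        // Bisection on sigma: the membership sum grows monotonically with sigma
        double lo = 0.0, hi = Double.POSITIVE_INFINITY, sigma = 1.0;
        for (int iter = 0; iter < 64; iter++) {
            double sum = membershipSum(knnDistances, rho, sigma);
            if (Math.abs(sum - target) < 1e-9) {
                break;
            }
            if (sum > target) {
                hi = sigma;
                sigma = (lo + hi) / 2.0;
            } else {
                lo = sigma;
                // No upper bound found yet: keep doubling sigma
                sigma = Double.isInfinite(hi) ? sigma * 2.0 : (lo + hi) / 2.0;
            }
        }

        double[] w = new double[k];
        for (int j = 0; j < k; j++) {
            w[j] = Math.exp(-Math.max(0.0, knnDistances[j] - rho) / sigma);
        }
        return w;
    }

    /** Sum of memberships for a candidate sigma. */
    public static double membershipSum(double[] dists, double rho, double sigma) {
        double s = 0.0;
        for (double d : dists) {
            s += Math.exp(-Math.max(0.0, d - rho) / sigma);
        }
        return s;
    }

    /** Fuzzy (probabilistic) union used to symmetrise w(i->j) and w(j->i). */
    public static double fuzzyUnion(double a, double b) {
        return a + b - a * b;
    }

    public static void main(String[] args) {
        double[] w = memberships(new double[] {0.5, 0.8, 1.0, 1.3, 2.0});
        System.out.println(Arrays.toString(w));
        System.out.println(fuzzyUnion(0.6, 0.5));
    }
}
```

In this sketch the nearest neighbour always receives membership 1, and the remaining memberships decay with distance so that their total equals log2(k).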

How to use UMAP

Using Java UMAP is simple:

final float[][] data = ...           // input data: one row per instance, one column per attribute
final Umap umap = new Umap();
umap.setNumberComponents(2);         // number of dimensions in result
umap.setNumberNearestNeighbours(15);
umap.setThreads(1);                  // use > 1 to enable parallelism
final float[][] result = umap.fitTransform(data);

There is a large number of parameters that can be set; the major ones are as follows:

  • setNumberNearestNeighbours: This determines the number of neighboring points used in local approximations of manifold structure. Larger values preserve more global structure at the expense of detailed local structure. In general this parameter should be in the range 5 to 50, with 10 to 15 being a sensible default.

  • setMinDist: This controls how tightly the embedding is allowed to compress points together. Larger values spread embedded points more evenly, while smaller values let the algorithm optimize more accurately with regard to local structure. Sensible values are in the range 0.001 to 0.5, with 0.1 being a reasonable default.

  • setMetric: This determines the metric used to measure distance in the input space. The default is the Euclidean metric.
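
To make the role of setNumberNearestNeighbours and setMetric concrete, the sketch below finds each point's k nearest neighbours under the Euclidean metric by brute force. It is a hypothetical illustration (KnnSketch is not part of umap-java) that runs in O(n²) time; the reference Python implementation uses approximate nearest-neighbour descent for larger data sets, and nothing here describes how this library searches.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

/**
 * Hypothetical brute-force k-nearest-neighbour search with a Euclidean metric.
 * Illustrates what the neighbour count and metric control; not umap-java code.
 */
final class KnnSketch {

    /** Euclidean distance between two rows. */
    public static double euclidean(float[] x, float[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Returns, for each row, the indices of its k nearest other rows, closest first. */
    public static int[][] nearestNeighbours(float[][] data, int k) {
        int n = data.length;
        int[][] result = new int[n][];
        for (int i = 0; i < n; i++) {
            final int row = i;
            result[i] = IntStream.range(0, n)
                .filter(j -> j != row)                // exclude the point itself
                .boxed()
                .sorted(Comparator.comparingDouble(j -> euclidean(data[row], data[j])))
                .limit(k)                             // keep only the k closest
                .mapToInt(Integer::intValue)
                .toArray();
        }
        return result;
    }

    public static void main(String[] args) {
        float[][] data = {
            {0f, 0f}, {0f, 1f}, {1f, 0f},         // a tight cluster
            {10f, 10f}, {10f, 11f}, {11f, 10f}    // a second cluster far away
        };
        int[][] knn = nearestNeighbours(data, 2);
        for (int i = 0; i < knn.length; i++) {
            System.out.println(i + " -> " + Arrays.toString(knn[i]));
        }
    }
}
```

With k = 2 every point's neighbours come from its own cluster; raising k to 3 or more on this data would pull in points from the other cluster, which is the sense in which larger neighbour counts emphasise more global structure.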

In addition, the number of threads can be set with setThreads. If it is set to a value greater than 1, the results are no longer deterministic, even when a random number seed is specified.
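
The README does not say where the non-determinism comes from. One plausible contributor, shown here as an assumption rather than a description of umap-java's internals, is that floating-point addition is not associative: when several threads apply updates to shared values in an order that varies between runs, rounding differs and the final numbers can change. The hypothetical class below applies the same three updates in two different orders.

```java
/**
 * Hypothetical illustration: applying identical float updates in a different
 * order (as concurrently running threads may do) changes the result.
 */
final class OrderSensitivitySketch {

    /** Applies the updates to a starting value, one after another, in array order. */
    public static float applyInOrder(float start, float[] updates) {
        float value = start;
        for (float u : updates) {
            value += u;
        }
        return value;
    }

    public static void main(String[] args) {
        float[] orderA = {1e8f, 1f, -1e8f}; // the +1 is lost to rounding next to 1e8
        float[] orderB = {1e8f, -1e8f, 1f}; // the large terms cancel first
        System.out.println(applyInOrder(0f, orderA)); // 0.0
        System.out.println(applyInOrder(0f, orderB)); // 1.0
    }
}
```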

Limitations

This Java implementation has a number of limitations when compared to the reference Python implementation:

  • Only random initialization is currently supported; in particular, spectral initialization is not.

  • The transform() method for adding new points to an existing embedding is implemented, but should be considered alpha.

  • Selection of curve parameters is more limited than in the Python version (an IllegalArgumentException is thrown if the limits are exceeded).

  • Other limitations may surface as an UnsupportedOperationException.
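
For context on the curve parameters: in the reference Python implementation, the constants a and b in the low-dimensional membership 1 / (1 + a·d^(2b)) are obtained by a least-squares fit against a target curve that is 1 for d < minDist and exp(-(d - minDist) / spread) beyond it, sampled at 300 evenly spaced points on [0, 3·spread]. The sketch below approximates that fit with a plain grid search instead of the non-linear least-squares solver Python uses. CurveFitSketch and its methods are hypothetical, and this does not reflect how umap-java itself selects these parameters.

```java
/**
 * Hypothetical grid-search stand-in for fitting UMAP's curve parameters (a, b)
 * from minDist and spread. Not part of umap-java.
 */
final class CurveFitSketch {

    /** Sum of squared errors between 1 / (1 + a d^(2b)) and the target curve. */
    public static double sse(double a, double b, double minDist, double spread) {
        int samples = 300;
        double sum = 0.0;
        for (int i = 0; i < samples; i++) {
            double d = 3.0 * spread * i / (samples - 1);  // evenly spaced on [0, 3 * spread]
            double target = d < minDist ? 1.0 : Math.exp(-(d - minDist) / spread);
            double model = 1.0 / (1.0 + a * Math.pow(d, 2.0 * b));
            double err = model - target;
            sum += err * err;
        }
        return sum;
    }

    /** Returns {a, b} minimising the squared error over a fixed coarse grid. */
    public static double[] fit(double minDist, double spread) {
        double bestA = Double.NaN, bestB = Double.NaN, bestErr = Double.POSITIVE_INFINITY;
        for (int i = 0; i <= 250; i++) {              // a in [0.5, 3.0], step 0.01
            double a = 0.5 + 0.01 * i;
            for (int j = 0; j <= 200; j++) {          // b in [0.5, 1.5], step 0.005
                double b = 0.5 + 0.005 * j;
                double err = sse(a, b, minDist, spread);
                if (err < bestErr) {
                    bestErr = err;
                    bestA = a;
                    bestB = b;
                }
            }
        }
        return new double[] {bestA, bestB};
    }

    public static void main(String[] args) {
        double[] ab = fit(0.1, 1.0);
        // The Python reference reports roughly a = 1.577, b = 0.895 for these defaults.
        System.out.printf("a = %.3f, b = %.3f%n", ab[0], ab[1]);
    }
}
```

A grid search is used here only because it needs no external solver; its accuracy is limited by the grid spacing.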

Citation

If you would like to cite this algorithm in your work, the arXiv paper is the current reference:

   @article{2018arXivUMAP,
        author = {{McInnes}, L. and {Healy}, J. and {Melville}, J.},
        title = "{UMAP: Uniform Manifold Approximation
        and Projection for Dimension Reduction}",
        journal = {ArXiv e-prints},
        archivePrefix = "arXiv",
        eprint = {1802.03426},
        primaryClass = "stat.ML",
        keywords = {Statistics - Machine Learning,
                    Computer Science - Computational Geometry,
                    Computer Science - Learning},
        year = 2018,
        month = feb,
   }

License

The Java UMAP package is 3-clause BSD licensed.
