cvxgrp / pymde

License: Apache-2.0
Minimum-distortion embedding with PyTorch

PyMDE

The official documentation for PyMDE is available at www.pymde.org.

This repository accompanies the monograph Minimum-Distortion Embedding.

PyMDE is a Python library for computing vector embeddings for finite sets of items, such as images, biological cells, nodes in a network, or any other abstract object.

What sets PyMDE apart from other embedding libraries is that it provides a simple but general framework for embedding, called Minimum-Distortion Embedding (MDE). With MDE, it is easy to recreate well-known embeddings and to create new ones, tailored to your particular application.

PyMDE is competitive in runtime with more specialized embedding methods. With a GPU, it can be even faster.

Overview

PyMDE can be enjoyed by beginners and experts alike. It can be used to:

  • visualize datasets, small or large;
  • generate feature vectors for supervised learning;
  • compress high-dimensional vector data;
  • draw graphs (up to orders of magnitude faster than packages like NetworkX);
  • create custom embeddings, with custom objective functions and constraints (such as having uncorrelated feature columns);
  • and more.

PyMDE is very young software, under active development. If you run into issues, or have any feedback, please reach out by filing a GitHub issue.

This README gives a very brief overview of PyMDE. Make sure to read the official documentation at www.pymde.org, which has in-depth tutorials and API documentation.

Installation

PyMDE is available on the Python Package Index, and on Conda Forge.

To install with pip, use

pip install pymde

Alternatively, to install with conda, use

conda install -c pytorch -c conda-forge pymde

PyMDE has the following requirements:

  • Python >= 3.7
  • numpy >= 1.17.5
  • scipy
  • torch >= 1.7.1
  • torchvision >= 0.8.2
  • pynndescent
  • requests

Getting started

Getting started with PyMDE is easy. For embeddings that work out of the box, we provide two main functions:

pymde.preserve_neighbors

which preserves the local structure of original data, and

pymde.preserve_distances

which preserves pairwise distances or dissimilarity scores in the original data.

Arguments. The input to these functions is the original data, represented either as a data matrix in which each row is a feature vector, or as a (possibly sparse) graph encoding pairwise distances. The embedding dimension is specified by the embedding_dim keyword argument, which is 2 by default.

Return value. The return value is an MDE object. Calling the embed() method on this object returns an embedding, which is a matrix (torch.Tensor) in which each row is an embedding vector. For example, if the original input is a data matrix of shape (n_items, n_features), then the embedding matrix has shape (n_items, embedding_dim).

We give examples of using these functions below.

Preserving neighbors

The following code produces an embedding of the MNIST dataset (images of handwritten digits), in a fashion similar to LargeVis, t-SNE, UMAP, and other neighborhood-based embeddings. The original data is a matrix of shape (70000, 784), with each row representing an image.

import pymde

mnist = pymde.datasets.MNIST()
embedding = pymde.preserve_neighbors(mnist.data, verbose=True).embed()
pymde.plot(embedding, color_by=mnist.attributes['digits'])
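For intuition about what "neighbors" means in a neighborhood-based embedding, here is a minimal sketch of a brute-force k-nearest-neighbor search on toy data. It is not PyMDE's implementation; the function name k_nearest_neighbors and the sample points are hypothetical, and the sketch assumes Euclidean distance and a dataset small enough for an exhaustive search.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_nearest_neighbors(points, k):
    """Return, for each point, the indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        # Distance from point i to every other point j, nearest first.
        dists = sorted(
            (euclidean(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


# Hypothetical toy data: two well-separated clusters in the plane.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.2),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.3),
]
knn = k_nearest_neighbors(points, k=2)
print(knn)  # [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4]]
```

Each point's neighbors lie in its own cluster. An embedding that preserves neighbors aims to keep such pairs close together.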

Unlike most other embedding methods, PyMDE can compute embeddings that satisfy constraints. For example:

embedding = pymde.preserve_neighbors(mnist.data, constraint=pymde.Standardized(), verbose=True).embed()
pymde.plot(embedding, color_by=mnist.attributes['digits'])

The standardization constraint requires the embedding vectors to be centered and to have uncorrelated features.
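To make "centered" and "uncorrelated" concrete, the following sketch checks both properties on a small matrix using only the standard library. The helper names (column_means, covariance, is_centered_and_uncorrelated) and the tolerance are assumptions for illustration; they are not part of PyMDE.

```python
def column_means(X):
    """Mean of each column of a matrix given as a list of rows."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def covariance(X):
    """Covariance matrix of the columns of X (normalized by n)."""
    n, m = len(X), len(X[0])
    mu = column_means(X)
    return [
        [sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in X) / n for b in range(m)]
        for a in range(m)
    ]


def is_centered_and_uncorrelated(X, tol=1e-9):
    """True if every column has mean ~0 and off-diagonal covariances are ~0."""
    m = len(X[0])
    centered = all(abs(v) <= tol for v in column_means(X))
    cov = covariance(X)
    uncorrelated = all(
        abs(cov[a][b]) <= tol for a in range(m) for b in range(m) if a != b
    )
    return centered and uncorrelated


# Hypothetical 2-D embeddings of four items.
X = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
Y = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0)]

print(is_centered_and_uncorrelated(X))  # True
print(is_centered_and_uncorrelated(Y))  # False: the two columns are identical
```

The same check could be run on the rows of an embedding returned by embed() after converting it to a list of rows, with a looser tolerance to allow for floating-point error.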

Preserving distances

The function pymde.preserve_distances is useful when you're more interested in preserving the gross global structure instead of local structure.
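As a rough illustration of what "preserving distances" measures, the sketch below scores an embedding by the average absolute gap between embedding distances and target distances. The absolute-difference loss, the function name average_distortion, and the toy data are assumptions chosen for illustration; the loss PyMDE uses is described in the official documentation.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_distortion(embedding, original_distances):
    """Mean of |embedding distance - target distance| over the given pairs.

    original_distances maps index pairs (i, j) to target distances.
    """
    total = 0.0
    for (i, j), target in original_distances.items():
        total += abs(euclidean(embedding[i], embedding[j]) - target)
    return total / len(original_distances)


# Three items whose pairwise distances form a 3-4-5 right triangle.
original = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}

exact = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]     # reproduces every distance
squashed = [(0.0, 0.0), (3.0, 0.0), (0.0, 0.0)]  # collapses item 2 onto item 0

print(average_distortion(exact, original))     # 0.0
print(average_distortion(squashed, original))  # 2.0
```

A lower score means the embedding distorts the original distances less. Of the two candidates, the exact one is the better distance-preserving embedding.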

Here's an example that produces an embedding of an academic coauthorship network, from Google Scholar. The original data is a sparse graph on roughly 40,000 authors, with an edge between authors who have collaborated on at least one paper.

import pymde

google_scholar = pymde.datasets.google_scholar()
embedding = pymde.preserve_distances(google_scholar.data, verbose=True).embed()
pymde.plot(embedding, color_by=google_scholar.attributes['coauthors'], color_map='viridis', background_color='black')

More collaborative authors are colored brighter, and are near the center of the embedding.
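A graph like this one has edges but no feature vectors, so dissimilarities between authors must come from the graph itself. One natural choice, assumed here purely for illustration, is the hop count along the shortest path. The sketch below computes it with breadth-first search on a tiny hypothetical coauthorship graph. The function name hop_distances and the author names are made up, and the official documentation describes how PyMDE actually preprocesses graphs.

```python
from collections import deque


def hop_distances(edges, source):
    """Shortest-path hop counts from source in an undirected, unweighted graph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:  # first visit is the shortest path in BFS
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Hypothetical coauthorship edges: one edge per pair with a joint paper.
edges = [("ana", "bo"), ("bo", "cy"), ("cy", "dee"), ("ana", "cy")]
d = hop_distances(edges, "ana")
print(d["dee"])  # 2
```

Authors who have never collaborated directly still get a finite distance when a chain of collaborations connects them.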

Example notebooks

We have several example notebooks that show how to use PyMDE on real (and synthetic) datasets.

Citing

To cite our work, please use the following BibTeX entry.

@article{agrawal2021minimum,
  author  = {Agrawal, Akshay and Ali, Alnur and Boyd, Stephen},
  title   = {Minimum-Distortion Embedding},
  journal = {arXiv},
  year    = {2021},
}

PyMDE was designed and developed by Akshay Agrawal.
