
CannyLab / Tsne Cuda

Licence: bsd-3-clause
GPU Accelerated t-SNE for CUDA with Python bindings

Programming Languages

python

Projects that are alternatives of or similar to Tsne Cuda

Occa
JIT Compilation for Multiple Architectures: C++, OpenMP, CUDA, HIP, OpenCL, Metal
Stars: ✭ 230 (-79.46%)
Mutual labels:  multithreading, gpu, cuda
Graphvite
GraphVite: A General and High-performance Graph Embedding System
Stars: ✭ 865 (-22.77%)
Mutual labels:  gpu, cuda, data-visualization
Fast gicp
A collection of GICP-based fast point cloud registration algorithms
Stars: ✭ 307 (-72.59%)
Mutual labels:  multithreading, gpu, cuda
Ml Workspace
🛠 All-in-one web-based IDE specialized for machine learning and data science.
Stars: ✭ 2,337 (+108.66%)
Mutual labels:  data-analysis, gpu, data-visualization
Heteroflow
Concurrent CPU-GPU Programming using Task Models
Stars: ✭ 57 (-94.91%)
Mutual labels:  multithreading, gpu, cuda
Autooffload.jl
Automatic GPU, TPU, FPGA, Xeon Phi, Multithreaded, Distributed, etc. offloading for scientific machine learning (SciML) and differential equations
Stars: ✭ 21 (-98.12%)
Mutual labels:  multithreading, gpu
Cuda
Experiments with CUDA and Rust
Stars: ✭ 31 (-97.23%)
Mutual labels:  gpu, cuda
Drugs Recommendation Using Reviews
Analyzing the Drugs Descriptions, conditions, reviews and then recommending it using Deep Learning Models, for each Health Condition of a Patient.
Stars: ✭ 35 (-96.87%)
Mutual labels:  data-analysis, data-visualization
Data Science Lunch And Learn
Resources for weekly Data Science Lunch & Learns
Stars: ✭ 49 (-95.62%)
Mutual labels:  data-analysis, data-visualization
Vectorbt
Ultimate Python library for time series analysis and backtesting at scale
Stars: ✭ 855 (-23.66%)
Mutual labels:  data-analysis, data-visualization
Nvidia libs test
Tests and benchmarks for cudnn (and in the future, other nvidia libraries)
Stars: ✭ 36 (-96.79%)
Mutual labels:  gpu, cuda
Pycuda
CUDA integration for Python, plus shiny features
Stars: ✭ 1,112 (-0.71%)
Mutual labels:  gpu, cuda
Cub
Cooperative primitives for CUDA C++.
Stars: ✭ 883 (-21.16%)
Mutual labels:  gpu, cuda
Data Forge Ts
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Stars: ✭ 967 (-13.66%)
Mutual labels:  data-analysis, data-visualization
Data Science On Gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (-22.86%)
Mutual labels:  data-analysis, data-visualization
Qualia2.0
Qualia is a deep learning framework deeply integrated with automatic differentiation and dynamic graphing with CUDA acceleration. Qualia was built from scratch.
Stars: ✭ 41 (-96.34%)
Mutual labels:  gpu, cuda
Carlsim3
CARLsim is an efficient, easy-to-use, GPU-accelerated software framework for simulating large-scale spiking neural network (SNN) models with a high degree of biological detail.
Stars: ✭ 52 (-95.36%)
Mutual labels:  gpu, cuda
Metrotwitter
What Twitter reveals about the differences between cities and the monoculture of the Bay Area
Stars: ✭ 52 (-95.36%)
Mutual labels:  data-analysis, data-visualization
Running page
Make your own running home page
Stars: ✭ 1,078 (-3.75%)
Mutual labels:  data-analysis, data-visualization
Neanderthal
Fast Clojure Matrix Library
Stars: ✭ 927 (-17.23%)
Mutual labels:  gpu, cuda

TSNE-CUDA

This repo is an optimized CUDA version of the FIt-SNE algorithm, with associated Python modules. We find that our implementation of t-SNE can be up to 1200x faster than Sklearn, or up to 50x faster than Multicore-TSNE, when used with the right GPU. The paper describing our approach, as well as the results below, is available at https://arxiv.org/abs/1807.11824.
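For readers new to t-SNE, the quantity that t-SNE implementations minimize can be sketched naively in pure Python: Student-t affinities q_ij between the 2D output points, and the KL divergence KL(P || Q) against the input affinities P. This is the textbook O(n^2) formulation from van der Maaten's paper, not the GPU code in this repository, and the function names are illustrative only.

```python
import math

def student_t_affinities(Y):
    """Low-dimensional affinities q_ij (Student-t kernel, one degree of freedom)."""
    n = len(Y)
    # Unnormalised kernel (1 + ||y_i - y_j||^2)^-1, with zeros on the diagonal.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                w[i][j] = 1.0 / (1.0 + d2)
    z = sum(sum(row) for row in w)
    # Normalise over all pairs so that the q_ij sum to 1.
    return [[v / z for v in row] for row in w]

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) over all off-diagonal pairs: the cost t-SNE minimizes."""
    total = 0.0
    for i in range(len(P)):
        for j in range(len(P)):
            p = P[i][j]
            if i != j and p > 0:
                total += p * math.log(p / max(Q[i][j], eps))
    return total
```

Both functions loop over every pair of points. As we understand the FIt-SNE paper, its speedup comes from avoiding exactly these all-pairs loops (approximate nearest neighbors for P, FFT-accelerated interpolation for the repulsive terms involving Q).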

You can install prebuilt binaries with Anaconda for CUDA versions 9.0, 9.2, 10.0, and 10.1 using conda install cuda<major><minor> tsnecuda -c cannylab. For more details, or to install from source, check out our wiki: https://github.com/CannyLab/tsne-cuda/wiki/
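The cuda<major><minor> placeholder is the major and minor CUDA version digits concatenated, so CUDA 10.1 becomes cuda101. A hypothetical helper (not part of tsnecuda) that expands it, restricted to the versions listed above:

```python
# Versions with prebuilt binaries, as listed in this README.
SUPPORTED_CUDA = {"9.0", "9.2", "10.0", "10.1"}

def conda_install_command(cuda_version: str) -> str:
    """Hypothetical helper: expand cuda<major><minor>, e.g. "10.1" -> "cuda101"."""
    if cuda_version not in SUPPORTED_CUDA:
        raise ValueError(f"no prebuilt tsnecuda binary listed for CUDA {cuda_version}")
    major, minor = cuda_version.split(".")
    return f"conda install cuda{major}{minor} tsnecuda -c cannylab"

print(conda_install_command("10.1"))  # conda install cuda101 tsnecuda -c cannylab
```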

Benchmarks

Simulated Data

Time taken compared to other state-of-the-art algorithms on synthetic datasets with 50 dimensions and four clusters, for varying numbers of points. Note the log scale on both the points and time axes, and that the x-axis is in thousands of points (thus, the values on the x-axis range from 1K to 10M points). Dashed lines on SkLearn, BH-TSNE, and MULTICORE-4 represent projected times. Projected scaling assumes an O(n log n) implementation.
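Projecting a runtime under the O(n log n) assumption amounts to multiplying a measured time by the ratio n log n between the target and measured sizes. A minimal sketch; the measurement in the example is made up for illustration and is not taken from the benchmark:

```python
import math

def project_time(t_measured: float, n_measured: int, n_target: int) -> float:
    """Extrapolate a runtime to n_target points, assuming cost grows as n * log(n).

    n_measured must be greater than 1 so that log(n_measured) is non-zero.
    """
    # The base of the logarithm cancels in the ratio, so the natural log is fine.
    scale = (n_target * math.log(n_target)) / (n_measured * math.log(n_measured))
    return t_measured * scale

# Made-up measurement: 100 s at 10,000 points, projected to 1,000,000 points.
print(project_time(100.0, 10_000, 1_000_000))  # roughly 15000 s (100x the points, 1.5x the log factor)
```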

MNIST

The performance of t-SNE-CUDA compared to other state-of-the-art implementations on the MNIST dataset. t-SNE-CUDA runs on the raw pixels of the MNIST dataset (60000 images x 784 dimensions) in under 7 seconds.

CIFAR

The performance of t-SNE-CUDA compared to other state-of-the-art implementations on the CIFAR-10 dataset. t-SNE-CUDA runs on the output of a classifier on the CIFAR-10 training set (50000 images x 1024 dimensions) in under 6 seconds. While we can run on the full pixel set in under 12 seconds, Euclidean distance is a poor metric in raw pixel space, which leads to poor-quality embeddings.
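A contrived 8x8 example of why raw-pixel Euclidean distance can mislead: shifting a bar by a single pixel puts the image farther from the original than a completely blank image is. This toy is for intuition only and is not data from the CIFAR-10 benchmark:

```python
import math

def bar_image(col, size=8):
    """A size x size image, flattened row by row, with one bright vertical bar at `col`."""
    return [1.0 if c == col else 0.0 for r in range(size) for c in range(size)]

def euclidean(a, b):
    """Plain Euclidean distance between two flattened images."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

original = bar_image(3)
shifted = bar_image(4)          # same bar, moved one pixel to the right
blank = [0.0] * len(original)   # an image with no content at all

print(euclidean(original, shifted))  # 4.0  (16 pixels differ)
print(euclidean(original, blank))    # about 2.83  (sqrt of 8)
```

Classifier features tend to be far less sensitive to such shifts, which is why the CIFAR-10 run above uses them instead of raw pixels.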

Comparison of Embedding Quality

The quality of the embeddings produced by t-SNE-CUDA does not differ significantly from that of state-of-the-art implementations. See below for a comparison of MNIST cluster outputs.

Left: MULTICORE-4 (501s), Middle: BH-TSNE (1156s), Right: t-SNE-CUDA (Ours, 6.98s).

Installation

To install our library, follow the instructions in the installation section of the wiki.

Run

Like many of the available libraries, our Python wrappers follow the same API as sklearn.manifold.TSNE.

You can run it as follows:

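import numpy as np
from tsnecuda import TSNE
X = np.random.rand(10000, 50)  # placeholder input: any (n_samples, n_features) array
X_embedded = TSNE(n_components=2, perplexity=15, learning_rate=10).fit_transform(X)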
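The perplexity argument follows the standard t-SNE definition from van der Maaten's paper: for each point, a Gaussian bandwidth is tuned so that the perplexity exp(H) of its neighbor distribution (H being the entropy in nats) matches the requested value. The pure-Python sketch below shows that per-point binary search. It is the textbook procedure, not tsnecuda's own code, and calibrate_row is a hypothetical name.

```python
import math

def _probs_and_entropy(sq_dists, beta):
    """Gaussian conditional probabilities p_{j|i} and their entropy (in nats)."""
    d_min = min(sq_dists)  # shift for numerical stability; cancels on normalisation
    weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return probs, entropy

def calibrate_row(sq_dists, perplexity, tol=1e-5, max_iter=100):
    """Binary-search beta = 1 / (2 * sigma_i^2) so that exp(entropy) == perplexity.

    sq_dists: squared distances from point i to every *other* point.
    """
    target = math.log(perplexity)
    beta, lo, hi = 1.0, 0.0, math.inf
    probs, entropy = _probs_and_entropy(sq_dists, beta)
    for _ in range(max_iter):
        if abs(entropy - target) < tol:
            break
        if entropy > target:
            # Distribution too flat: narrow the Gaussian (increase beta).
            lo = beta
            beta = beta * 2 if math.isinf(hi) else (beta + hi) / 2
        else:
            # Distribution too peaked: widen the Gaussian (decrease beta).
            hi = beta
            beta = (beta + lo) / 2
        probs, entropy = _probs_and_entropy(sq_dists, beta)
    return probs, beta
```

The target is only reachable when a point has more than perplexity neighbors to spread its probability over; with perplexity=15, each row needs more than 15 distances.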

We only support n_components=2. We currently have no plans to support more dimensions, as this would require significant changes to the code.

For more information on running the library, or on using it as a C++ library, see the Python Usage and C++ Usage sections of the wiki.

Citation

Please cite the corresponding paper if it was useful for your research:

@article{chan2019gpu,
  title={GPU accelerated t-distributed stochastic neighbor embedding},
  author={Chan, David M and Rao, Roshan and Huang, Forrest and Canny, John F},
  journal={Journal of Parallel and Distributed Computing},
  volume={131},
  pages={1--13},
  year={2019},
  publisher={Elsevier}
}

This library is built on top of the following technology. Without it, none of this would be possible!

L. Van der Maaten's paper

FIt-SNE

Multicore-TSNE

BHTSNE

CUDA Utilities/Pairwise Distance

LONESTAR-GPU

FAISS

GTest

CXXopts

License

Our code is built using components from FAISS, the Lonestar GPU library, GTest, CXXopts, and OrangeOwl's CUDA utilities. Each of these components is governed by its respective license; our own code is governed by the BSD-3 license found in LICENSE.txt.
