All Projects → DmitryUlyanov → Multicore Tsne

DmitryUlyanov / Multicore Tsne

Licence: bsd-3-clause
Parallel t-SNE implementation with Python and Torch wrappers.

Programming Languages

C++
36643 projects - #6 most used programming language
python
139335 projects - #7 most used programming language
lua
6591 projects
CMake
9771 projects
shell
77523 projects

Projects that are alternatives of or similar to Multicore Tsne

simplx
C++ development framework for building reliable cache-friendly distributed and concurrent multicore software
Stars: ✭ 61 (-96.33%)
Mutual labels:  multicore
word2vec-tsne
Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.
Stars: ✭ 59 (-96.45%)
Mutual labels:  tsne
Ocaml Multicore
Multicore OCaml
Stars: ✭ 591 (-64.48%)
Mutual labels:  multicore
scGEAToolbox
scGEAToolbox: Matlab toolbox for single-cell gene expression analyses
Stars: ✭ 15 (-99.1%)
Mutual labels:  tsne
khiva-python
Python binding for Khiva library.
Stars: ✭ 44 (-97.36%)
Mutual labels:  multicore
Amazon-Fine-Food-Review
Machine learning algorithm such as KNN,Naive Bayes,Logistic Regression,SVM,Decision Trees,Random Forest,k means and Truncated SVD on amazon fine food review
Stars: ✭ 28 (-98.32%)
Mutual labels:  tsne
rpmsg-lite
RPMsg implementation for small MCUs
Stars: ✭ 157 (-90.56%)
Mutual labels:  multicore
Saber
Window-Based Hybrid CPU/GPU Stream Processing Engine
Stars: ✭ 35 (-97.9%)
Mutual labels:  multicore
biovec
ProtVec can be used in protein interaction predictions, structure prediction, and protein data visualization.
Stars: ✭ 23 (-98.62%)
Mutual labels:  tsne
Helenos
A portable microkernel-based multiserver operating system written from scratch.
Stars: ✭ 553 (-66.77%)
Mutual labels:  multicore
tsne-ruby
High performance t-SNE for Ruby
Stars: ✭ 15 (-99.1%)
Mutual labels:  tsne
ReductionWrappers
R wrappers to connect Python dimensional reduction tools and single cell data objects (Seurat, SingleCellExperiment, etc...)
Stars: ✭ 31 (-98.14%)
Mutual labels:  tsne
Erpc
Embedded RPC
Stars: ✭ 262 (-84.25%)
Mutual labels:  multicore
Word2VecAndTsne
Scripts demo-ing how to train a Word2Vec model and reduce its vector space
Stars: ✭ 45 (-97.3%)
Mutual labels:  tsne
Sparsex
The SparseX sparse kernel optimization library
Stars: ✭ 23 (-98.62%)
Mutual labels:  multicore
Fun-with-MNIST
Playing with MNIST. Machine Learning. Generative Models.
Stars: ✭ 23 (-98.62%)
Mutual labels:  tsne
Embeddings2Image
create "Karpathy's style" 2d images out of your image embeddings
Stars: ✭ 52 (-96.87%)
Mutual labels:  tsne
Trck
Query engine for TrailDB
Stars: ✭ 48 (-97.12%)
Mutual labels:  multicore
Python Parallel Programming Cookbook Cn
📖《Python Parallel Programming Cookbook》中文版
Stars: ✭ 978 (-41.23%)
Mutual labels:  multicore
Entitycomponentsystemsamples
No description or website provided.
Stars: ✭ 4,218 (+153.49%)
Mutual labels:  multicore

Multicore t-SNE Build Status

This is a multicore modification of Barnes-Hut t-SNE by L. Van der Maaten with python and Torch CFFI-based wrappers. This code also works faster than sklearn.TSNE on 1 core.

What to expect

Barnes-Hut t-SNE is done in two steps.

  • First step: an efficient data structure for nearest neighbours search is built and used to compute probabilities. This can be done in parallel for each point in the dataset, this is why we can expect a good speed-up by using more cores.

  • Second step: the embedding is optimized using gradient descent. This part is essentially consecutive so we can only optimize within iteration. In fact some parts can be parallelized effectively, but not all of them a parallelized for now. That is why second step speed-up will not be that significant as first step sepeed-up but there is still room for improvement.

So when can you benefit from parallelization? It is almost true, that the second step computation time is constant of D and depends mostly on N. The first part's time depends on D a lot, so for small D time(Step 1) << time(Step 2), for large D time(Step 1) >> time(Step 2). As we are only good at parallelizing step 1 we will benefit most when D is large enough (MNIST's D = 784 is large, D = 10 even for N=1000000 is not so much). I wrote multicore modification originally for Springleaf competition, where my data table was about 300000 x 3000 and only several days left till the end of the competition so any speed-up was handy.

Benchmark

1 core

Interestingly, that this code beats other implementations. We compare to sklearn (Barnes-Hut of course), L. Van der Maaten's bhtsne, py_bh_tsne repo (cython wrapper for bhtsne with QuadTree). perplexity = 30, theta=0.5 for every run. In fact py_bh_tsne repo works at the same speed as this code when using more optimization flags for compiler.

This is a benchmark for 70000x784 MNIST data:

Method Step 1 (sec) Step 2 (sec)
MulticoreTSNE(n_jobs=1) 912 350
bhtsne 4257 1233
py_bh_tsne 1232 367
sklearn(0.18) ~5400 ~20920

I did my best to find what is wrong with sklearn numbers, but it is the best benchmark I could do (you can find test script in python/tests folder).

Multicore

This table shows a relative to 1 core speed-up when using n cores.

n_jobs Step 1 Step 2
1 1x 1x
2 1.54x 1.05x
4 2.6x 1.2x
8 5.6x 1.65x

How to use

Python and torch wrappers are available.

Python

Install

Directly from pypi

pip install MulticoreTSNE

From source

Make sure cmake is installed on your system, and you will also need a sensible C++ compiler, such as gcc or llvm-clang. On macOS, you can get both via homebrew.

To install the package, please do:

git clone https://github.com/DmitryUlyanov/Multicore-TSNE.git
cd Multicore-TSNE/
pip install .

Tested with both Python 2.7 and 3.6 (conda) and Ubuntu 14.04.

Run

You can use it as a near drop-in replacement for sklearn.manifold.TSNE.

from MulticoreTSNE import MulticoreTSNE as TSNE

tsne = TSNE(n_jobs=4)
Y = tsne.fit_transform(X)

Please refer to sklearn TSNE manual for parameters explanation.

This implementation n_components=2, which is the most common case (use Barnes-Hut t-SNE or sklearn otherwise). Also note that some parameters are there just for the sake of compatibility with sklearn and are otherwise ignored. See MulticoreTSNE class docstring for more info.

MNIST example

from sklearn.datasets import load_digits
from MulticoreTSNE import MulticoreTSNE as TSNE
from matplotlib import pyplot as plt

digits = load_digits()
embeddings = TSNE(n_jobs=4).fit_transform(digits.data)
vis_x = embeddings[:, 0]
vis_y = embeddings[:, 1]
plt.scatter(vis_x, vis_y, c=digits.target, cmap=plt.cm.get_cmap("jet", 10), marker='.')
plt.colorbar(ticks=range(10))
plt.clim(-0.5, 9.5)
plt.show()

Test

You can test it on MNIST dataset with the following command:

python MulticoreTSNE/examples/test.py <n_jobs>

Note on jupyter use

To make the computation log visible in jupyter please install wurlitzer (pip install wurlitzer) and execute this line in any cell beforehand:

%load_ext wurlitzer

Memory leakages are possible if you interrupt the process. Should be OK if you let it run until the end.

Torch

To install execute the following command from repository folder:

luarocks make torch/tsne-1.0-0.rockspec

or

luarocks install https://raw.githubusercontent.com/DmitryUlyanov/Multicore-TSNE/master/torch/tsne-1.0-0.rockspec

You can run t-SNE like that:

tsne = require 'tsne'

Y = tsne(X, n_components, perplexity, n_iter, angle, n_jobs)

torch.DoubleTensor type only supported for now.

License

Inherited from original repo's license.

Future work

  • Allow other types than double
  • Improve step 2 performance (possible)

Citation

Please cite this repository if it was useful for your research:

@misc{Ulyanov2016,
  author = {Ulyanov, Dmitry},
  title = {Multicore-TSNE},
  year = {2016},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/DmitryUlyanov/Multicore-TSNE}},
}

Of course, do not forget to cite L. Van der Maaten's paper

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].