
YontiLevin / Embeddings2Image

License: MIT license
Create "Karpathy-style" 2D images out of your image embeddings

Programming Languages

python

Projects that are alternatives of or similar to Embeddings2Image

ReductionWrappers
R wrappers to connect Python dimensional reduction tools and single cell data objects (Seurat, SingleCellExperiment, etc...)
Stars: ✭ 31 (-40.38%)
Mutual labels:  tsne, umap
scarf
Toolkit for highly memory efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas scale datasets with millions of cells on laptop.
Stars: ✭ 54 (+3.85%)
Mutual labels:  tsne, umap
Unsupervised-Learning-in-R
Workshop (6 hours): Clustering (Hdbscan, LCA, Hopach), dimension reduction (UMAP, GLRM), and anomaly detection (isolation forests).
Stars: ✭ 34 (-34.62%)
Mutual labels:  umap
umap-java
A Uniform Manifold Approximation and Projection (UMAP) library for Java, developed by Tag.bio in collaboration with Real Time Genomics.
Stars: ✭ 16 (-69.23%)
Mutual labels:  umap
tsne-ruby
High performance t-SNE for Ruby
Stars: ✭ 15 (-71.15%)
Mutual labels:  tsne
Amazon-Fine-Food-Review
Machine learning algorithm such as KNN,Naive Bayes,Logistic Regression,SVM,Decision Trees,Random Forest,k means and Truncated SVD on amazon fine food review
Stars: ✭ 28 (-46.15%)
Mutual labels:  tsne
ParametricUMAP paper
Parametric UMAP embeddings for representation and semisupervised learning. From the paper "Parametric UMAP: learning embeddings with deep neural networks for representation and semi-supervised learning" (Sainburg, McInnes, Gentner, 2020).
Stars: ✭ 132 (+153.85%)
Mutual labels:  umap
UMAP.jl
Uniform Manifold Approximation and Projection (UMAP) implementation in Julia
Stars: ✭ 93 (+78.85%)
Mutual labels:  umap
biovec
ProtVec can be used in protein interaction predictions, structure prediction, and protein data visualization.
Stars: ✭ 23 (-55.77%)
Mutual labels:  tsne
scGEAToolbox
scGEAToolbox: Matlab toolbox for single-cell gene expression analyses
Stars: ✭ 15 (-71.15%)
Mutual labels:  tsne
AnnA Anki neuronal Appendix
Using machine learning on your anki collection to enhance the scheduling via semantic clustering and semantic similarity
Stars: ✭ 39 (-25%)
Mutual labels:  umap
Word2VecAndTsne
Scripts demo-ing how to train a Word2Vec model and reduce its vector space
Stars: ✭ 45 (-13.46%)
Mutual labels:  tsne
Multicore Tsne
Parallel t-SNE implementation with Python and Torch wrappers.
Stars: ✭ 1,664 (+3100%)
Mutual labels:  tsne
BEER
BEER: Batch EffEct Remover for single-cell data
Stars: ✭ 19 (-63.46%)
Mutual labels:  umap
Umap
Uniform Manifold Approximation and Projection
Stars: ✭ 5,268 (+10030.77%)
Mutual labels:  umap
Interactive-3D-Plotting-in-Seurat-3.0.0
This repository contains R code, with which you can create 3D UMAP and tSNE plots of Seurat analyzed scRNAseq data
Stars: ✭ 80 (+53.85%)
Mutual labels:  umap
dbMAP
A fast, accurate, and modularized dimensionality reduction approach based on diffusion harmonics and graph layouts. Escalates to millions of samples on a personal laptop. Adds high-dimensional big data intrinsic structure to your clustering and data visualization workflow.
Stars: ✭ 39 (-25%)
Mutual labels:  umap
word2vec-tsne
Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.
Stars: ✭ 59 (+13.46%)
Mutual labels:  tsne
Fun-with-MNIST
Playing with MNIST. Machine Learning. Generative Models.
Stars: ✭ 23 (-55.77%)
Mutual labels:  tsne
playing with vae
Comparing FC VAE / FCN VAE / PCA / UMAP on MNIST / FMNIST
Stars: ✭ 53 (+1.92%)
Mutual labels:  umap

Embeddings2Image

formerly known as visualize-tsne

This small project creates 2D images out of image embeddings.
It was inspired by Andrej Karpathy's blog post on visualizing CNNs using t-SNE
(this guy is pretty sharp 😉 - you should definitely follow him!).

UPDATE #1
At first the package only supported dimensionality reduction with t-SNE, but now it also supports the great UMAP.
Check it out: https://github.com/lmcinnes/umap

UPDATE #2
I saw that the project is useful to some people, so I uploaded it to PyPI for easier integration.

UPDATE #3
Check out the end-to-end example added by @nivha

Examples

Example output images:

  • MNIST 2D grid via t-SNE
  • MNIST scatter via t-SNE
  • MNIST scatter via UMAP
  • CIFAR-10 grid
  • CIFAR-10 scatter

Installation

  1. via pip
    1. pip install Embeddings2Image
  2. Download / Clone
    1. install with python setup.py install
    2. or just use it as is:
      1. pip install -r requirements.txt
      2. see the documentation below

Usage

if installed via PyPI

from e2i import EmbeddingsProjector

image = EmbeddingsProjector()
image.path2data = 'data.hdf5'     # path to an hdf5 file with 'urls' and 'vectors' datasets
image.load_data()                 # read the urls and vectors from the file
image.calculate_projection()      # reduce the vectors to 2d (umap by default)
image.create_image()              # render and save the output image

important! the module expects an hdf5 file with 2 datasets:

  • urls - a dataset which contains the path/url of each image
  • vectors - a dataset which contains the corresponding embedding vector for each image.
    make sure that both are ordered alike
  • check out this hdf5 example
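For reference, here is a minimal sketch of building such a file (assuming h5py; the file name, paths, and vectors are just placeholders):

import numpy as np
import h5py

# hypothetical example: 3 images with 512-dimensional embeddings
urls = np.asarray([b'/path/to/img_0.jpg', b'/path/to/img_1.jpg', b'/path/to/img_2.jpg'])
vectors = np.random.rand(3, 512).astype(np.float32)

# write the two datasets the module expects, ordered alike
with h5py.File('data.hdf5', 'w') as f:
    f.create_dataset('urls', data=urls)
    f.create_dataset('vectors', data=vectors)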

another option is to load the urls and vectors explicitly:

  • urls - create an np.asarray from a list of urls and assign it to image.image_list
  • vectors - create an np.ndarray of the vectors and assign it to image.data_vectors
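For example, a minimal sketch (the urls and vectors below are placeholders):

import numpy as np
from e2i import EmbeddingsProjector

image = EmbeddingsProjector()
image.image_list = np.asarray(['/path/to/img_0.jpg', '/path/to/img_1.jpg'])  # urls, same order as the vectors
image.data_vectors = np.random.rand(2, 512)                                  # one embedding per image
image.calculate_projection()  # no load_data() call - the data was set explicitly
image.create_image()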

if cloned, you can use it from the command line

root@yonti:~/github/Embeddings2Image$ python cmd.py -h
usage: cmd.py [-h] -d PATH2DATA [-n OUTPUT_NAME] [-t OUTPUT_TYPE]
              [-s OUTPUT_SIZE] [-i EACH_IMG_SIZE] [-c BG_COLOR] [--no-shuffle]
              [--no-sklearn] [--no-svd] [-b BATCH_SIZE]

Creating 2d images out of the embeddings of the images

optional arguments:
  -h, --help            show this help message and exit
  -d PATH2DATA, --path2data PATH2DATA
                        Path to the hdf5 file   
  -n OUTPUT_NAME, --output_name OUTPUT_NAME
                        output image name. Default is tsne_scatter/grid.jpg
  -t OUTPUT_TYPE, --output_type OUTPUT_TYPE
                        the type of the output images (scatter/grid)
  -s OUTPUT_SIZE, --output_size OUTPUT_SIZE
                        output image size (default=2500)
  -i EACH_IMG_SIZE, --img_size EACH_IMG_SIZE
                        each image size (default=50)
  -c BG_COLOR, --background BG_COLOR
                        choose output background color (black/white)
  --no-shuffle          use this flag if you don't want to shuffle
  --method              choose which method to use for projection.
                        umap (default) / sklearn - for sklearn's tsne / maaten
                        - for Maaten's implementation of tsne
  --no-svd              it is better to reduce the dimension of long dense
                        vectors to a size of 50 or smaller before computing the
                        tsne. use this flag if you don't want to do so
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        for speed or memory errors, consider using just a
                        portion of your data (default=all)

root@yonti:~/github/visualize-tsne$ python cmd.py -d /home/data/data.hdf5 -i 50 -s 4000 -n test 

full usage options

# the following have both getter and setter
image.path2data # getter
image.path2data = '/home/data/data.hdf5' # setter -> expects a string with a correct path to an hdf5 file

image.output_img_name  #  getter
image.output_img_name = 'be_creative'  # expects string. default is 'tsne'
                                       # don't add the file type - jpg is set automatically
                                       # also the image type(scatter/grid) is added automatically
image.output_img_type  #  getter
image.output_img_type = 'grid' # expects string. default is 'scatter'. set 'grid' this way.

image.output_img_size  #  getter
image.output_img_size =  2500  # expects int. default is 2500.
                               # all output images are square, so this means a 2500x2500 image.

image.each_img_size    #  getter
image.each_img_size =  50      # expects int. default is 50. 
                               # the output looks better when constructed from square images
                               # but rectangles are also handled
                               
image.image_list       #  getter
image.image_list = img_list    # expects numpy array of strings. 
                               # this is filled up automatically when load_data is called.
                               # set this explicitly only if you don't load your data from
                               # an hdf5 file

image.data_vectors      #  getter
image.data_vectors = data      # expects numpy ndarray of dense vectors.
                               # this is filled up automatically when load_data is called.
                               # set this explicitly only if you don't load your data from
                               # an hdf5 file

image.batch_size       #  getter
image.batch_size =  5000       # expects int. default is 0 which means that all images are taken
                               # use this when you have memory issues. 
                               # it will shuffle your data and take only a subset in order to 
                               # compute the tsne. 

image.method       #  getter
image.method =  'maaten'       # expects string. default is 'umap',
                               # which is efficient and, to my naked eye, separates the clusters better.
                               # the other options are 'sklearn' and 'maaten':
                               # 'sklearn' uses sklearn's tsne and 'maaten' uses a python version
                               # of Maaten's tsne.
                               # i guess they both do the same but i didn't fully check it,
                               # so i left both as options

image.background_color         #  getter
image.background_color =  'white'  # expects string. default is 'black'. the other option is 'white'
                                        
image.tsne_vectors      #  getter
image.tsne_vectors = data       # expects numpy ndarray of dense 2d vectors. 
                               # this is filled up automatically when 
                               # image.calculate_tsne is called.
                               # set this explicitly only if you already have the tsne vectors

# the following are methods
image.load_data()  #  opens the file that path2data points to
                   #  fills image.data_vectors and image.image_list

image.calculate_tsne()  #  straightforward

image.create_image()  #  straightforward
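Putting the options together, a minimal end-to-end sketch (paths and values are placeholders; attribute names follow the documentation above):

from e2i import EmbeddingsProjector

image = EmbeddingsProjector()
image.path2data = '/home/data/data.hdf5'   # hdf5 file with 'urls' and 'vectors' datasets
image.output_img_name = 'my_embeddings'    # '.jpg' and the image type are appended automatically
image.output_img_type = 'grid'             # 'scatter' (default) or 'grid'
image.output_img_size = 4000               # 4000x4000 output image
image.each_img_size = 50                   # size of each thumbnail
image.batch_size = 5000                    # take only a subset if memory is an issue
image.method = 'umap'                      # 'umap' (default), 'sklearn' or 'maaten'
image.background_color = 'white'           # 'black' (default) or 'white'

image.load_data()
image.calculate_projection()
image.create_image()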