
berenslab / ne-spectrum

Licence: other
A Unifying Perspective on Neighbor Embeddings along the Attraction-Repulsion Spectrum

Programming Languages

python
cython


Projects that are alternatives of or similar to ne-spectrum

pix-payload-generator.net
Generates the payload for static PIX QR codes (Brazil's instant payment system) without needing a connection to a PSP.
Stars: ✭ 23 (+35.29%)
Mutual labels:  code
open-gsa-redesign
A fresh start for open.gsa.gov.
Stars: ✭ 27 (+58.82%)
Mutual labels:  code
cosy
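Alibaba Cloud AI Coding Assistant is an AI programming assistant that provides intelligent code completion and in-IDE code example search, helping you write high-quality code faster and more efficiently.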
Stars: ✭ 211 (+1141.18%)
Mutual labels:  code
parse-cloud-class
Extendable way to set up Parse Cloud classes behaviour
Stars: ✭ 40 (+135.29%)
Mutual labels:  code
opendev
OpenDev is a non-profit project that tries to collect as many resources (assets) of free use for the development of video games and applications.
Stars: ✭ 34 (+100%)
Mutual labels:  code
find-sec-bugs-demos
Repository to showcase various configuration recipes with various technologies
Stars: ✭ 33 (+94.12%)
Mutual labels:  code
Hacktoberfest-2021
An Open Source repository to Teach people How to contribute to open sources.
Stars: ✭ 98 (+476.47%)
Mutual labels:  code
coding-untuk-semua
Coding for everyone: a collection of materials for learning coding/programming.
Stars: ✭ 18 (+5.88%)
Mutual labels:  code
VerificationCode
A simple slider CAPTCHA JS plugin (image CAPTCHA)
Stars: ✭ 15 (-11.76%)
Mutual labels:  code
Domainker
BugBounty Tool
Stars: ✭ 40 (+135.29%)
Mutual labels:  code
Discord-Nitro-BruteForce
simple discord nitro code generator and checker written in c#
Stars: ✭ 26 (+52.94%)
Mutual labels:  code
go-captcha
Go Captcha is a behavioral captcha, which implements the generation of random verification text and the verification of click position information.
Stars: ✭ 86 (+405.88%)
Mutual labels:  code
code-examples
Short code snippets written by our open source community!
Stars: ✭ 60 (+252.94%)
Mutual labels:  code
Bijou.js
Bijou.js: Useful JavaScript snippets in one simple library
Stars: ✭ 30 (+76.47%)
Mutual labels:  code
XS-Labs-Style-Guide
XS-Labs Coding Style Guide for C, C++, Objective-C and x86 Assembly
Stars: ✭ 20 (+17.65%)
Mutual labels:  code
SwiftyCodeView
Fully customizable UI Component for verification codes written in swift with RxSwift support!
Stars: ✭ 86 (+405.88%)
Mutual labels:  code
windows-nt-vscode-theme
A Windows NT/2000 theme for VS Code 🎉
Stars: ✭ 63 (+270.59%)
Mutual labels:  code
Parsia-Code
Contains random code and some of my older projects
Stars: ✭ 20 (+17.65%)
Mutual labels:  code
gsql
GSQL is a structured query language code builder for golang.
Stars: ✭ 106 (+523.53%)
Mutual labels:  code
cpplint-extension
vscode cpplint extension
Stars: ✭ 17 (+0%)
Mutual labels:  code

This repository holds the code for https://arxiv.org/abs/2007.08902: A Unifying Perspective on Neighbor Embeddings along the Attraction-Repulsion Spectrum.

Structure/Installation

Once the prerequisites described in the rest of this section are in place, the code can be installed via:

git clone https://github.com/berenslab/ne-spectrum
cd ne-spectrum
pip install --user -r requirements.txt
python setup.py build
mv bh*.so jnb_msc/transformer/
pip install --user -e .

The commands above will probably fail to compile the Cython extensions. To fix this, install and compile openTSNE manually (clone its repository and install it in the same way as above). This project has a build-time dependency on a build artifact of openTSNE, the file quad_tree.pxd, which is not installed alongside openTSNE by default.

After installing openTSNE this way, adapt the two lines in setup.py that point to the locally installed openTSNE folder, so that the missing file can be found during the build.
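As an alternative to editing the paths by hand, the folder containing quad_tree.pxd could be located programmatically. The sketch below assumes a local openTSNE clone; the helper name find_pxd_dir is hypothetical and not part of this project. Depending on how the .pyx files cimport the module, the include path may need to be this directory or its parent.

```python
from pathlib import Path


def find_pxd_dir(opentsne_root, pxd_name="quad_tree.pxd"):
    """Return the directory inside a local openTSNE clone that contains `pxd_name`.

    Hypothetical helper: setup.py could hand the result (or its parent) to
    Cython as an include path so that the .pxd is found at build time.
    """
    root = Path(opentsne_root).expanduser().resolve()
    # Search recursively; sort so the choice is deterministic if there are copies.
    matches = sorted(root.rglob(pxd_name))
    if not matches:
        raise FileNotFoundError(f"{pxd_name} not found under {root}")
    return matches[0].parent


# Possible use in setup.py (assumes Cython is installed and `extensions` is
# the project's list of Extension objects):
# from Cython.Build import cythonize
# ext_modules = cythonize(extensions, include_path=[str(find_pxd_dir("~/src/openTSNE"))])
```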

Furthermore, you need a patched version of forceatlas2 from https://github.com/jnboehm/forceatlas2, which adds degree repulsion to fa2. Install it as follows:

git clone https://github.com/jnboehm/forceatlas2
cd forceatlas2
rm fa2/fa2util.c
python setup.py build
pip install --user -e .
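For orientation, ForceAtlas2 weights the repulsion between two nodes by the degree of each, F_r(n1, n2) = k_r (deg(n1) + 1)(deg(n2) + 1) / d(n1, n2). The NumPy sketch below computes these forces exactly in O(N^2); it illustrates the general ForceAtlas2 formula, not necessarily the exact change made in the patched fork, and it does not use the Barnes-Hut approximation that fa2 can use. The function name degree_repulsion is hypothetical.

```python
import numpy as np


def degree_repulsion(pos, adj, k_r=1.0, eps=1e-12):
    """Exact ForceAtlas2-style repulsive forces with degree weighting.

    pos: (N, 2) layout; adj: (N, N) symmetric adjacency matrix.
    Every pair repels with magnitude k_r * (deg_i + 1) * (deg_j + 1) / d_ij.
    """
    mass = (adj != 0).sum(axis=1) + 1.0        # deg + 1 for every node
    diff = pos[:, None, :] - pos[None, :, :]    # x_i - x_j, shape (N, N, 2)
    dist2 = (diff ** 2).sum(axis=-1)            # squared distances d_ij^2
    np.fill_diagonal(dist2, np.inf)             # no self-repulsion
    # (x_i - x_j) / d_ij^2 is the unit vector from j to i scaled by 1 / d_ij.
    coef = k_r * np.outer(mass, mass) / np.maximum(dist2, eps)
    return (coef[:, :, None] * diff).sum(axis=1)
```

Two connected nodes at distance 1 both have degree 1, so each is pushed away from the other with magnitude 2 * 2 / 1 = 4.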

The dependencies are listed in requirements.txt. The code has been run in a conda environment with Python 3.8.

The preprocessing script for the Treutlein dataset resides in static/.

Running the code

To create a figure, run redo on one of the files in media/. For example, after installing redo, run redo -j6 media/ar-spectrum.pdf. This makes sure that the data is present and up to date and then generates the figure. The build instructions are in the file media/ar-spectrum.pdf.do. That file calls redo again (line 268 of media/ar-spectrum.pdf.do: redo.redo_ifchange(datafiles + [plotter.labelname, plotter.rc])), which recurses until all dependencies have been satisfied; afterwards the figure is created. This .do file is written in Python, but redo itself is language-agnostic: the interpreter for a .do file is set by the shebang (#!) in its first line.
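To illustrate the mechanism, here is a minimal sketch of a .do file written in Python. redo calls a .do file with three arguments: the target ($1), the target without its extension ($2), and a temporary output file ($3) that redo renames to the target once the script succeeds. The dependency data/input.npy and the helper names are hypothetical; the project uses its own wrapper jnb_msc.redo.redo_ifchange, whose implementation may differ.

```python
#!/usr/bin/env python3
"""Hypothetical minimal .do file for redo, written in Python."""
import subprocess
import sys


def redo_ifchange(deps, run=subprocess.run):
    """Declare dependencies; redo (re)builds any that are missing or stale first."""
    if deps:
        run(["redo-ifchange", *deps], check=True)


def main(argv, run=subprocess.run):
    target, _basename, tmp_out = argv[1:4]
    deps = ["data/input.npy"]  # hypothetical dependency list
    redo_ifchange(deps, run=run)
    # Load the dependencies, compute, and write the result to tmp_out
    # (never directly to the target; redo renames tmp_out afterwards).
    with open(tmp_out, "w") as f:
        f.write(f"built {target} from {', '.join(deps)}\n")


# redo passes exactly three arguments; do nothing when run any other way.
if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv)
```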

To see which parameters have been set, look at the file names generated by the script (that is, what is passed to jnb_msc.redo.redo_ifchange(...)). They show which parameters deviate from the defaults set in the class definitions.
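The comparison against class defaults can also be done mechanically with the standard inspect module. In this sketch, non_default_params and the StdScale class with its default f=1.0 are hypothetical stand-ins, not the project's classes.

```python
import inspect


def non_default_params(cls, **kwargs):
    """Return the subset of kwargs whose values differ from the cls.__init__ defaults."""
    sig = inspect.signature(cls.__init__)
    out = {}
    for name, value in kwargs.items():
        param = sig.parameters.get(name)
        if param is None:
            raise TypeError(f"{cls.__name__} has no parameter {name!r}")
        # Parameters without a default always count as explicitly set.
        if param.default is inspect.Parameter.empty or param.default != value:
            out[name] = value
    return out


class StdScale:  # hypothetical stand-in for a stage class
    def __init__(self, path, f=1.0):
        self.path = path
        self.f = f
```

For example, non_default_params(StdScale, f=1e-4) reports {"f": 1e-4}, while passing the default f=1.0 reports nothing.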

Code structure

The classes in the project are all derived from a single base class. It requires every subclass to implement four methods:

  1. get_datadeps()
  2. load()
  3. transform()
  4. save()

The first method lets redo query which files the object needs, so that the dependencies are tracked properly. The remaining methods should be more or less self-explanatory. The algorithms can also be used manually: populate the .data field with suitable data and, depending on the algorithm, possibly the .init field as well.
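The contract above can be sketched as an abstract base class. Only the four method names and the .data and .init fields come from this README; ProjectBase, the toy Center stage, its file layout, and the data_ attribute are hypothetical.

```python
import os
from abc import ABC, abstractmethod

import numpy as np


class ProjectBase(ABC):
    """Hypothetical sketch of the common base class described above."""

    def __init__(self, path):
        self.path = path
        self.data = None  # input array, filled by load() or by hand
        self.init = None  # optional initial layout for embedding stages

    @abstractmethod
    def get_datadeps(self):
        """Return the list of files this stage needs (used by redo)."""

    @abstractmethod
    def load(self):
        """Read the dependencies into self.data (and possibly self.init)."""

    @abstractmethod
    def transform(self):
        """Run the algorithm on self.data."""

    @abstractmethod
    def save(self):
        """Write the results to disk."""


class Center(ProjectBase):
    """Toy stage that subtracts the column means; for illustration only."""

    def get_datadeps(self):
        # Assumed layout: the input lives in the parent folder's data.npy.
        return [os.path.join(os.path.dirname(self.path), "data.npy")]

    def load(self):
        self.data = np.load(self.get_datadeps()[0])

    def transform(self):
        self.data_ = self.data - self.data.mean(axis=0)
        return self

    def save(self):
        np.save(os.path.join(self.path, "data.npy"), self.data_)
```

Used manually, without redo: c = Center("unused"); c.data = np.array([[1.0, 2.0], [3.0, 4.0]]); c.transform().data_ gives the centered array.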

There are four major types of classes:

  1. GenStage
  2. NDStage
  3. NNStage
  4. SimStage

GenStage is the root class for the classes that generate a dataset. This can be simulated data or simply taking a dataset and putting it in the correct place (again, for redo and this project structure). NDStage takes an NxD matrix and reduces its dimensionality; PCA is one example. NNStage takes the same input as NDStage (but usually the output of, e.g., PCA) and turns it into an NxN affinity/adjacency matrix. This can then in turn be fed into the last type, SimStage. These classes take both an NxN matrix and an NxD (D=2) array, which serves as the initial layout.
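The chain of stage types described above can be sketched with plain NumPy functions. All function names are hypothetical stand-ins, and the specific choices (Gaussian blobs, PCA via SVD, binary symmetric kNN affinities, exact t-SNE gradient descent with an exaggeration factor) are illustrative rather than the project's implementations.

```python
import numpy as np


def gen_stage(n_per_class=50, d=10, seed=0):
    """GenStage stand-in: three Gaussian blobs in d dimensions."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=5.0, size=(3, d))
    X = np.concatenate([c + rng.normal(size=(n_per_class, d)) for c in centers])
    return X, np.repeat(np.arange(3), n_per_class)


def nd_stage(X, n_components=5):
    """NDStage stand-in: PCA via SVD of the centered N x D matrix."""
    U, S, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def nn_stage(X, k=10):
    """NNStage stand-in: symmetric binary kNN affinities, normalized to sum to 1."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros_like(d2)
    A[np.arange(len(X))[:, None], nn] = 1.0
    P = A + A.T
    return P / P.sum()


def sim_stage(P, init, exaggeration=1.0, lr=10.0, n_iter=300):
    """SimStage stand-in: plain gradient descent on the exact t-SNE loss."""
    Y = init.copy()
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]
        W = 1.0 / (1.0 + (diff ** 2).sum(axis=-1))  # Cauchy kernel
        np.fill_diagonal(W, 0.0)
        Q = W / W.sum()
        # dC/dy_i = 4 * sum_j (rho * p_ij - q_ij) * w_ij * (y_i - y_j)
        grad = 4.0 * (((exaggeration * P - Q) * W)[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad
    return Y


X, labels = gen_stage()
X_pca = nd_stage(X)
P = nn_stage(X_pca)
init = X_pca[:, :2] / X_pca[:, 0].std() * 1e-4  # small 2D initial layout
Y = sim_stage(P, init)
```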

There are further minor classes, for example simple classes that rescale the input to a predefined standard deviation or maximum scale (code in jnb_msc/transformer/scale.py).
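The two kinds of rescaling can be sketched in a few lines. The function names are hypothetical, and measuring the standard deviation on the first column is an assumption; the code in jnb_msc/transformer/scale.py may normalize differently.

```python
import numpy as np


def std_scale(X, f=1.0):
    """Rescale X so that its first column has standard deviation f (assumed convention)."""
    return X / X[:, 0].std() * f


def max_scale(X, f=1.0):
    """Rescale X so that its largest absolute coordinate equals f."""
    return X / np.abs(X).max() * f
```

A tiny target such as f=1e-4 is typical for the initial layout of t-SNE-like methods.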

If anything is unclear, please let me know.

What are all those .do files?

This repository uses redo to essentially “cache” the computations carried out by the experiments. It works similarly to `make` in that it tries to determine which files have changed and which parts need to be rebuilt. I chose this approach so that I wouldn’t have to either recompute everything every time or manually change the code to either load a (possibly stale) file or recompute and save it.
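The core idea can be sketched as a make-style timestamp check: recompute a result only when it is missing or older than one of its inputs. This is a simplified illustration with a hypothetical helper name; redo itself additionally records which dependencies each target declared.

```python
import os
import pickle


def cached(target, deps, compute):
    """Return the cached result in `target`, recomputing it only when stale."""
    if os.path.exists(target):
        t_mtime = os.path.getmtime(target)
        if all(os.path.getmtime(d) <= t_mtime for d in deps):
            with open(target, "rb") as f:
                return pickle.load(f)  # up to date: reuse the stored result
    result = compute()
    tmp = target + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(result, f)
    os.replace(tmp, target)  # atomic rename, so a crash never leaves a half-written target
    return result
```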

For more information, the (rough) notes on the original design are here.

Unfortunately, the redo implementation I am using is written in Python 2 and therefore needs to be installed separately. Installing it is not strictly necessary, but all the code that generates the figures uses it to check whether the files are present and up to date. Furthermore, the load() and save() functions are written with redo in mind.

For example, to get an image of t-SNE on MNIST, one could write in the root of the repository:

redo 'data/mnist/pca/affinity/stdscale;f:1e-4/tsne/data.png'

This will “generate” the MNIST dataset and reduce it with PCA to 50 dimensions (the default here). Afterwards it computes the pairwise affinities from the result. Then the standard deviation is set to the given value, and finally t-SNE is run with the scaled dense NxD matrix and the NxN affinity matrix. After the optimization, the embedding (saved as data.npy) is used to create a scatter plot, which in turn is saved as data.png and can then be viewed.

The prefix data/ is not mandatory; it can be omitted, or the path can be structured in any way. How the other folder names map to classes is defined in jnb_msc/util.py. Arguments are appended to a folder name after a semicolon, as colon-separated key:value pairs, and multiple arguments are separated by further semicolons; for example, stdscale;f:1e-4 calls stdscale with f=1e-4. The quotes around the target in the command above keep the shell from interpreting the semicolon.
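The naming scheme can be parsed into (stage name, keyword arguments) pairs as in the sketch below. This is a hypothetical illustration: the function names, the handling of the optional data/ prefix, and the int/float conversion are assumptions; the actual resolution to classes happens in jnb_msc/util.py.

```python
from pathlib import PurePosixPath


def _convert(value):
    """Turn '1e-4' into 1e-4 and '50' into 50; leave other strings unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_target(target, prefix="data"):
    """Split a redo target into [(stage_name, kwargs), ...] and the output file name."""
    parts = PurePosixPath(target).parts
    if parts and parts[0] == prefix:
        parts = parts[1:]  # the prefix is optional and carries no stage
    *dirs, filename = parts
    stages = []
    for component in dirs:
        name, *args = component.split(";")
        kwargs = {}
        for arg in args:
            key, _, value = arg.partition(":")
            kwargs[key] = _convert(value)
        stages.append((name, kwargs))
    return stages, filename
```

Applied to 'data/mnist/pca/affinity/stdscale;f:1e-4/tsne/data.png', this yields the stages mnist, pca, affinity, stdscale (with f=1e-4) and tsne, plus the output file data.png.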

prepped/

The algorithms dump all the files they produce into the folder prepped/. This has two reasons. First, it prevents clutter in the main directories. Second, it allows the files to be tracked by redo, which does not support multiple output files from one run. For more information, see the redo documentation (the section “Virtual targets, side effects, and multiple outputs”).
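The side-output idea can be sketched as follows: a stage writes all of its extra arrays into a folder under prepped/ that mirrors the target path, while redo tracks only the single target file. The function name and the mirrored layout are assumptions for illustration, not the project's actual scheme.

```python
from pathlib import Path

import numpy as np


def save_side_outputs(target, outputs, prepped_root="prepped"):
    """Write several arrays for one redo target into a mirrored prepped/ folder.

    Assumed layout: the side outputs of data/mnist/pca/data.npy land in
    prepped/data/mnist/pca/<name>.npy. The caller still writes the one file
    that redo tracks as the target.
    """
    out_dir = Path(prepped_root) / Path(target).parent
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, array in outputs.items():
        path = out_dir / f"{name}.npy"
        np.save(path, array)
        paths.append(path)
    return paths
```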
