gregversteeg / Corex

License: GPL-2.0
CorEx or "Correlation Explanation" discovers a hierarchy of informative latent factors. This reference implementation has been superseded by other versions below.

Programming Languages

python

Projects that are alternatives of or similar to Corex

deep learning
Deep-learning approaches to object recognition from 3D data
Stars: ✭ 19 (-92.86%)
Mutual labels:  unsupervised-learning
kwx
BERT, LDA, and TFIDF based keyword extraction in Python
Stars: ✭ 33 (-87.59%)
Mutual labels:  unsupervised-learning
learning-topology-synthetic-data
Tensorflow implementation of Learning Topology from Synthetic Data for Unsupervised Depth Completion (RAL 2021 & ICRA 2021)
Stars: ✭ 22 (-91.73%)
Mutual labels:  unsupervised-learning
dti-sprites
(ICCV 2021) Code for "Unsupervised Layered Image Decomposition into Object Prototypes" paper
Stars: ✭ 33 (-87.59%)
Mutual labels:  unsupervised-learning
PiCIE
PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in clustering (CVPR2021)
Stars: ✭ 102 (-61.65%)
Mutual labels:  unsupervised-learning
back2future
Unsupervised Learning of Multi-Frame Optical Flow with Occlusions
Stars: ✭ 39 (-85.34%)
Mutual labels:  unsupervised-learning
uctf
Unsupervised Controllable Text Generation (Applied to text Formalization)
Stars: ✭ 19 (-92.86%)
Mutual labels:  unsupervised-learning
Self-Supervised-depth
Self-Supervised depth kalilia
Stars: ✭ 20 (-92.48%)
Mutual labels:  unsupervised-learning
srVAE
VAE with RealNVP prior and Super-Resolution VAE in PyTorch. Code release for https://arxiv.org/abs/2006.05218.
Stars: ✭ 56 (-78.95%)
Mutual labels:  unsupervised-learning
dti-clustering
(NeurIPS 2020 oral) Code for "Deep Transformation-Invariant Clustering" paper
Stars: ✭ 60 (-77.44%)
Mutual labels:  unsupervised-learning
Similarity-Adaptive-Deep-Hashing
Unsupervised Deep Hashing with Similarity-Adaptive and Discrete Optimization (TPAMI2018)
Stars: ✭ 18 (-93.23%)
Mutual labels:  unsupervised-learning
PIC
Parametric Instance Classification for Unsupervised Visual Feature Learning, NeurIPS 2020
Stars: ✭ 41 (-84.59%)
Mutual labels:  unsupervised-learning
ML2017FALL
Machine Learning (EE 5184) in NTU
Stars: ✭ 66 (-75.19%)
Mutual labels:  unsupervised-learning
Unsupervised-Learning-in-R
Workshop (6 hours): Clustering (Hdbscan, LCA, Hopach), dimension reduction (UMAP, GLRM), and anomaly detection (isolation forests).
Stars: ✭ 34 (-87.22%)
Mutual labels:  unsupervised-learning
adareg-monodispnet
Repository for Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction (CVPR2019)
Stars: ✭ 22 (-91.73%)
Mutual labels:  unsupervised-learning
ladder-vae-pytorch
Ladder Variational Autoencoders (LVAE) in PyTorch
Stars: ✭ 59 (-77.82%)
Mutual labels:  unsupervised-learning
MVGL
TCyb 2018: Graph learning for multiview clustering
Stars: ✭ 26 (-90.23%)
Mutual labels:  unsupervised-learning
L2c
Learning to Cluster. A deep clustering strategy.
Stars: ✭ 262 (-1.5%)
Mutual labels:  unsupervised-learning
UEGAN
[TIP2020] Pytorch implementation of "Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network"
Stars: ✭ 68 (-74.44%)
Mutual labels:  unsupervised-learning
altair
Assessing Source Code Semantic Similarity with Unsupervised Learning
Stars: ✭ 42 (-84.21%)
Mutual labels:  unsupervised-learning

Correlation Explanation (CorEx)

The principle of Cor-relation Ex-planation has recently been introduced as a way to build rich representations that are informative about relationships in data. This project consists of Python code to build these representations.

The version here implements only the technique of the 2014 NIPS paper. A version that incorporates features like continuous variables, missing values, and Bayesian smoothing is now available here: https://github.com/gregversteeg/bio_corex/. It subsumes all functionality in this version. Despite the name, there's nothing specific to biology about it, but development was guided by problems in biomedical domains.

A preliminary version of the technique is described in this paper.
Discovering Structure in High-Dimensional Data Through Correlation Explanation
Greg Ver Steeg and Aram Galstyan, NIPS 2014, http://arxiv.org/abs/1406.1222

Some theoretical developments are described here:
Maximally Informative Hierarchical Representations of High-Dimensional Data
Greg Ver Steeg and Aram Galstyan, AISTATS 2015, http://arxiv.org/abs/1410.7404

The code here is written by Greg Ver Steeg and Gabriel Pereyra.

Dependencies

CorEx requires NumPy and SciPy. If you use OS X, I recommend installing the Scipy Superpack:
http://fonnesbeck.github.io/ScipySuperpack/
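On other platforms, the dependencies can be installed with pip (a generic setup step, not specific to this project):

pip install numpy scipy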

Install

To install, download using the link on the right or clone the project by executing this command in your target directory:

git clone https://github.com/gregversteeg/CorEx.git

Use git pull to get updates. The code is under development. Please feel free to raise issues or request features using the GitHub interface.

Basic Usage

Example

import numpy as np
import corex as ce

X = np.array([[0,0,0,0,0], # A matrix with rows as samples and columns as variables.
              [0,0,0,1,1],
              [1,1,1,0,0],
              [1,1,1,1,1]], dtype=int)

layer1 = ce.Corex(n_hidden=2)  # Define the number of hidden factors to use.
layer1.fit(X)

layer1.clusters  # Each variable/column is associated with one Y_j
# array([0, 0, 0, 1, 1])
layer1.labels[0]  # Labels for each sample for Y_0
# array([0, 0, 1, 1])
layer1.labels[1]  # Labels for each sample for Y_1
# array([0, 1, 0, 1])
layer1.tcs  # TC(X;Y_j) (all info measures reported in nats).
# array([ 1.385,  0.692])
# TC(X_Gj) >= TC(X_Gj; Y_j)
# For this example, TC(X1,X2,X3) = 1.386, TC(X4,X5) = 0.693
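As a sanity check on those numbers, total correlation can be computed directly from its definition, TC(X_G) = sum_i H(X_i) - H(X_G). The sketch below is plain NumPy, not part of the CorEx API:

import numpy as np
from collections import Counter

def entropy(rows):
    # Empirical Shannon entropy (in nats) of the rows of a 2D array.
    counts = np.array(list(Counter(map(tuple, rows)).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

group = X[:, :3]  # the cluster {X1, X2, X3} found above
tc = sum(entropy(group[:, [i]]) for i in range(3)) - entropy(group)
print(tc)  # ~1.386 = 2*ln(2), matching the comment above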

Data format

For the basic version of CorEx, you must input a matrix of integers whose rows represent samples and whose columns represent different variables. The values must be integers {0, 1, ..., k-1}, where k represents the maximum number of values that each variable, x_i, can take. By default, entries equal to -1 are treated as missing. This can be altered by passing a missing_values argument when initializing CorEx.
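As a minimal sketch of getting raw categorical data into this format (the encoding step is plain NumPy, not part of the CorEx API):

import numpy as np

raw = np.array([["red", "small"],
                ["blue", "large"],
                ["red", "large"]])

# Encode each column's categories as integers 0..k-1, as required.
X = np.column_stack([np.unique(col, return_inverse=True)[1] for col in raw.T])
# X is now [[1 1], [0 0], [1 0]] and can be passed to Corex.fit.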

CorEx outputs

As shown in the example, clusters gives the variable clusters for each hidden factor Y_j and labels gives the labels for each sample for each Y_j. Probabilistic labels can be accessed with p_y_given_x.
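For example, a sketch of inspecting the soft labels (p_y_given_x is named in this README; the exact array layout is an assumption, so check it in your installed version):

import numpy as np

probs = np.asarray(layer1.p_y_given_x)  # soft label distributions
print(probs.shape)  # check which axes index factors, samples, and labels

# Assuming the last axis ranges over label values (an assumption), the
# hard labels in layer1.labels should match the argmax of these distributions:
# np.argmax(probs, axis=-1)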

The total correlation explained by each hidden factor, TC(X;Y_j), is accessed with tcs. Outputs are sorted so that Y_0 is always the factor that explains the most TC. Analogous to point-wise mutual information, you can define a point-wise total correlation for an individual sample, x^l:

TC(X = x^l; Y_j) = log Z_j(x^l)

This quantity is accessed with log_z and represents the correlations explained by Y_j for an individual sample. A low (or even negative!) value indicates a surprising observation, which makes log_z useful for anomaly detection.
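As a sketch of that use case (log_z is named above; which axis indexes samples is an assumption to verify in your version):

import numpy as np

log_z = np.asarray(layer1.log_z)  # point-wise TC(X = x^l; Y_j)
print(log_z.shape)                # confirm which axis indexes samples

# Assuming axis 0 indexes the hidden factors (an assumption), aggregate
# over factors; the lowest-scoring samples are the most surprising.
scores = log_z.sum(axis=0)
most_anomalous = scores.argsort()[:5]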

Generalizations

Hierarchical CorEx

The simplest extension is to stack CorEx representations on top of each other.

layer1 = ce.Corex(n_hidden=100)
layer2 = ce.Corex(n_hidden=10)
layer3 = ce.Corex(n_hidden=1)
Y1 = layer1.fit_transform(X)
Y2 = layer2.fit_transform(Y1)
Y3 = layer3.fit_transform(Y2)

The sum of total correlations explained by each layer provides a successively tighter lower bound on TC(X). (This will be detailed in a paper in progress.) To assess how large your representations should be, look at quantities like layer.tcs, as sketched below. Do all the Y_j's explain some correlation (i.e., are all the TCs significantly larger than 0)? If not, you should probably use a smaller representation.
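A quick diagnostic over the stacked layers (a sketch using the tcs attribute shown above; the 0.01 nat cutoff is an arbitrary choice for illustration):

import numpy as np

for name, layer in [("layer1", layer1), ("layer2", layer2), ("layer3", layer3)]:
    tcs = np.asarray(layer.tcs)
    # Factors with TC near zero explain almost no correlation; if many show
    # up, a smaller n_hidden would likely suffice at this layer.
    print(name, "total TC:", tcs.sum(), "near-zero factors:", int((tcs < 0.01).sum()))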

Missing values

You can mark entries as missing (e.g., by specifying missing_values=-1 when initializing CorEx). CorEx is quite robust to missing data, but this feature hasn't been extensively tested yet, so use it with care. (E.g., while the distribution of missing values should not matter in principle, it does have an effect in this version.)
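A short sketch of the workflow (missing_values=-1 comes from this README; the data are made up for illustration):

import numpy as np
import corex as ce

X = np.array([[0, 0, 1],
              [1, 0, 0],
              [1, 1, 1]], dtype=int)
X[0, 2] = -1  # mark one entry as missing

layer = ce.Corex(n_hidden=1, missing_values=-1)
layer.fit(X)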

Future versions

We are currently testing extensions that allow for arbitrary data types such as continuous variables. Some of these capabilities were added to https://github.com/gregversteeg/bio_corex/.

Visualization

See http://bit.ly/corexvis for examples of some of the rich visualization capabilities. These types of visualizations are added in https://github.com/gregversteeg/bio_corex/.
