gregversteeg / Corex

License: GPL-2.0
CorEx or "Correlation Explanation" discovers a hierarchy of informative latent factors. This reference implementation has been superseded by other versions below.

Programming Languages

python

Projects that are alternatives of or similar to Corex

deep learning
Deep-learning approaches to object recognition from 3D data
Stars: ✭ 19 (-92.86%)
Mutual labels:  unsupervised-learning
kwx
BERT, LDA, and TFIDF based keyword extraction in Python
Stars: ✭ 33 (-87.59%)
Mutual labels:  unsupervised-learning
learning-topology-synthetic-data
Tensorflow implementation of Learning Topology from Synthetic Data for Unsupervised Depth Completion (RAL 2021 & ICRA 2021)
Stars: ✭ 22 (-91.73%)
Mutual labels:  unsupervised-learning
dti-sprites
(ICCV 2021) Code for "Unsupervised Layered Image Decomposition into Object Prototypes" paper
Stars: ✭ 33 (-87.59%)
Mutual labels:  unsupervised-learning
PiCIE
PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in clustering (CVPR2021)
Stars: ✭ 102 (-61.65%)
Mutual labels:  unsupervised-learning
back2future
Unsupervised Learning of Multi-Frame Optical Flow with Occlusions
Stars: ✭ 39 (-85.34%)
Mutual labels:  unsupervised-learning
uctf
Unsupervised Controllable Text Generation (Applied to text Formalization)
Stars: ✭ 19 (-92.86%)
Mutual labels:  unsupervised-learning
Self-Supervised-depth
Self-Supervised depth kalilia
Stars: ✭ 20 (-92.48%)
Mutual labels:  unsupervised-learning
srVAE
VAE with RealNVP prior and Super-Resolution VAE in PyTorch. Code release for https://arxiv.org/abs/2006.05218.
Stars: ✭ 56 (-78.95%)
Mutual labels:  unsupervised-learning
dti-clustering
(NeurIPS 2020 oral) Code for "Deep Transformation-Invariant Clustering" paper
Stars: ✭ 60 (-77.44%)
Mutual labels:  unsupervised-learning
Similarity-Adaptive-Deep-Hashing
Unsupervised Deep Hashing with Similarity-Adaptive and Discrete Optimization (TPAMI2018)
Stars: ✭ 18 (-93.23%)
Mutual labels:  unsupervised-learning
PIC
Parametric Instance Classification for Unsupervised Visual Feature Learning, NeurIPS 2020
Stars: ✭ 41 (-84.59%)
Mutual labels:  unsupervised-learning
ML2017FALL
Machine Learning (EE 5184) in NTU
Stars: ✭ 66 (-75.19%)
Mutual labels:  unsupervised-learning
Unsupervised-Learning-in-R
Workshop (6 hours): Clustering (Hdbscan, LCA, Hopach), dimension reduction (UMAP, GLRM), and anomaly detection (isolation forests).
Stars: ✭ 34 (-87.22%)
Mutual labels:  unsupervised-learning
adareg-monodispnet
Repository for Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction (CVPR2019)
Stars: ✭ 22 (-91.73%)
Mutual labels:  unsupervised-learning
ladder-vae-pytorch
Ladder Variational Autoencoders (LVAE) in PyTorch
Stars: ✭ 59 (-77.82%)
Mutual labels:  unsupervised-learning
MVGL
TCyb 2018: Graph learning for multiview clustering
Stars: ✭ 26 (-90.23%)
Mutual labels:  unsupervised-learning
L2c
Learning to Cluster. A deep clustering strategy.
Stars: ✭ 262 (-1.5%)
Mutual labels:  unsupervised-learning
UEGAN
[TIP2020] Pytorch implementation of "Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network"
Stars: ✭ 68 (-74.44%)
Mutual labels:  unsupervised-learning
altair
Assessing Source Code Semantic Similarity with Unsupervised Learning
Stars: ✭ 42 (-84.21%)
Mutual labels:  unsupervised-learning

Correlation Explanation (CorEx)

The principle of Cor-relation Ex-planation has recently been introduced as a way to build rich representations that are informative about relationships in data. This project consists of Python code to build these representations.

The version here implements only the technique of the 2014 NIPS paper. A version that incorporates features like continuous variables, missing values, and Bayesian smoothing is now available here: https://github.com/gregversteeg/bio_corex/. It subsumes all functionality in this version. Despite the name, there's nothing specific to biology about it, but development was guided by problems in biomedical domains.

A preliminary version of the technique is described in this paper.
Discovering Structure in High-Dimensional Data Through Correlation Explanation
Greg Ver Steeg and Aram Galstyan, NIPS 2014, http://arxiv.org/abs/1406.1222

Some theoretical developments are described here:
Maximally Informative Hierarchical Representations of High-Dimensional Data
Greg Ver Steeg and Aram Galstyan, AISTATS 2015, http://arxiv.org/abs/1410.7404

The code here is written by Greg Ver Steeg and Gabriel Pereyra.

Dependencies

CorEx requires NumPy and SciPy. If you use OS X, I recommend installing the Scipy Superpack:
http://fonnesbeck.github.io/ScipySuperpack/
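On other platforms, the dependencies can be installed with pip (a generic setup step, not specific to this project):

pip install numpy scipy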

Install

To install, download using the link on the right or clone the project by executing this command in your target directory:

git clone https://github.com/gregversteeg/CorEx.git

Use git pull to get updates. The code is under development. Please feel free to raise issues or request features using the GitHub interface.

Basic Usage

Example

import numpy as np
import corex as ce

X = np.array([[0,0,0,0,0], # A matrix with rows as samples and columns as variables.
              [0,0,0,1,1],
              [1,1,1,0,0],
              [1,1,1,1,1]], dtype=int)

layer1 = ce.Corex(n_hidden=2)  # Define the number of hidden factors to use.
layer1.fit(X)

layer1.clusters  # Each variable/column is associated with one Y_j
# array([0, 0, 0, 1, 1])
layer1.labels[0]  # Labels for each sample for Y_0
# array([0, 0, 1, 1])
layer1.labels[1]  # Labels for each sample for Y_1
# array([0, 1, 0, 1])
layer1.tcs  # TC(X;Y_j) (all info measures reported in nats).
# array([ 1.385,  0.692])
# TC(X_Gj) >= TC(X_Gj; Y_j)
# For this example, TC(X1,X2,X3) = 1.386, TC(X4,X5) = 0.693
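As a sanity check on those numbers, total correlation can be computed directly from its definition, TC(X_G) = sum_i H(X_i) - H(X_G). The sketch below is plain NumPy, not part of the CorEx API:

import numpy as np
from collections import Counter

def entropy(rows):
    # Empirical Shannon entropy (in nats) of the rows of a 2D array.
    counts = np.array(list(Counter(map(tuple, rows)).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

group = X[:, :3]  # the cluster {X1, X2, X3} found above
tc = sum(entropy(group[:, [i]]) for i in range(3)) - entropy(group)
print(tc)  # ~1.386 = 2*ln(2), matching the comment above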

Data format

For the basic version of CorEx, you must input a matrix of integers whose rows represent samples and whose columns represent different variables. The values must be integers {0, 1, ..., k-1}, where k represents the maximum number of values that each variable, x_i, can take. By default, entries equal to -1 are treated as missing. This can be altered by passing a missing_values argument when initializing CorEx.
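As a minimal sketch of getting raw categorical data into this format (the encoding step is plain NumPy, not part of the CorEx API):

import numpy as np

raw = np.array([["red", "small"],
                ["blue", "large"],
                ["red", "large"]])

# Encode each column's categories as integers 0..k-1, as required.
X = np.column_stack([np.unique(col, return_inverse=True)[1] for col in raw.T])
# X is now [[1 1], [0 0], [1 0]] and can be passed to Corex.fit.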

CorEx outputs

As shown in the example, clusters gives the variable clusters for each hidden factor Y_j and labels gives the labels for each sample for each Y_j. Probabilistic labels can be accessed with p_y_given_x.
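For example, a sketch of inspecting the soft labels (p_y_given_x is named in this README; the exact array layout is an assumption, so check it in your installed version):

import numpy as np

probs = np.asarray(layer1.p_y_given_x)  # soft label distributions
print(probs.shape)  # check which axes index factors, samples, and labels

# Assuming the last axis ranges over label values (an assumption), the
# hard labels in layer1.labels should match the argmax of these distributions:
# np.argmax(probs, axis=-1)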

The total correlation explained by each hidden factor, TC(X;Y_j), is accessed with tcs. Outputs are sorted so that Y_0 is always the factor that explains the most TC. Analogous to point-wise mutual information, you can define a point-wise total correlation for an individual sample, x^l:

TC(X = x^l; Y_j) = log Z_j(x^l)

This quantity is accessed with log_z and represents the correlations explained by Y_j for an individual sample. A low (or even negative!) value indicates a surprising observation, which makes log_z useful for anomaly detection.
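As a sketch of that use case (log_z is named above; which axis indexes samples is an assumption to verify in your version):

import numpy as np

log_z = np.asarray(layer1.log_z)  # point-wise TC(X = x^l; Y_j)
print(log_z.shape)                # confirm which axis indexes samples

# Assuming axis 0 indexes the hidden factors (an assumption), aggregate
# over factors; the lowest-scoring samples are the most surprising.
scores = log_z.sum(axis=0)
most_anomalous = scores.argsort()[:5]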

Generalizations

Hierarchical CorEx

The simplest extension is to stack CorEx representations on top of each other.

layer1 = ce.Corex(n_hidden=100)
layer2 = ce.Corex(n_hidden=10)
layer3 = ce.Corex(n_hidden=1)
Y1 = layer1.fit_transform(X)
Y2 = layer2.fit_transform(Y1)
Y3 = layer3.fit_transform(Y2)

The sum of total correlations explained by each layer provides a successively tighter lower bound on TC(X). (This will be detailed in a paper in progress.) To assess how large your representations should be, look at quantities like layer.tcs, as sketched below. Do all the Y_j's explain some correlation (i.e., are all the TCs significantly larger than 0)? If not, you should probably use a smaller representation.
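A quick diagnostic over the stacked layers (a sketch using the tcs attribute shown above; the 0.01 nat cutoff is an arbitrary choice for illustration):

import numpy as np

for name, layer in [("layer1", layer1), ("layer2", layer2), ("layer3", layer3)]:
    tcs = np.asarray(layer.tcs)
    # Factors with TC near zero explain almost no correlation; if many show
    # up, a smaller n_hidden would likely suffice at this layer.
    print(name, "total TC:", tcs.sum(), "near-zero factors:", int((tcs < 0.01).sum()))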

Missing values

You can mark entries as missing (e.g., by specifying missing_values=-1 when initializing CorEx). CorEx is quite robust to missing data, but this feature hasn't been extensively tested yet, so use it with care. (E.g., while the distribution of missing values should not matter in principle, it does have an effect in this version.)
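A short sketch of the workflow (missing_values=-1 comes from this README; the data are made up for illustration):

import numpy as np
import corex as ce

X = np.array([[0, 0, 1],
              [1, 0, 0],
              [1, 1, 1]], dtype=int)
X[0, 2] = -1  # mark one entry as missing

layer = ce.Corex(n_hidden=1, missing_values=-1)
layer.fit(X)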

Future versions

We are currently testing extensions that allow for arbitrary data types such as continuous variables. Some of these capabilities were added to https://github.com/gregversteeg/bio_corex/.

Visualization

See http://bit.ly/corexvis for examples of some of the rich visualization capabilities. These types of visualizations are added in https://github.com/gregversteeg/bio_corex/.
