
AntoinePassemiers / Encoder-Forest

Licence: other
eForest: reversible mapping between high-dimensional data and path-rule identifiers using embedding trees

Programming Languages

python

Projects that are alternatives of or similar to Encoder-Forest

eForest
This is the official implementation for the paper 'AutoEncoder by Forest'
Stars: ✭ 71 (+222.73%)
Mutual labels:  autoencoder, eforest
Face-Landmarking
Real time face landmarking using decision trees and NN autoencoders
Stars: ✭ 73 (+231.82%)
Mutual labels:  autoencoder, decision-trees
Image deionising auto encoder
Noise removal from images using Convolutional autoencoder
Stars: ✭ 34 (+54.55%)
Mutual labels:  autoencoder, encoder-decoder
VisualML
Interactive Visual Machine Learning Demos.
Stars: ✭ 104 (+372.73%)
Mutual labels:  autoencoder
ProtoTree
ProtoTrees: Neural Prototype Trees for Interpretable Fine-grained Image Recognition, published at CVPR2021
Stars: ✭ 47 (+113.64%)
Mutual labels:  decision-trees
shared-latent-space
Shared Latent Space VAE's
Stars: ✭ 15 (-31.82%)
Mutual labels:  autoencoder
GROOT
[ICML 2021] A fast algorithm for fitting robust decision trees. http://proceedings.mlr.press/v139/vos21a.html
Stars: ✭ 15 (-31.82%)
Mutual labels:  decision-trees
Variations-of-SFANet-for-Crowd-Counting
The official implementation of "Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting"
Stars: ✭ 78 (+254.55%)
Mutual labels:  encoder-decoder
Speech driven gesture generation with autoencoder
This is the official implementation for IVA '19 paper "Analyzing Input and Output Representations for Speech-Driven Gesture Generation".
Stars: ✭ 76 (+245.45%)
Mutual labels:  autoencoder
deep-steg
Global NIPS Paper Implementation Challenge of "Hiding Images in Plain Sight: Deep Steganography"
Stars: ✭ 43 (+95.45%)
Mutual labels:  autoencoder
aws-machine-learning-university-dte
Machine Learning University: Decision Trees and Ensemble Methods
Stars: ✭ 119 (+440.91%)
Mutual labels:  decision-trees
supervised-machine-learning
This repo contains regression and classification projects. Examples: development of predictive models for comments on social media websites; building classifiers to predict outcomes in sports competitions; churn analysis; prediction of clicks on online ads; analysis of the opioids crisis and an analysis of retail store expansion strategies using…
Stars: ✭ 34 (+54.55%)
Mutual labels:  decision-trees
IBM-final-project-Machine-Learning
Final project of IBM's course https://www.coursera.org/learn/machine-learning-with-python on coursera
Stars: ✭ 33 (+50%)
Mutual labels:  decision-trees
badsv
The official BaDSV specification and CLI. Ascend from your puny DSV files!
Stars: ✭ 50 (+127.27%)
Mutual labels:  encoder-decoder
yggdrasil-decision-forests
A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models.
Stars: ✭ 156 (+609.09%)
Mutual labels:  decision-trees
Machine Learning From Scratch
Machine Learning models from scratch with a better visualisation
Stars: ✭ 15 (-31.82%)
Mutual labels:  decision-trees
ConvLSTM-PyTorch
ConvLSTM/ConvGRU (Encoder-Decoder) with PyTorch on Moving-MNIST
Stars: ✭ 202 (+818.18%)
Mutual labels:  encoder-decoder
autoencoders tensorflow
Automatic feature engineering using deep learning and Bayesian inference using TensorFlow.
Stars: ✭ 66 (+200%)
Mutual labels:  autoencoder
colornet-template
Colorizing B&W Photos with Neural Networks
Stars: ✭ 31 (+40.91%)
Mutual labels:  autoencoder
recurrent-decoding-cell
[AAAI'20] Segmenting Medical MRI via Recurrent Decoding Cell (Spotlight)
Stars: ✭ 14 (-36.36%)
Mutual labels:  encoder-decoder

Encoder Forest (eForest)

Unofficial implementation of the paper "AutoEncoder by Forest" by Ji Feng and Zhi-Hua Zhou (2017): https://arxiv.org/abs/1709.09018

Contrary to what the name suggests, eForest is not an ensemble method but an algorithm for encoding high-dimensional vectors into vectors of node identifiers and decoding them back to an approximation of the original data. To obtain a latent representation of size n, n completely random trees are grown, and each tree is associated with a single component of the latent vector. Since a tree is essentially a function mapping input vectors to leaves, latent variable i is the identifier of the leaf reached when the input vector is fed to tree i.
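
To make the encoding concrete, here is a minimal sketch (not the repository's code) assuming scikit-learn is available; an ExtraTreeRegressor fitted on random targets with max_features=1 is used as a stand-in for a completely random tree, and tree.apply() returns the identifier of the leaf reached by each input vector.

import numpy as np
from sklearn.tree import ExtraTreeRegressor

def encode_sketch(X, n_trees=5, max_depth=20, seed=0):
    # Grow n_trees completely random trees: with max_features=1, each split
    # uses a random feature and a random threshold, so the random targets
    # only serve to satisfy the supervised fit() signature.
    rng = np.random.RandomState(seed)
    trees = []
    for _ in range(n_trees):
        tree = ExtraTreeRegressor(max_depth=max_depth, max_features=1,
                                  random_state=rng.randint(2 ** 31 - 1))
        tree.fit(X, rng.rand(len(X)))
        trees.append(tree)
    # The latent vector of a sample is one leaf identifier per tree
    codes = np.column_stack([tree.apply(X) for tree in trees])
    return trees, codes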

A decision tree subdivides the input space into hyper-parallelepipeds: each decision leaf corresponds to a unique axis-aligned hyper-parallelepiped, and the rules that define it can be retrieved by walking from the leaf back to the root and collecting all the split functions along the path. Because the n embedding trees received the same input vector when producing the latent variables, the original vector must lie in the intersection of the n hyper-parallelepipeds. This intersection is itself a hyper-parallelepiped, and a decoded vector is obtained by sampling it uniformly at random. The combination of node rules describing this subspace is called the Maximum Compatible Rule (MCR).
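
Decoding can be sketched the same way. The helpers below (leaf_intervals, decode_sketch) are hypothetical and work against scikit-learn's tree internals rather than this repository's API; features are assumed to lie in a known box [lo, hi].

import numpy as np

def leaf_intervals(tree, leaf_id, n_features, lo, hi):
    # Walk from the root to the given leaf, tightening one bound per split:
    # going left means x[f] <= threshold, going right means x[f] > threshold.
    t = tree.tree_
    bounds = np.stack([np.full(n_features, lo), np.full(n_features, hi)])
    def path_to(node):
        if node == leaf_id:
            return []
        if t.children_left[node] == -1:  # a leaf that is not the target
            return None
        for child, side in ((t.children_left[node], 0), (t.children_right[node], 1)):
            rest = path_to(child)
            if rest is not None:
                return [(node, side)] + rest
        return None
    for node, side in path_to(0):
        f, thr = t.feature[node], t.threshold[node]
        if side == 0:
            bounds[1, f] = min(bounds[1, f], thr)  # left:  x[f] <= thr
        else:
            bounds[0, f] = max(bounds[0, f], thr)  # right: x[f] >  thr
    return bounds

def decode_sketch(trees, code, n_features, lo=0.0, hi=1.0):
    # Intersect the hyper-parallelepipeds of all trees (the MCR), then
    # decode by sampling the intersection uniformly at random.
    mcr = np.stack([np.full(n_features, lo), np.full(n_features, hi)])
    for tree, leaf_id in zip(trees, code):
        b = leaf_intervals(tree, leaf_id, n_features, lo, hi)
        mcr[0] = np.maximum(mcr[0], b[0])
        mcr[1] = np.minimum(mcr[1], b[1])
    return np.random.uniform(mcr[0], mcr[1])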

Suggested improvement: instead of growing the embedding trees at random, construct them in such a way that the volume of the intersection is minimized for every combination of n leaves (one leaf per tree).

[Figure] Original images and their reconstructed versions using an unsupervised eForest (500 shallow trees of depth 10)

How to use it

from encoder import EncoderForest

encoder = EncoderForest(5) # Latent space of dimension 5: one completely random tree per component
encoder.fit(X_train, max_depth=20) # Fit embedding trees (unsupervised eForest)

encoded = encoder.encode(X_train) # Encode the entire training set
decoded = encoder.decode(encoded[0]) # Decode first instance of the training set

# Intuitively, the performance of an embedding tree could be measured
# as the log(volume) returned by this tree divided by the log(volume) of the MCR.

# Get the path rules used to decode the first instance
rule_list = encoder.compute_rule_list(encoded[0])

# For each path rule (hyper-parallelepiped), compute its log(volume)
for i, path_rule in enumerate(rule_list):
    log_volume = path_rule.compute_volume()
    print("Log-volume of hyper-parallelepiped %i (tree %i): %f" % (i, i, log_volume))

# Get the intersection of the subspaces described by the path rules
MCR = encoder.calculate_MCR(rule_list)

# Compute the log(volume) of the subspace described by the MCR
print("MCR log-volume: %f" % MCR.compute_volume())

# Decode by generating a random sample in the final subspace
decoded = MCR.sample()
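
As a hypothetical follow-up to the comment above, the per-tree measure could be computed with the same example API (compute_volume as used earlier); this is a sketch of the suggested metric, not a function the repository is known to expose.

# Score each embedding tree as its path rule's log-volume divided by the
# MCR's log-volume, as suggested in the comment above.
mcr_log_volume = MCR.compute_volume()
for i, path_rule in enumerate(rule_list):
    score = path_rule.compute_volume() / mcr_log_volume
    print("Tree %i relative log-volume: %f" % (i, score))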