
naru-project / naru

License: Apache-2.0
Neural Relation Understanding: neural cardinality estimators for tabular data

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives to or similar to naru

deepdb-public
Implementation of DeepDB: Learn from Data, not from Queries!
Stars: ✭ 61 (-19.74%)
Mutual labels:  learned-database, cardinality-estimation, learned-database-components
Gumbel-CRF
Implementation of NeurIPS 20 paper: Latent Template Induction with Gumbel-CRFs
Stars: ✭ 51 (-32.89%)
Mutual labels:  generative-model, density-estimation, deep-generative-model
temporal-ssl
Video Representation Learning by Recognizing Temporal Transformations. In ECCV, 2020.
Stars: ✭ 46 (-39.47%)
Mutual labels:  unsupervised-learning, self-supervised-learning
latent-pose-reenactment
The authors' implementation of the "Neural Head Reenactment with Latent Pose Descriptors" (CVPR 2020) paper.
Stars: ✭ 132 (+73.68%)
Mutual labels:  generative-model, self-supervised-learning
Awesome-Vision-Transformer-Collection
Variants of Vision Transformer and its downstream tasks
Stars: ✭ 124 (+63.16%)
Mutual labels:  generative-model, self-supervised-learning
Transferlearning
Transfer learning / domain adaptation / domain generalization / multi-task learning, etc. Papers, code, datasets, applications, tutorials.
Stars: ✭ 8,481 (+11059.21%)
Mutual labels:  unsupervised-learning, self-supervised-learning
Sfmlearner
An unsupervised learning framework for depth and ego-motion estimation from monocular videos
Stars: ✭ 1,661 (+2085.53%)
Mutual labels:  unsupervised-learning, self-supervised-learning
PIC
Parametric Instance Classification for Unsupervised Visual Feature Learning, NeurIPS 2020
Stars: ✭ 41 (-46.05%)
Mutual labels:  unsupervised-learning, self-supervised-learning
Awesome Vaes
A curated list of awesome work on VAEs, disentanglement, representation learning, and generative models.
Stars: ✭ 418 (+450%)
Mutual labels:  generative-model, unsupervised-learning
Discogan Pytorch
PyTorch implementation of "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks"
Stars: ✭ 961 (+1164.47%)
Mutual labels:  generative-model, unsupervised-learning
CLSA
Official implementation of "Contrastive Learning with Stronger Augmentations"
Stars: ✭ 48 (-36.84%)
Mutual labels:  unsupervised-learning, self-supervised-learning
adareg-monodispnet
Repository for Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction (CVPR2019)
Stars: ✭ 22 (-71.05%)
Mutual labels:  unsupervised-learning, self-supervised-learning
learning-topology-synthetic-data
Tensorflow implementation of Learning Topology from Synthetic Data for Unsupervised Depth Completion (RAL 2021 & ICRA 2021)
Stars: ✭ 22 (-71.05%)
Mutual labels:  unsupervised-learning, self-supervised-learning
Simclr
SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners
Stars: ✭ 2,720 (+3478.95%)
Mutual labels:  unsupervised-learning, self-supervised-learning
PiCIE
PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in clustering (CVPR2021)
Stars: ✭ 102 (+34.21%)
Mutual labels:  unsupervised-learning, self-supervised-learning
Dragan
A stable algorithm for GAN training
Stars: ✭ 189 (+148.68%)
Mutual labels:  generative-model, unsupervised-learning
awesome-contrastive-self-supervised-learning
A comprehensive list of awesome contrastive self-supervised learning papers.
Stars: ✭ 748 (+884.21%)
Mutual labels:  unsupervised-learning, self-supervised-learning
al-fk-self-supervision
Official PyTorch code for CVPR 2020 paper "Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision"
Stars: ✭ 28 (-63.16%)
Mutual labels:  unsupervised-learning, self-supervised-learning
srVAE
VAE with RealNVP prior and Super-Resolution VAE in PyTorch. Code release for https://arxiv.org/abs/2006.05218.
Stars: ✭ 56 (-26.32%)
Mutual labels:  generative-model, unsupervised-learning
Paysage
Unsupervised learning and generative models in Python/PyTorch.
Stars: ✭ 109 (+43.42%)
Mutual labels:  generative-model, unsupervised-learning

Neural Relation Understanding

Naru is a suite of neural cardinality estimators for tabular data.


This repo contains the code for the VLDB'20 paper, Deep Unsupervised Cardinality Estimation.

Main modules:

  • common.py: a lightweight pandas-based library to load/analyze/represent tables (see the usage sketch after this list)
  • several deep autoregressive model architectures
  • ProgressiveSampling: approximate inference algorithms for deep autoregressive models
  • a generator for high-dimensional SQL queries
  • training/evaluation scripts
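
For orientation, the table abstraction in common.py can be used roughly as follows. This is a sketch based on the CsvTable constructor shown under "Registering custom datasets" below; the module path and the sample file name are assumptions:

import pandas as pd

from common import CsvTable  # assumption: CsvTable is defined in common.py

# Load a CSV into Naru's table representation (file path is illustrative).
df = pd.read_csv('datasets/dmv-tiny.csv')
table = CsvTable('DMV-tiny', df, cols=df.columns)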

Quick start

To set up a conda environment, run:

conda env create -f environment.yml

Run the following to test on a tiny 100-row dataset:

source activate naru

# Trains a ResMADE on dataset 'dmv-tiny'.
# This will create a checkpoint with path 'models/dmv-tiny-<model spec>.pt'.
python train_model.py --epochs=100 --residual 

# Use the trained model as a cardinality estimator.
# --glob supports evaluating a set of checkpoints at once; here, there will only be one match.
python eval_model.py --glob='dmv-tiny*.pt' --residual

Model architectures

Naru currently implements three state-of-the-art autoregressive architectures:

  1. MADE: a highly efficient masked MLP, introduced in Masked Autoencoder for Distribution Estimation (ICML'15); the masking idea is sketched after this list.
  2. ResMADE: MADE with residual connections, introduced in Autoregressive Energy Machines (ICML'19).
  3. Transformer: an autoregressive Transformer, the architecture powering several recent breakthroughs in natural language processing (e.g., BERT, GPT-2, XLNet).
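
To make the masking idea behind MADE and ResMADE concrete, here is a minimal single-layer sketch (illustrative code, not Naru's actual implementation; multi-layer MADE uses per-layer masks derived from connectivity degrees, as in the ICML'15 paper):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Linear):
    # A linear layer whose weights are elementwise-masked so that output i
    # never depends on inputs >= i: the autoregressive property.
    def __init__(self, num_features):
        super().__init__(num_features, num_features)
        # Strictly lower-triangular mask: output i sees only inputs 0..i-1.
        mask = torch.tril(torch.ones(num_features, num_features), diagonal=-1)
        self.register_buffer('mask', mask)

    def forward(self, x):
        return F.linear(x, self.mask * self.weight, self.bias)

Output i of such a network can then parameterize the conditional distribution P(column i | columns < i), and the product of these conditionals gives the joint distribution over tuples.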

In principle, Naru's inference algorithms can interface with any autoregressive model and turn it into a cardinality estimator.
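
As a hedged illustration of that interface, the sketch below shows the essence of progressive sampling for a conjunctive predicate over discrete columns; model.conditional is a hypothetical method, not Naru's actual API:

import numpy as np

def progressive_sampling(model, predicates, num_samples=100, seed=0):
    # model.conditional(i, prefix): hypothetical interface returning the
    #   probability vector over column i's domain, given sampled values
    #   `prefix` for columns 0..i-1.
    # predicates: one boolean mask per column; mask[v] is True iff domain
    #   value v satisfies the predicate on that column.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        prefix, weight = [], 1.0
        for i, valid in enumerate(predicates):
            probs = model.conditional(i, prefix)
            mass = float(probs[valid].sum())  # mass satisfying column i's predicate
            if mass == 0.0:
                weight = 0.0
                break
            weight *= mass
            # Renormalize over satisfying values, then sample the next column.
            restricted = np.where(valid, probs, 0.0) / mass
            prefix.append(int(rng.choice(len(probs), p=restricted)))
        total += weight
    return total / num_samples  # selectivity; multiply by row count for cardinality

Naru's actual ProgressiveSampling estimator is batched and more elaborate, but the core idea is the same: accumulate the satisfying probability mass column by column while sampling only from satisfying values.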

Datasets

DMV. The DMV dataset is publicly available at catalog.data.gov and is continuously updated. Our frozen snapshot (~11.6M tuples) can be downloaded by running:

bash ./download_dmv.sh

Specify --dataset=dmv when launching the training/evaluation scripts.

Registering custom datasets

You can point Naru at your own dataset in a few steps.

First, put a CSV file under datasets/. Second, define a LoadMyDataset() function in datasets.py:

import pandas as pd

from common import CsvTable  # common.py provides the table abstraction

def LoadMyDataset(filepath, **kwargs):
    # Forward any pandas options (separator, dtypes, etc.) needed to load the file correctly.
    df = pd.read_csv(filepath, **kwargs)
    return CsvTable('Name of Dataset', df, cols=df.columns)

Finally, call this function in the appropriate places in the training/evaluation scripts. Search for the current usage of args.dataset in those files and extend it accordingly (see the sketch below).
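
As a rough illustration (hypothetical flag value and path; the actual code paths differ, so search for args.dataset to find them):

if args.dataset == 'my-dataset':  # hypothetical dataset name
    table = datasets.LoadMyDataset('datasets/my_dataset.csv')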

Running experiments

Run python train_model.py --help to see a list of tunable knobs. We recommend at least setting --residual --direct-io --column-masking. In terms of learning efficiency, ResMADE learns faster than MADE, and --direct-io helps as well. In terms of architecture, the Transformer can achieve lower negative log-likelihoods, so it fits complex datasets better, albeit at a higher computational cost.

When running evaluation (eval_model.py), pass the same set of architecture flags so that the checkpoint loads correctly.

Examples:

# Use a small 256x5 ResMADE model, with column masking.
python train_model.py --num-gpus=1 --dataset=dmv --epochs=20 --warmups=8000 --bs=2048 \
    --residual --layers=5 --fc-hiddens=256 --direct-io --column-masking

# Evaluate.  To enable estimators other than Naru, see section below.
python eval_model.py --dataset=dmv --glob='<ckpt from above>' --num-queries=2000 \
    --residual --layers=5 --fc-hiddens=256 --direct-io --column-masking
    
# Alternative: larger MADE model reported in paper.
python train_model.py --num-gpus=1 --dataset=dmv --epochs=100 --warmups=12000 --bs=2048 \
    --layers=0 --direct-io --column-masking --input-encoding=binary --output-encoding=one_hot

# Alternative: use a Transformer.
python train_model.py --num-gpus=1 --dataset=dmv --epochs=20 --warmups=20000 --bs=1024 \
    --blocks=4 --dmodel=64 --dff=256 --heads=4 --column-masking

Baseline cardinality estimators

We also include a set of baseline cardinality estimators known in the database literature:

  • Naru (--glob to find trained checkpoints)
  • Sampling (--run-sampling)
  • Bayes nets (--run-bn)
  • MaxDiff n-dimensional histogram (--run-maxdiff)
  • Postgres (see estimators.Postgres)

Example: to run experiments using trained Naru model(s) and a Sampler:

python eval_model.py --dataset=dmv --num-queries=2000 --glob='dmv*.pt' --run-sampling

Parameters controlling these estimators can be adjusted inside eval_model.py.
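
For intuition, the Sampling baseline amounts to evaluating the predicate on a uniform row sample and extrapolating. Here is a generic sketch of that technique, not Naru's actual estimators implementation:

import pandas as pd

def sampling_estimate(df, predicate, sample_size=10000, seed=0):
    # Uniformly sample rows, measure the fraction satisfying the predicate,
    # and scale by the table's row count.
    sample = df.sample(n=min(sample_size, len(df)), random_state=seed)
    return predicate(sample).mean() * len(df)

# Example: estimated COUNT(*) for a conjunctive filter (column names illustrative).
# est = sampling_estimate(df, lambda t: (t['Color'] == 'RED') & (t['Year'] >= 2010))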

Contributors

This repo was written by Amog Kamsetty, Chenggang Wu, Eric Liang, and Zongheng Yang.

Reference

If you find this repository useful in your work, please cite our VLDB'20 paper:

@article{naru,
  title={Deep Unsupervised Cardinality Estimation},
  author={Yang, Zongheng and Liang, Eric and Kamsetty, Amog and Wu, Chenggang and Duan, Yan and Chen, Xi and Abbeel, Pieter and Hellerstein, Joseph M and Krishnan, Sanjay and Stoica, Ion},
  journal={Proceedings of the VLDB Endowment},
  volume={13},
  number={3},
  pages={279--292},
  year={2019},
  publisher={VLDB Endowment}
}