
CompVis / geometry-free-view-synthesis

License: MIT
Is a geometric model required to synthesize novel views from a single image?

Programming Languages

Python
139335 projects - #7 most used programming language
Jupyter Notebook
11667 projects
Shell
77523 projects

Projects that are alternatives of or similar to geometry-free-view-synthesis

pysentimiento
A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks
Stars: ✭ 274 (+3.4%)
Mutual labels:  transformers
label-studio-transformers
Label data using HuggingFace's transformers and automatically get a prediction service
Stars: ✭ 117 (-55.85%)
Mutual labels:  transformers
DocSum
A tool to automatically summarize documents abstractively using the BART or PreSumm Machine Learning Model.
Stars: ✭ 58 (-78.11%)
Mutual labels:  transformers
transformers-interpret
Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.
Stars: ✭ 861 (+224.91%)
Mutual labels:  transformers
Text-Summarization
Abstractive and Extractive Text summarization using Transformers.
Stars: ✭ 38 (-85.66%)
Mutual labels:  transformers
lightning-transformers
Flexible components pairing 🤗 Transformers with Pytorch Lightning
Stars: ✭ 551 (+107.92%)
Mutual labels:  transformers
text-classification-transformers
Easy text classification for everyone : Bert based models via Huggingface transformers (KR / EN)
Stars: ✭ 32 (-87.92%)
Mutual labels:  transformers
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-90.94%)
Mutual labels:  transformers
deepconsensus
DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.
Stars: ✭ 124 (-53.21%)
Mutual labels:  transformers
xpandas
Universal 1d/2d data containers with Transformers functionality for data analysis.
Stars: ✭ 25 (-90.57%)
Mutual labels:  transformers
wechsel
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.
Stars: ✭ 39 (-85.28%)
Mutual labels:  transformers
anonymisation
Anonymization of legal cases (Fr) based on Flair embeddings
Stars: ✭ 85 (-67.92%)
Mutual labels:  transformers
bert-squeeze
🛠️ Tools for Transformers compression using PyTorch Lightning ⚡
Stars: ✭ 56 (-78.87%)
Mutual labels:  transformers
elastic transformers
Making BERT stretchy. Semantic Elasticsearch with Sentence Transformers
Stars: ✭ 153 (-42.26%)
Mutual labels:  transformers
PyTorch-Model-Compare
Compare neural networks by their feature similarity
Stars: ✭ 119 (-55.09%)
Mutual labels:  transformers
transformer generalization
The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We significantly improve the systematic generalization of transformer models on a variety of datasets using simple tricks and careful considerations.
Stars: ✭ 58 (-78.11%)
Mutual labels:  transformers
long-short-transformer
Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in Pytorch
Stars: ✭ 103 (-61.13%)
Mutual labels:  transformers
modules
The official repository for our paper "Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks". We develop a method for analyzing emerging functional modularity in neural networks based on differentiable weight masks and use it to point out important issues in current-day neural networks.
Stars: ✭ 25 (-90.57%)
Mutual labels:  transformers
gnn-lspe
Source code for GNN-LSPE (Graph Neural Networks with Learnable Structural and Positional Representations), ICLR 2022
Stars: ✭ 165 (-37.74%)
Mutual labels:  transformers
WellcomeML
Repository for Machine Learning utils at the Wellcome Trust
Stars: ✭ 31 (-88.3%)
Mutual labels:  transformers

Geometry-Free View Synthesis: Transformers and no 3D Priors

[Figure: teaser]

Geometry-Free View Synthesis: Transformers and no 3D Priors
Robin Rombach*, Patrick Esser*, Björn Ommer
* equal contribution

arXiv | BibTeX | Colab

Interactive Scene Exploration Results

RealEstate10K:
[Figure: realestate]
Videos: short (2min) / long (12min)

ACID:
[Figure: acid]
Videos: short (2min) / long (9min)

Demo

For a quickstart, you can try the Colab demo, but for a smoother experience we recommend installing the local demo as described below.

Installation

The demo requires building a PyTorch extension. If you have a sane development environment with PyTorch, g++ and nvcc, you can simply

pip install git+https://github.com/CompVis/geometry-free-view-synthesis#egg=geometry-free-view-synthesis

If you run into problems and have a GPU with compute capability below 8, you can also use the provided conda environment:

git clone https://github.com/CompVis/geometry-free-view-synthesis
conda env create -f geometry-free-view-synthesis/environment.yaml
conda activate geofree
pip install geometry-free-view-synthesis/
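
As a quick sanity check after either installation route, you can run the following minimal sketch (not part of the repository; it assumes the package is importable as geofree):

# check_install.py -- minimal post-install sanity check
import torch
import geofree  # import succeeds only if the installation worked

print("geofree imported from:", geofree.__file__)
print("CUDA available:", torch.cuda.is_available())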

Running

After installation, running

braindance.py

will start the demo on a sample scene. Explore the scene interactively using the WASD keys to move and arrow keys to look around. Once positioned, hit the space bar to render the novel view with GeoGPT.

You can then move again with the WASD keys, and mouse control can be activated with the m key. Run braindance.py <folder to select image from/path to image> to run the demo on your own images. By default, it uses the re_impl_nodepth model (trained on RealEstate10K without explicit transformation and no depth input), which can be changed with the --model flag. The corresponding checkpoints are downloaded the first time they are required. Specify an output path using --video path/to/vid.mp4 to record a video.
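
The snippet below is a hypothetical convenience wrapper (not part of the repository) showing how these documented flags fit together; the image path and output file name are placeholders:

# run_demo.py -- hypothetical wrapper around the installed braindance.py entry point
import subprocess

subprocess.run([
    "braindance.py",
    "--model", "re_impl_depth",    # one of: re_impl_nodepth, re_impl_depth, ac_impl_nodepth, ac_impl_depth
    "--video", "demo.mp4",         # optional recording target
    "path/to/your/image.jpg",      # image, or directory to select an image from
], check=True)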

> braindance.py -h
usage: braindance.py [-h] [--model {re_impl_nodepth,re_impl_depth,ac_impl_nodepth,ac_impl_depth}] [--video [VIDEO]] [path]

What's up, BD-maniacs?

key(s)       action                  
=====================================
wasd         move around             
arrows       look around             
m            enable looking with mouse
space        render with transformer 
q            quit                    

positional arguments:
  path                  path to image or directory from which to select image. Default example is used if not specified.

optional arguments:
  -h, --help            show this help message and exit
  --model {re_impl_nodepth,re_impl_depth,ac_impl_nodepth,ac_impl_depth}
                        pretrained model to use.
  --video [VIDEO]       path to write video recording to. (no recording if unspecified).

Training

Data Preparation

We support training on RealEstate10K and ACID. Both come in the same format as described here, and the preparation is the same for both. You will need to have colmap installed and available on your $PATH.

We assume that you have extracted the .txt files of the dataset you want to prepare into $TXT_ROOT, e.g. for RealEstate:

> tree $TXT_ROOT
├── test
│   ├── 000c3ab189999a83.txt
│   ├── ...
│   └── fff9864727c42c80.txt
└── train
    ├── 0000cc6d8b108390.txt
    ├── ...
    └── ffffe622a4de5489.txt

and that you have downloaded the frames (we downloaded them in resolution 640 x 360) into $IMG_ROOT, e.g. for RealEstate:

> tree $IMG_ROOT
├── test
│   ├── 000c3ab189999a83
│   │   ├── 45979267.png
│   │   ├── ...
│   │   └── 55255200.png
│   ├── ...
│   ├── 0017ce4c6a39d122
│   │   ├── 40874000.png
│   │   ├── ...
│   │   └── 48482000.png
├── train
│   ├── ...
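
Before running the preparation, it can be worth checking the layout and the colmap binary up front. The following is a sketch under the assumptions above (TXT_ROOT and IMG_ROOT exported as environment variables), not part of the repository:

# check_layout.py -- hypothetical pre-flight check for the RealEstate-style layout
import os
import shutil
from pathlib import Path

assert shutil.which("colmap"), "colmap was not found on $PATH"

txt_root = Path(os.environ["TXT_ROOT"])
img_root = Path(os.environ["IMG_ROOT"])

for split in ("train", "test"):  # add "validation" for ACID
    missing = [txt.stem for txt in sorted((txt_root / split).glob("*.txt"))
               if not (img_root / split / txt.stem).is_dir()]
    print(f"{split}: {len(missing)} sequences without downloaded frames")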

To prepare the $SPLIT split of the dataset ($SPLIT being one of train, test for RealEstate and train, test, validation for ACID) in $SPA_ROOT, run the following within the scripts directory:

python sparse_from_realestate_format.py --txt_src ${TXT_ROOT}/${SPLIT} --img_src ${IMG_ROOT}/${SPLIT} --spa_dst ${SPA_ROOT}/${SPLIT}

You can also simply set TXT_ROOT, IMG_ROOT and SPA_ROOT as environment variables and run ./sparsify_realestate.sh or ./sparsify_acid.sh. Take a look into the sources to run with multiple workers in parallel.
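
Alternatively, the same preparation can be driven from Python; this sketch (not part of the repository) simply loops the documented command over the splits and assumes it is run from the repository root:

# prepare_splits.py -- hypothetical loop over the documented preparation command
import os
import subprocess

for split in ("train", "test"):  # plus "validation" for ACID
    subprocess.run([
        "python", "sparse_from_realestate_format.py",
        "--txt_src", f"{os.environ['TXT_ROOT']}/{split}",
        "--img_src", f"{os.environ['IMG_ROOT']}/{split}",
        "--spa_dst", f"{os.environ['SPA_ROOT']}/{split}",
    ], check=True, cwd="scripts")  # cwd mirrors "within the scripts directory"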

Finally, symlink $SPA_ROOT to data/realestate_sparse for RealEstate, or to data/acid_sparse for ACID.
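
For reference, the symlink step could look like this in Python (RealEstate shown; a sketch assuming you run it from the repository root):

# link_data.py -- create the expected dataset symlink
import os
from pathlib import Path

Path("data").mkdir(exist_ok=True)
os.symlink(os.environ["SPA_ROOT"], "data/realestate_sparse")  # use data/acid_sparse for ACID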

First Stage Models

As described in our paper, we train the transformer models in a compressed, discrete latent space of pretrained VQGANs. These pretrained models can be conveniently downloaded by running

python scripts/download_vqmodels.py 

which will also create symlinks ensuring that the paths specified in the training configs (see configs/*) exist. In case some of the models have already been downloaded, the script will only create the symlinks.
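
To double-check that everything the training configs reference is actually in place (first-stage checkpoints, data symlinks), a generic scan like the following can help. It is a sketch, not part of the repository, and assumes the configs are plain YAML readable with PyYAML:

# check_configs.py -- hypothetical scan of the training configs for missing paths
from pathlib import Path
import yaml  # requires PyYAML

def strings(node):
    # Yield every string value found anywhere in a nested dict/list structure.
    if isinstance(node, dict):
        for v in node.values():
            yield from strings(v)
    elif isinstance(node, list):
        for v in node:
            yield from strings(v)
    elif isinstance(node, str):
        yield node

for cfg in sorted(Path("configs").glob("**/*.yaml")):
    refs = [s for s in strings(yaml.safe_load(cfg.read_text()))
            if "/" in s and (s.endswith((".ckpt", ".yaml")) or s.startswith("data/"))]
    missing = [s for s in refs if not Path(s).exists()]
    if missing:
        print(f"{cfg}: missing {missing}")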

For training custom first stage models, we refer to the taming transformers repository.

Running the Training

Once the data preparation is done and the first stage models are in place, the experiments on ACID and RealEstate10K described in our paper can be reproduced by running

python geofree/main.py --base configs/<dataset>/<dataset>_13x23_<experiment>.yaml -t --gpus 0,

where <dataset> is one of realestate/acid and <experiment> is one of expl_img/expl_feat/expl_emb/impl_catdepth/impl_depth/impl_nodepth/hybrid. These abbreviations correspond to the experiments listed in the following table (see also Fig. 2 in the main paper).

[Table: experiment variants]

Note that each experiment was conducted on a GPU with 40 GB VRAM.
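
If you prefer to assemble the documented training command programmatically, a minimal sketch (dataset and experiment names taken from the list above) looks like this:

# train.py -- hypothetical wrapper assembling the documented training command
import subprocess

dataset = "realestate"       # or "acid"
experiment = "impl_nodepth"  # expl_img, expl_feat, expl_emb, impl_catdepth, impl_depth, impl_nodepth, hybrid

subprocess.run([
    "python", "geofree/main.py",
    "--base", f"configs/{dataset}/{dataset}_13x23_{experiment}.yaml",
    "-t",
    "--gpus", "0,",
], check=True)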

BibTeX

@misc{rombach2021geometryfree,
      title={Geometry-Free View Synthesis: Transformers and no 3D Priors}, 
      author={Robin Rombach and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2104.07652},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}