RobertCsordas / transformer_generalization

License: MIT


Codebase for training transformers on systematic generalization datasets.

The official repository for our EMNLP 2021 paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers".

Please note that this repository is a cleaned-up version of the internal research repository we use. In case you encounter any problems with it, please don't hesitate to contact me.

Setup

This project requires Python 3 (tested with Python 3.8 and 3.9) and PyTorch 1.8.

pip3 install -r requirements.txt

Create a Weights and Biases account and run

wandb login

More information on setting up Weights and Biases can be found at https://docs.wandb.com/quickstart.

For plotting, LaTeX is required (to avoid Type 3 fonts and to render symbols). Installation is OS-specific.

Downloading data

All datasets are downloaded automatically, except the Mathematics Dataset and CFQ, which are hosted on Google Cloud and require logging in with a Google account to access them.

Math dataset

Download the .tar.gz file manually from here:

https://console.cloud.google.com/storage/browser/mathematics-dataset?pli=1

Copy it to the cache/dm_math/ folder. If everything went correctly, you should have a cache/dm_math/mathematics_dataset-v1.0.tar.gz file in the project folder.
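For example, assuming the archive landed in your downloads folder (the source path below is illustrative):

mkdir -p cache/dm_math
cp ~/Downloads/mathematics_dataset-v1.0.tar.gz cache/dm_math/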

CFQ

Download the .tar.gz file manually from here:

https://storage.cloud.google.com/cfq_dataset/cfq1.1.tar.gz

Copy it to the cache/CFQ/ folder. If everything went correctly, you should have a cache/CFQ/cfq1.1.tar.gz file in the project folder.
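The same pattern applies here (again, the source path is illustrative):

mkdir -p cache/CFQ
cp ~/Downloads/cfq1.1.tar.gz cache/CFQ/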

Usage

Running the experiments from the paper on a cluster

The code makes use of Weights and Biases for experiment tracking. In the sweeps directory, we provide sweep configurations for all experiments we have performed. The sweeps are officially meant for hyperparameter optimization, but we use them to run multiple configurations and seeds.

To reproduce our results, start a sweep for each of the YAML files in the sweeps directory and run wandb agent for each of them from the root directory of the project. This runs all the experiments, and they will be displayed on the W&B dashboard. The name of each sweep must match the name of the corresponding file in the sweeps directory, without the .yaml extension. More details on how to run W&B sweeps can be found at https://docs.wandb.com/sweeps/quickstart.

For example, to run the Math Dataset experiments, run wandb sweep --name dm_math sweeps/dm_math.yaml. This creates the sweep and prints its ID. Then run wandb agent <ID> with that ID.
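Putting it together, the workflow for a single sweep looks like this (<ID> is a placeholder for whatever wandb sweep prints):

# Create the sweep; its name must match the YAML file name without the extension.
wandb sweep --name dm_math sweeps/dm_math.yaml
# Run the experiments defined by the sweep; start several agents to parallelize.
wandb agent <ID>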

Re-creating plots from the paper

Edit the config file paper/config.json and enter your project name in the "wandb_project" field (e.g. "username/project").
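For reference, the relevant part of the file might look like this (a minimal sketch; the shipped config may contain additional fields):

{
    "wandb_project": "username/project"
}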

Run the scripts in the paper directory. For example:

cd paper
./run_all.sh

The output will be generated in the paper/out/ directory. Tables will be printed to stdout in LaTeX format.

Individual plots can be reproduced by running the corresponding Python files in the paper directory.

Running experiments locally

It is possible to run single experiments with TensorBoard, without using Weights and Biases. This is intended for debugging the code locally.

If you want to run experiments locally, you can use run.py:

./run.py sweeps/tuple_rnn.yaml

If the sweep in question has multiple parameter choices, run.py will prompt you interactively to pick a value for each of them.

The experiment also starts a TensorBoard instance automatically on port 7000. If the port is already occupied, it searches incrementally for the next free port.

Note that the plotting scripts work only with Weights and Biases.

Reducing memory usage

If some tasks do not fit on your GPU, play around with the -max_length_per_batch argument. It trades speed for memory by slicing batches and executing them in multiple passes. Reduce it until the model fits.
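For example, a hypothetical local invocation (assuming run.py forwards extra flags to the training code; the value 64 is illustrative):

# Reduce the value until the model fits on your GPU.
./run.py sweeps/dm_math.yaml -max_length_per_batch 64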

BibTeX

@inproceedings{csordas2021devil,
      title={The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers}, 
      author={R\'obert Csord\'as and Kazuki Irie and J\"urgen Schmidhuber},
      booktitle={Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP)},
      year={2021},
      month={November},
      address={Punta Cana, Dominican Republic}
}