MiuLab / Taylorgan
TaylorGAN
Source code of our NeurIPS 2020 poster paper TaylorGAN: Neighbor-Augmented Policy Update Towards Sample-Efficient Natural Language Generation
Paper | arXiv (including appendix)
Setup
Environment
cp .env.sample .env
Then modify CHECKPOINT_DIR, DISK_CACHE_DIR, TENSORBOARD_PORT, and TENSORBOARD_LOGDIR in .env as needed.
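To confirm that none of these variables was left empty, here is a minimal sketch that assumes .env holds plain KEY=VALUE lines. We haven't checked which loader the repo itself uses, and parse_dotenv, missing_keys, and check_dotenv are hypothetical helper names.

```python
from pathlib import Path
from typing import Dict, List

# Variables this README asks you to review after copying .env.sample.
REQUIRED_KEYS = ("CHECKPOINT_DIR", "DISK_CACHE_DIR", "TENSORBOARD_PORT", "TENSORBOARD_LOGDIR")


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments.

    A sketch only: no support for `export`, multi-line values, or
    variable interpolation, which real .env loaders may handle.
    """
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes from the value.
        env[key.strip()] = value.strip().strip("'\"")
    return env


def missing_keys(env: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty, in REQUIRED_KEYS order."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]


def check_dotenv(path: str = ".env") -> List[str]:
    """Read a .env file from disk and report which required keys are missing."""
    return missing_keys(parse_dotenv(Path(path).read_text()))
```

Running `check_dotenv()` from the repo root should return an empty list once all four variables are set.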
Datasets
Download the text datasets from the following links:
Then set the paths in datasets/corpus.yaml to point to these text files.
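A typo in a path is easy to miss until training starts. The sketch below does not parse corpus.yaml, because its schema isn't shown here. It assumes you pass in the dataset names (e.g. coco_cleaned, news_cleaned) and the paths you configured as a plain dict. missing_corpus_files is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict


def missing_corpus_files(paths: Dict[str, str]) -> Dict[str, str]:
    """Return the {dataset_name: path} entries whose path is not an existing file."""
    # expanduser() lets entries such as '~/data/news.txt' be checked as well.
    return {name: p for name, p in paths.items() if not Path(p).expanduser().is_file()}
```

An empty result means every configured dataset file was found.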
Pretrained Embeddings
Download the pretrained fastText embeddings and set PRETRAINED_EN_WORD_FASTTEXT_PATH in .env to the path of the downloaded file.
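If the downloaded file is in fastText's text .vec format, you can sanity-check it before training. In that format, the first line holds the word count and the vector dimension, and each following line holds a word and its values. A binary .bin model would instead need the fasttext package. A sketch with hypothetical helper names:

```python
from itertools import islice
from typing import Dict, List, Tuple


def read_vec_header(path: str) -> Tuple[int, int]:
    """Return (vocab_size, dim) from the first line of a text-format .vec file."""
    with open(path, encoding="utf-8") as f:
        n_words, dim = f.readline().split()
    return int(n_words), int(dim)


def load_vectors(path: str, limit: int = 1000) -> Dict[str, List[float]]:
    """Load up to `limit` word vectors, checking that each row has `dim` values."""
    _, dim = read_vec_header(path)
    vectors = {}
    with open(path, encoding="utf-8") as f:
        next(f)  # skip the header line
        for line in islice(f, limit):
            # Rows may end with a trailing space before the newline.
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if len(values) != dim:
                raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
            vectors[word] = [float(v) for v in values]
    return vectors
```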
Install
Install Poetry first: docs
Install the packages:
$ poetry install
After installation, activate the virtual environment:
$ poetry shell
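To confirm that the interpreter running the scripts is Poetry's virtual environment and not the system Python, compare sys.prefix with sys.base_prefix. They differ inside venv/virtualenv environments. in_virtualenv is a hypothetical helper.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv/virtualenv, e.g. one activated by `poetry shell`."""
    # In a virtual environment, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)
```

Alternatively, prefixing commands with `poetry run` executes them inside the environment without activating a shell.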
TensorFlow-GPU (TODO)
For Developers (TODO)
Scripts
Train GAN
$ python src/scripts/train/GAN.py
- Usage
usage: GAN.py [-h] --dataset {coco_cleaned, news_cleaned, test} [--maxlen positive-int] [--vocab_size positive-int]
[-g {gru, test}(*args, **kwargs)] [--tie-embeddings] [--g-fix-embeddings]
[-d {cnn, resnet, test}(*args, **kwargs)] [--d-fix-embeddings]
[--loss {alt, JS, KL, RKL}]
[--estimator {reinforce, st, taylor, gumbel}(*args, **kwargs)] [--d-steps positive-int]
[--g-regularizers REGULARIZER(*args, **kwargs) [REGULARIZER(*args, **kwargs) ...]]
[--d-regularizers REGULARIZER(*args, **kwargs) [REGULARIZER(*args, **kwargs) ...]]
[--g-optimizer {sgd, rmsprop, adam, radam}(*args, **kwargs)]
[--d-optimizer {sgd, rmsprop, adam, radam}(*args, **kwargs)] [--epochs positive-int]
[--batch-size positive-int] [--random-seed int] [--bleu [int∈[1, 5]]] [--fed [positive-int]] [--checkpoint-root Path]
[--serving-root Path] [--save-period positive-int] [--tensorboard [Path]] [--tags TAG [TAG ...]] [--jit] [--debug] [--profile [Path]]
For more details and the custom options of models, optimizers, and regularizers, run:
python src/scripts/train/GAN.py -h
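Several options accept a spec of the form NAME or NAME(*args, **kwargs), for example 'cnn(activation="elu")' or 'embedding(0.2, max_norm=1)'. The repo's own parser isn't shown here. The following is a hypothetical sketch of one safe way to split such a spec into a name and literal arguments with Python's ast module.

```python
import ast
from typing import Any, Dict, Tuple


def parse_spec(spec: str) -> Tuple[str, Tuple[Any, ...], Dict[str, Any]]:
    """Split 'name' or 'name(arg, ..., key=value)' into (name, args, kwargs).

    Only literal arguments (numbers, strings, booleans, None, tuples, ...)
    are accepted: ast.literal_eval rejects anything else, so no code runs.
    """
    node = ast.parse(spec.strip(), mode="eval").body
    if isinstance(node, ast.Name):  # bare name such as 'gru'
        return node.id, (), {}
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        args = tuple(ast.literal_eval(a) for a in node.args)
        kwargs = {}
        for kw in node.keywords:
            if kw.arg is None:  # reject '**mapping' unpacking
                raise ValueError(f"unsupported ** argument in {spec!r}")
            kwargs[kw.arg] = ast.literal_eval(kw.value)
        return node.func.id, args, kwargs
    raise ValueError(f"expected NAME or NAME(...), got {spec!r}")
```

For instance, `parse_spec("taylor(bandwidth=0.5)")` yields `("taylor", (), {"bandwidth": 0.5})`.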
- NeurIPS 2020 Parameters
python src/scripts/train/GAN.py \
--dataset news_cleaned \
-g gru --tie-embeddings --g-reg 'entropy(0.02)' \
-d 'cnn(activation="elu")' --d-reg 'spectral(0.07)' 'embedding(0.2, max_norm=1)' \
--estimator 'taylor(bandwidth=0.5)' --loss RKL \
--random-seed 2020 \
--bleu 5 --fed 10000
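If you launch runs from Python (for example, to sweep random seeds), passing the arguments as a list avoids the shell quoting that the parenthesized specs otherwise need. A sketch, assuming Python 3.8+ for shlex.join. NEURIPS_2020_ARGS and build_command are hypothetical names, not part of the repo.

```python
import shlex
from typing import List, Sequence

# Arguments copied from the NeurIPS 2020 command above. Each spec is a single
# list element, so no shell quoting is needed when passed to subprocess.
NEURIPS_2020_ARGS = [
    "--dataset", "news_cleaned",
    "-g", "gru", "--tie-embeddings", "--g-reg", "entropy(0.02)",
    "-d", 'cnn(activation="elu")', "--d-reg", "spectral(0.07)", "embedding(0.2, max_norm=1)",
    "--estimator", "taylor(bandwidth=0.5)", "--loss", "RKL",
    "--random-seed", "2020",
    "--bleu", "5", "--fed", "10000",
]


def build_command(extra: Sequence[str] = ()) -> List[str]:
    """Return the argv list for the training script, with optional extra flags appended."""
    return ["python", "src/scripts/train/GAN.py", *NEURIPS_2020_ARGS, *extra]
```

`subprocess.run(build_command(), check=True)` would then start the run, and `print(shlex.join(build_command()))` prints a shell-safe, single-line equivalent of the command above.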
Run with TensorBoard
First, start TensorBoard:
sh src/scripts/run_tensorboard.sh
This starts a TensorBoard server that reads logs from $TENSORBOARD_LOGDIR and listens on port $TENSORBOARD_PORT. To change these settings, edit the corresponding variables in .env.
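We haven't reproduced run_tensorboard.sh here. Assuming it wraps the standard tensorboard CLI, the sketch below builds a roughly equivalent invocation from the same two variables. tensorboard_command is a hypothetical helper. --logdir and --port are real TensorBoard flags, and 6006 is TensorBoard's default port.

```python
import os
from typing import List, Mapping, Optional


def tensorboard_command(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build a `tensorboard` invocation from TENSORBOARD_LOGDIR and TENSORBOARD_PORT."""
    env = os.environ if env is None else env
    logdir = env["TENSORBOARD_LOGDIR"]  # KeyError if unset, i.e. missing from .env
    port = env.get("TENSORBOARD_PORT", "6006")  # fall back to TensorBoard's default
    return ["tensorboard", "--logdir", logdir, "--port", str(port)]
```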
Then, run the train script with tensorboard logging enabled:
python src/scripts/train/GAN.py ... --tensorboard
Then view the results by opening http://localhost:<TENSORBOARD_PORT> in a web browser.
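If the page doesn't load, first check whether anything is listening on the port. A small standard-library sketch (tensorboard_url and is_listening are hypothetical names):

```python
import socket


def tensorboard_url(port: int, host: str = "localhost") -> str:
    """Return the address to open in the browser."""
    return f"http://{host}:{port}/"


def is_listening(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host unresolvable
        return False
```

For example, `is_listening(int(os.environ["TENSORBOARD_PORT"]))` returns False if the TensorBoard server isn't running yet.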
Evaluate
TODO