uclanlp / synpg

License: MIT

SynPG

Code for our EACL-2021 paper "Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs".

If you find this code useful in your research, please consider citing our paper.

@inproceedings{Huang2021synpg,
    author    = {Kuan-Hao Huang and Kai-Wei Chang},
    title     = {Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs},
    booktitle = {Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL)},
    year      = {2021},
}

Setup

  • Python 3.7.10
  • Install the required packages:
$ pip install -r requirements.txt

Pretrained Models

Demo

python generate.py \
    --synpg_model_path ./model/pretrained_synpg.pt \
    --pg_model_path ./model/pretrained_parse_generator.pt \
    --input_path ./demo/input.txt \
    --output_path ./demo/output.txt \
    --bpe_codes_path ./data/bpe.codes \
    --bpe_vocab_path ./data/vocab.txt \
    --bpe_vocab_thresh 50 \
    --dictionary_path ./data/dictionary.pkl \
    --max_sent_len 40 \
    --max_tmpl_len 100 \
    --max_synt_len 160 \
    --temp 0.5 \
    --seed 0
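
SynPG conditions generation on a target syntactic template, i.e., a constituency parse pruned to a fixed depth. As an illustration of that idea (a sketch with a hypothetical `prune_parse` helper, not the repo's exact preprocessing), pruning drops deeper constituents and all terminal words:

```python
def prune_parse(parse, depth):
    """Prune a bracketed constituency parse to a fixed depth,
    dropping deeper constituents and all terminal words."""
    out, level, i = [], 0, 0
    while i < len(parse):
        c = parse[i]
        if c == "(":
            level += 1
            j = i + 1                      # read the constituent label
            while j < len(parse) and parse[j] not in "() ":
                j += 1
            if level <= depth:
                out.append("(" + parse[i + 1:j])
            i = j
        elif c == ")":
            if level <= depth:
                out.append(")")
            level -= 1
            i += 1
        else:
            i += 1                         # skip words and whitespace
    return "".join(out)

# Pruning to depth 2 keeps only the top-level phrase structure:
print(prune_parse("(S (NP (PRP he)) (VP (VBZ runs)))", 2))  # (S(NP)(VP))
```

The parse generator's job is the inverse direction: expanding such a shallow template into a full target parse for the decoder.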

Training

  • Download data and put them under ./data/
  • Download glove.840B.300d.txt and put it under ./data/
  • Run scripts/train_synpg.sh or the following command to train SynPG
python train_synpg.py \
    --model_dir ./model \
    --output_dir ./output \
    --bpe_codes_path ./data/bpe.codes \
    --bpe_vocab_path ./data/vocab.txt \
    --bpe_vocab_thresh 50 \
    --dictionary_path ./data/dictionary.pkl \
    --train_data_path ./data/train_data.h5 \
    --valid_data_path ./data/valid_data.h5 \
    --emb_path ./data/glove.840B.300d.txt \
    --max_sent_len 40 \
    --max_synt_len 160 \
    --word_dropout 0.4 \
    --n_epoch 5 \
    --batch_size 64 \
    --lr 1e-4 \
    --weight_decay 1e-5 \
    --log_interval 250 \
    --gen_interval 5000 \
    --save_interval 10000 \
    --temp 0.5 \
    --seed 0
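
The `--word_dropout 0.4` flag randomly masks source words during training, which pushes the model to reconstruct the sentence from its semantics and the target syntax rather than copying the input. A minimal sketch of standard word dropout (a hypothetical helper, not the repo's exact implementation):

```python
import random

def word_dropout(tokens, p, unk="<unk>", rng=None):
    """Replace each token with <unk> independently with probability p."""
    rng = rng or random.Random()
    return [unk if rng.random() < p else tok for tok in tokens]

tokens = "the cat sat on the mat".split()
print(word_dropout(tokens, 0.4, rng=random.Random(0)))
# ['the', 'cat', 'sat', '<unk>', 'the', 'mat'] with this seed
```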
  • Run scripts/train_parse_generator.sh or the following command to train the parse generator
python train_parse_generator.py \
    --model_dir ./model \
    --output_dir ./output_pg \
    --dictionary_path ./data/dictionary.pkl \
    --train_data_path ./data/train_data.h5 \
    --valid_data_path ./data/valid_data.h5 \
    --max_sent_len 40 \
    --max_tmpl_len 100 \
    --max_synt_len 160 \
    --word_dropout 0.2 \
    --n_epoch 5 \
    --batch_size 32 \
    --lr 1e-4 \
    --weight_decay 1e-5 \
    --log_interval 250 \
    --gen_interval 5000 \
    --save_interval 10000 \
    --temp 0.5 \
    --seed 0
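
Both trainers accept `--temp 0.5`, a softmax temperature applied to the output logits; values below 1 sharpen the distribution toward the arg-max token. A generic sketch of temperature scaling (the standard formulation, not the repo's exact code):

```python
import math

def softmax_with_temperature(logits, temp=1.0):
    """Softmax over logits scaled by 1/temp: temp < 1 sharpens the
    distribution, temp > 1 flattens it toward uniform."""
    scaled = [x / temp for x in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

# A lower temperature puts more probability mass on the top token:
sharp = softmax_with_temperature([2.0, 1.0, 0.1], temp=0.5)
flat = softmax_with_temperature([2.0, 1.0, 0.1], temp=1.0)
```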

Evaluating

  • Download testing data and put them under ./data/
  • Run scripts/eval.sh or the following command to evaluate SynPG
python eval_generate.py \
  --test_data ./data/test_data_mrpc.h5 \
  --dictionary_path ./data/dictionary.pkl \
  --model_path ./model/pretrained_synpg.pt \
  --output_dir ./eval/ \
  --bpe_codes ./data/bpe.codes \
  --bpe_vocab ./data/vocab.txt \
  --bpe_vocab_thresh 50 \
  --max_sent_len 40 \
  --max_synt_len 160 \
  --word_dropout 0.0 \
  --batch_size 64 \
  --temp 0.5 \
  --seed 0

python eval_calculate_bleu.py --ref ./eval/target_sents.txt --input ./eval/outputs.txt
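
The second command scores the generated paraphrases against the references with BLEU. As an illustration of what BLEU measures (a simplified sentence-level variant, a geometric mean of clipped n-gram precisions with a brevity penalty; the evaluation script itself uses a standard corpus-level implementation):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU for illustration only."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        c_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        r_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # clipped n-gram matches: each candidate n-gram counts at most
        # as often as it appears in the reference
        overlap = sum(min(c, r_ngrams[g]) for g, c in c_ngrams.items())
        total = max(sum(c_ngrams.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # smooth zero counts
    # brevity penalty discourages overly short candidates
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```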

The BLEU scores should be similar to the following.

                 MRPC   PAN    Quora
SynPG            26.2   27.3   33.2
SynPG-Large      36.2   27.1   34.7

Fine-Tuning

A main advantage of SynPG is that it learns to generate paraphrases without any annotated paraphrase pairs. It can therefore be fine-tuned on plain texts from the target domain (no ground-truth paraphrases needed) whenever such texts are available. As shown in our paper, this fine-tuning step significantly improves the quality of paraphrase generation in the target domain.

  • Download testing data and put them under ./data/
  • Run scripts/finetune_synpg.sh or the following command to finetune SynPG
python finetune_synpg.py \
  --model_dir ./model_finetune \
  --model_path ./model/pretrained_synpg.pt \
  --output_dir ./output_finetune \
  --bpe_codes_path ./data/bpe.codes \
  --bpe_vocab_path ./data/vocab.txt \
  --bpe_vocab_thresh 50 \
  --dictionary_path ./data/dictionary.pkl \
  --train_data_path ./data/test_data_mrpc.h5 \
  --valid_data_path ./data/test_data_mrpc.h5 \
  --max_sent_len 40 \
  --max_synt_len 160 \
  --word_dropout 0.4 \
  --n_epoch 50 \
  --batch_size 64 \
  --lr 1e-4 \
  --weight_decay 1e-5 \
  --log_interval 250 \
  --gen_interval 5000 \
  --save_interval 10000 \
  --temp 0.5 \
  --seed 0

The table below shows the significant improvement in BLEU scores after fine-tuning.

                 MRPC   PAN    Quora
SynPG            26.2   27.3   33.2
SynPG-Large      36.2   27.1   34.7
SynPG-Fine-Tune  48.7   37.7   49.8

Author

Kuan-Hao Huang / @ej0cl6
