Contextual MT

Implementations of context-aware models for document-level translation tasks, used in

  1. Measuring and Increasing Context Usage in Context-Aware Machine Translation
  2. Do Context-Aware Translation Models Pay the Right Attention?

Currently supports:

  • Training concatenation-based document-level machine translation models
  • Training with CoWord dropout and dynamic context size (see the sketch after this list)
  • Measuring CXMI for models
  • Training with attention regularization
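
As a rough illustration of CoWord dropout (a minimal sketch with assumed names, not the repo's implementation): during training, each context token is independently replaced by a mask token with some probability, discouraging the model from relying on any single context word.

import random

def coword_dropout(context_tokens, p=0.1, mask_token="<mask>"):
    # With probability p, replace each context token with a mask token.
    return [mask_token if random.random() < p else tok for tok in context_tokens]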

Requirements

This code builds on fairseq. After installing the dependencies, also run

pip install -e .

to have access to the entrypoints (such as the evaluation script) in your path.

Pre-processing

Preprocessing consists of training a SentencePiece model and binarizing the data. For example, to preprocess a set of files {train,valid,test}.{src,tgt}, run:

VOCAB_SIZE=32000

# train a SentencePiece model and vocabulary for each side
for lang in src tgt; do
    python $REPO/scripts/spm_train.py \
        ${data_dir}/train.${lang} \
        --model-prefix ${data_dir}/prep/spm.${lang} \
        --vocab-file ${data_dir}/prep/dict.${lang}.txt \
        --vocab-size $VOCAB_SIZE
done

# encode every split with the trained models
for split in train valid test; do
    for lang in src tgt; do
        python $REPO/scripts/spm_encode.py \
            --model ${data_dir}/prep/spm.${lang}.model \
                < ${data_dir}/${split}.${lang} \
                > ${data_dir}/prep/${split}.sp.${lang}
    done
done

# binarize the data for fairseq
fairseq-preprocess \
    --source-lang src --target-lang tgt \
    --trainpref ${data_dir}/prep/train.sp \
    --validpref ${data_dir}/prep/valid.sp \
    --testpref ${data_dir}/prep/test.sp \
    --srcdict ${data_dir}/prep/dict.src.txt \
    --tgtdict ${data_dir}/prep/dict.tgt.txt \
    --destdir ${data_dir}/bin

In addition, this repo already includes scripts to preprocess some datasets; please refer to scripts/data.

Training

Document-level translation

You can train using fairseq's training tool. Just select the document_translation task with the appropriate context sizes.

For example, to train a model with a source context size of N and a target context size of M, with dynamic context-size sampling and CoWord dropout:

fairseq-train \
    ${bin_dir} --user-dir $REPO/contextual_mt \
    --task document_translation \
    --source-context-size $N --target-context-size $M \
    --sample-context-size \
    --coword-dropout 0.1 \
    --log-interval 10 \
    --arch contextual_transformer --share-decoder-input-output-embed  \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.1 \
    --lr 5e-4 --lr-scheduler inverse_sqrt  --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --dropout 0.3 --weight-decay 0.0001 \
    --max-tokens  4096 --update-freq 8 --patience 10 --seed 42 \
    --eval-bleu \
    --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
    --eval-bleu-remove-bpe sentencepiece \
    --eval-bleu-print-samples \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric \
    --save-dir ${checkpoint_dir} --no-epoch-checkpoints
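
For intuition, a concatenation-based model reads the current sentence together with the previous ones, joined by a break token. The sketch below is purely illustrative (the helper name and the <brk> token are assumptions, not the repo's API):

# Hypothetical sketch: prepend previous sentences to the current one,
# separated by a break token.
def concat_with_context(context_sents, current_sent, brk="<brk>"):
    tokens = []
    for sent in context_sents:
        tokens.extend(sent)
        tokens.append(brk)
    return tokens + current_sent

# concat_with_context([["How", "are", "you", "?"]], ["Fine", ",", "thanks", "."])
# -> ["How", "are", "you", "?", "<brk>", "Fine", ",", "thanks", "."]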

To apply attention regularization during training, select the attention_regularization task:

fairseq-train \
    ${bin_dir} --user-dir $REPO/contextual_mt \
    --task attention_regularization \
    --source-context-size $N --target-context-size $M \
    --source-lang src --target-lang tgt \
    --log-interval 10 \
    --arch attn_reg_transformer --share-decoder-input-output-embed  \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.1 \
    --lr 5e-4 --lr-scheduler inverse_sqrt  --warmup-updates 4000 \
    --criterion attention_loss --label-smoothing 0.1 --dropout 0.3 --weight-decay 0.0001 \
    --regularize-heads 0 --regularize-attention enc cross self \
    --enc-alignment-layer --cross-alignment-layer 0 --self-alignment-layer 5 \
    --highlight-sample 0.2 --kl-lambda 10 \
    --max-tokens  4096 --max-tokens-valid 1024 --update-freq 8 --seed 42 \
    --eval-bleu \
    --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
    --eval-bleu-remove-bpe sentencepiece \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric \
    --save-dir ${checkpoint_dir} --no-epoch-checkpoints
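
For intuition, the attention regularizer from "Do Context-Aware Translation Models Pay the Right Attention?" pushes the selected heads toward human-highlighted supporting tokens with a KL term. A minimal sketch of such a loss, with assumed names and shapes (not the repo's implementation):

import torch.nn.functional as F

def attention_kl_loss(attn_probs, highlight_mask, kl_lambda=10.0):
    # Target distribution: uniform over the highlighted (supporting) positions.
    target = highlight_mask / highlight_mask.sum(dim=-1, keepdim=True)
    # KL(target || attention), weighted by the regularization strength (cf. --kl-lambda).
    return kl_lambda * F.kl_div(attn_probs.log(), target, reduction="batchmean")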

Inference and Evaluation

You can then run inference. The translation script expects the dictionaries and SentencePiece models next to the checkpoint, so copy them there first:

cp ${data_dir}/dict.*txt ${data_dir}/spm* ${checkpoint_dir}

python contextual_mt/docmt_translate.py \
    --path ${checkpoint_dir} \
    --source-lang src --target-lang tgt \
    --source-file ${data_dir}/test.src \
    --predictions-file ${predictions_dir}/test.pred.tgt \
    --docids-file ${data_dir}/test.docids \
    --beam 5

To score the predictions, a helper script is provided:

python scripts/score.py ${predictions_dir}/test.pred.tgt ${data_dir}/test.tgt \
    --src ${data_dir}/test.src \
    --comet-model wmt-large-da-estimator-1719 \
    --comet-path $COMET_DIR

Measuring CXMI

To measure the CXMI for a model with a specific context size, run:

python contextual_mt/docmt_cxmi.py \
    --path ${checkpoint_dir} \
    --source-lang src --target-lang tgt \
    --source-file ${data_dir}/test.src \
    --reference-file ${data_dir}/test.tgt \
    --docids-file ${data_dir}/test.docids \
    --source-context-size $N \
    --target-context-size $M
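
Intuitively, CXMI measures how much extra information the context gives the model about the references: it is the difference between the model's cross-entropy without context and with context, so positive values mean the context helps. A rough sketch of the estimate, assuming you already have token-level log-probabilities from a context-aware and a context-agnostic pass (names assumed):

import numpy as np

# Illustrative only: CXMI estimated as the mean difference in log-likelihood
# of the reference tokens with and without context.
def cxmi_estimate(logp_with_ctx, logp_without_ctx):
    return float(np.mean(np.asarray(logp_with_ctx) - np.asarray(logp_without_ctx)))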

For more detailed analyses, please refer to the notebook notebooks/measuring_context_usage.ipynb.

Contrastive evaluation

To run contrastive evaluation on ContraPro or Bawden et al.'s dataset:

python contextual_mt/docmt_contrastive_eval.py \
    --source-lang src --target-lang tgt \
    --source-file $source_contr \
    --src-context-file $source_ctx_contr \
    --target-file $target_contr \
    --tgt-context-file $target_ctx_contr
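
For reference, contrastive evaluation asks whether the model assigns a higher score to the correct translation than to a contrastive (minimally corrupted) variant, and reports the fraction of wins. A minimal sketch of that accuracy (assumed names, not the repo's code):

def contrastive_accuracy(correct_scores, contrastive_scores):
    # Fraction of examples where the correct translation outscores its
    # contrastive counterpart.
    wins = sum(c > k for c, k in zip(correct_scores, contrastive_scores))
    return wins / len(correct_scores)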

Workflows

To facilitate reproducing the paper results, we include files to run the whole training and evaluation pipelines.

Start by installing ducttape.

Measuring and Increasing Context Usage

Start by modifying the relevant paths in tapes/cxmi_paper.tconf. Also modify the submitter to the one best suited for your system; the original work used the slurm submitter.

ducttape tapes/cxmi_paper.tape -C tapes/cxmi_paper.tconf -j $num_parallel_jobs

To obtain a summary of the experimental results, run

ducttape tapes/cxmi_paper.tape -C tapes/cxmi_paper.tconf summary