
tangbinh / Machine Translation


Projects that are alternatives of or similar to Machine Translation

Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Stars: ✭ 990 (+1841.18%)
Mutual labels:  seq2seq, machine-translation, sequence-to-sequence, attention-is-all-you-need, transformer
Nmt Keras
Neural Machine Translation with Keras
Stars: ✭ 501 (+882.35%)
Mutual labels:  machine-translation, sequence-to-sequence, attention-is-all-you-need, transformer
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+700%)
Mutual labels:  seq2seq, sequence-to-sequence, transformer
Transformers without tears
Transformers without Tears: Improving the Normalization of Self-Attention
Stars: ✭ 80 (+56.86%)
Mutual labels:  machine-translation, attention-is-all-you-need, transformer
Kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition.
Stars: ✭ 190 (+272.55%)
Mutual labels:  seq2seq, attention-is-all-you-need, transformer
Joeynmt
Minimalist NMT for educational purposes
Stars: ✭ 420 (+723.53%)
Mutual labels:  seq2seq, machine-translation, transformer
Pytorch Seq2seq
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
Stars: ✭ 3,418 (+6601.96%)
Mutual labels:  seq2seq, sequence-to-sequence, transformer
transformer
A PyTorch Implementation of "Attention Is All You Need"
Stars: ✭ 28 (-45.1%)
Mutual labels:  transformer, seq2seq, attention-is-all-you-need
dynmt-py
Neural machine translation implementation using dynet's python bindings
Stars: ✭ 17 (-66.67%)
Mutual labels:  machine-translation, seq2seq, sequence-to-sequence
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+794.12%)
Mutual labels:  transformer, seq2seq, attention-is-all-you-need
transformer
Neutron: A pytorch based implementation of Transformer and its variants.
Stars: ✭ 60 (+17.65%)
Mutual labels:  transformer, seq2seq, attention-is-all-you-need
Witwicky
Witwicky: An implementation of Transformer in PyTorch.
Stars: ✭ 21 (-58.82%)
Mutual labels:  machine-translation, attention-is-all-you-need, transformer
Nlp Tutorials
Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com
Stars: ✭ 394 (+672.55%)
Mutual labels:  seq2seq, transformer
Tf Seq2seq
Sequence to sequence learning using TensorFlow.
Stars: ✭ 387 (+658.82%)
Mutual labels:  seq2seq, sequence-to-sequence
Neuralmonkey
An open-source tool for sequence learning in NLP built on TensorFlow.
Stars: ✭ 400 (+684.31%)
Mutual labels:  machine-translation, sequence-to-sequence
Text Classification Models Pytorch
Implementation of State-of-the-art Text Classification Models in Pytorch
Stars: ✭ 379 (+643.14%)
Mutual labels:  seq2seq, transformer
Pytorch Original Transformer
My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing otherwise seemingly hard concepts. Currently included IWSLT pretrained models.
Stars: ✭ 411 (+705.88%)
Mutual labels:  attention-is-all-you-need, transformer
Athena
an open-source implementation of sequence-to-sequence based speech processing engine
Stars: ✭ 542 (+962.75%)
Mutual labels:  sequence-to-sequence, transformer
Nmt List
A list of Neural MT implementations
Stars: ✭ 359 (+603.92%)
Mutual labels:  machine-translation, sequence-to-sequence
Seq2seqchatbots
A wrapper around tensor2tensor to flexibly train, interact, and generate data for neural chatbots.
Stars: ✭ 466 (+813.73%)
Mutual labels:  seq2seq, transformer

Overview

This repository contains PyTorch implementations of sequence-to-sequence models for machine translation. The code is based on fairseq and intentionally simplified for the sake of readability, while main features such as multi-GPU training and beam search remain intact.

Two encoder-decoder models are implemented in this repository: a classic model based on LSTM networks with an attention mechanism (Bahdanau et al.) and the Transformer, a more recent model built entirely on self-attention (Vaswani et al.).
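
At the core of the Transformer is scaled dot-product attention, applied by every encoder and decoder layer. The sketch below is illustrative only; the function and argument names are not taken from this repository's modules:

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value, mask=None):
    # query, key, value: (batch, heads, seq_len, head_dim)
    scores = torch.matmul(query, key.transpose(-2, -1)) / (query.size(-1) ** 0.5)
    if mask is not None:
        # Block attention to padded or future positions
        scores = scores.masked_fill(mask == 0, float('-inf'))
    weights = F.softmax(scores, dim=-1)  # attention distribution over source positions
    return torch.matmul(weights, value), weights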


Installation

The code was written for Python 3.6 or higher and has been tested with PyTorch 0.4.1. Training requires a GPU. To get started, clone the repository:

git clone https://github.com/tangbinh/machine-translation
cd machine-translation

Preprocessing

The easiest way to download the IWSLT'14 DE-EN dataset and perform tokenization is to run:

bash download.sh

Then, the following commands build the dictionaries and map tokens to indices:

DATA_PATH=data/iwslt14.tokenized.de-en
python preprocess.py --source-lang de --target-lang en --train-prefix $DATA_PATH/train --valid-prefix $DATA_PATH/valid --test-prefix $DATA_PATH/test --dest-dir data-bin/iwslt14.tokenized.de-en
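
Conceptually, this step builds a word-to-index dictionary from the tokenized training data and then replaces each token with its index. A rough sketch of that idea in Python (the function names, special symbols, and vocabulary cutoff below are hypothetical, not the repository's actual code):

from collections import Counter

def build_dictionary(corpus_path, max_words=50000):
    # Count token frequencies in the (already tokenized) training file
    counts = Counter()
    with open(corpus_path, encoding='utf-8') as f:
        for line in f:
            counts.update(line.split())
    # Reserve indices for special symbols, then keep the most frequent tokens
    vocab = ['<pad>', '<unk>', '<s>', '</s>'] + [w for w, _ in counts.most_common(max_words)]
    return {word: idx for idx, word in enumerate(vocab)}

def binarize(line, word2idx):
    # Map each token to its index, falling back to <unk> for unseen words
    return [word2idx.get(w, word2idx['<unk>']) for w in line.split()]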

Training

To get started with training a model on IWSLT'14 DE-EN, you might find the following command helpful:

python train.py --data data-bin/iwslt14.tokenized.de-en --source-lang de --target-lang en --lr 0.25 --clip-norm 0.1 --max-tokens 12000 --save-dir checkpoints/transformer
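
Here --clip-norm bounds the global gradient norm and --max-tokens sizes each batch by token count rather than sentence count. The following is only a sketch of how gradient clipping typically fits into a PyTorch training step; the batch keys and criterion are hypothetical, so see train.py for the actual loop:

import torch

def train_step(model, batch, optimizer, criterion, clip_norm=0.1):
    optimizer.zero_grad()
    # 'src_tokens', 'tgt_inputs', 'tgt_targets' are illustrative batch keys
    output = model(batch['src_tokens'], batch['tgt_inputs'])
    loss = criterion(output.view(-1, output.size(-1)), batch['tgt_targets'].view(-1))
    loss.backward()
    # Rescale gradients so their global norm does not exceed clip_norm
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    optimizer.step()
    return loss.item()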

Prediction

When training is done, you can generate translations and compute BLEU scores:

python generate.py --data data-bin/iwslt14.tokenized.de-en --checkpoint-path checkpoints/transformer/checkpoint_best.pt > /tmp/transformer.out
grep ^H /tmp/transformer.out | cut -f2- | sed -r 's/'$(echo -e "\033")'\[[0-9]{1,2}(;([0-9]{1,2})?)?[mK]//g' > /tmp/transformer.sys
grep ^T /tmp/transformer.out | cut -f2- | sed -r 's/'$(echo -e "\033")'\[[0-9]{1,2}(;([0-9]{1,2})?)?[mK]//g' > /tmp/transformer.ref
python score.py --reference /tmp/transformer.ref --system /tmp/transformer.sys
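
The grep/sed pipeline keeps the hypothesis (H) and reference (T) lines, drops the leading tag column, and strips ANSI color codes. An equivalent sketch in Python, assuming the tab-separated output format implied by cut -f2-, might look like:

import re

# Strips ANSI color codes; mirrors the sed expression above
ANSI_ESCAPE = re.compile(r'\x1b\[[0-9]{1,2}(;([0-9]{1,2})?)?[mK]')

def extract(output_path, tag):
    # Collect lines starting with the given tag (H = hypothesis, T = reference),
    # drop the leading tag column (like `cut -f2-`), and remove color codes
    lines = []
    with open(output_path, encoding='utf-8') as f:
        for line in f:
            if line.startswith(tag):
                text = line.rstrip('\n').split('\t', 1)[-1]
                lines.append(ANSI_ESCAPE.sub('', text))
    return lines

hypotheses = extract('/tmp/transformer.out', 'H')
references = extract('/tmp/transformer.out', 'T')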