
leviswind / Pytorch Transformer

PyTorch implementation of Attention Is All You Need

Programming Languages

python

Projects that are alternatives to or similar to Pytorch Transformer

Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Stars: ✭ 990 (+397.49%)
Mutual labels:  translation, attention-is-all-you-need, transformer
Transformer
A TensorFlow Implementation of the Transformer: Attention Is All You Need
Stars: ✭ 3,646 (+1732.16%)
Mutual labels:  translation, attention-is-all-you-need, transformer
Pytorch Original Transformer
My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing otherwise seemingly hard concepts. IWSLT pretrained models are currently included.
Stars: ✭ 411 (+106.53%)
Mutual labels:  attention-is-all-you-need, transformer
Nmt Keras
Neural Machine Translation with Keras
Stars: ✭ 501 (+151.76%)
Mutual labels:  attention-is-all-you-need, transformer
Awesome Fast Attention
list of efficient attention modules
Stars: ✭ 627 (+215.08%)
Mutual labels:  attention-is-all-you-need, transformer
Dab
Data Augmentation by Backtranslation (DAB) ヽ( •_-)ᕗ
Stars: ✭ 294 (+47.74%)
Mutual labels:  attention-is-all-you-need, transformer
Transformer Tensorflow
TensorFlow implementation of 'Attention Is All You Need (2017. 6)'
Stars: ✭ 319 (+60.3%)
Mutual labels:  translation, transformer
Speech Transformer
A PyTorch implementation of Speech Transformer, an end-to-end ASR system based on the Transformer network, for Mandarin Chinese.
Stars: ✭ 565 (+183.92%)
Mutual labels:  attention-is-all-you-need, transformer
transformer
Neutron: A PyTorch-based implementation of the Transformer and its variants.
Stars: ✭ 60 (-69.85%)
Mutual labels:  transformer, attention-is-all-you-need
Machine Translation
Stars: ✭ 51 (-74.37%)
Mutual labels:  attention-is-all-you-need, transformer
Transformers without tears
Transformers without Tears: Improving the Normalization of Self-Attention
Stars: ✭ 80 (-59.8%)
Mutual labels:  attention-is-all-you-need, transformer
Keras Transformer
Transformer implemented in Keras
Stars: ✭ 273 (+37.19%)
Mutual labels:  translation, transformer
attention-is-all-you-need-paper
Implementation of Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems. 2017.
Stars: ✭ 97 (-51.26%)
Mutual labels:  transformer, attention-is-all-you-need
Onnxt5
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.
Stars: ✭ 143 (-28.14%)
Mutual labels:  translation, transformer
pynmt
A simple and complete PyTorch implementation of a neural machine translation system
Stars: ✭ 13 (-93.47%)
Mutual labels:  translation, transformer
Rust Bert
Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
Stars: ✭ 510 (+156.28%)
Mutual labels:  translation, transformer
transformer
A PyTorch Implementation of "Attention Is All You Need"
Stars: ✭ 28 (-85.93%)
Mutual labels:  transformer, attention-is-all-you-need
speech-transformer
Transformer implementation specialized for speech recognition tasks, using PyTorch.
Stars: ✭ 40 (-79.9%)
Mutual labels:  transformer, attention-is-all-you-need
Witwicky
Witwicky: An implementation of Transformer in PyTorch.
Stars: ✭ 21 (-89.45%)
Mutual labels:  attention-is-all-you-need, transformer
Njunmt Tf
An open-source neural machine translation system developed by Natural Language Processing Group, Nanjing University.
Stars: ✭ 97 (-51.26%)
Mutual labels:  translation, transformer

A Pytorch Implementation of the Transformer: Attention Is All You Need

This implementation is largely based on the TensorFlow implementation.

Requirements

Why This Project?

I'm new to PyTorch, so I tried implementing some projects with it. Recently, I read the paper Attention Is All You Need and was impressed by the idea. So here it is. I got a similar result to the original TensorFlow implementation.

Differences from the original paper

I don't intend to replicate the paper exactly. Rather, I aim to implement the main ideas of the paper and verify them in a SIMPLE and QUICK way. In this respect, some parts of my code differ from the paper. Among them are:

  • I used the IWSLT 2016 de-en dataset, not the WMT dataset, because the former is much smaller and requires no special preprocessing.
  • I constructed the vocabulary from words, not subwords, for simplicity. Of course, you can try BPE or WordPiece if you want.
  • I parameterized the positional encoding (see the sketch after this list). The paper used a sinusoidal formula, but Noam Shazeer, one of the authors, says both work. See the discussion on Reddit.
  • The paper adjusted the learning rate according to the global step. I fixed the learning rate at a small constant, 0.0001, simply because training was reasonably fast with the small dataset (only a couple of hours on a single GTX 1060!).
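As an illustration of the last two points, here is a minimal, hypothetical sketch of a learned (parameterized) positional encoding and a fixed-learning-rate optimizer in PyTorch; the class and variable names are illustrative, not necessarily those used in modules.py or train.py.

import torch
import torch.nn as nn

class LearnedPositionalEncoding(nn.Module):
    # Position embeddings as trainable parameters instead of the
    # sinusoidal formula from the paper (illustrative sketch).
    def __init__(self, max_len, d_model):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model) token embeddings
        positions = torch.arange(x.size(1), device=x.device)  # (seq_len,)
        return x + self.pos_emb(positions)                     # broadcasts over the batch

# Fixed learning rate instead of the paper's warm-up schedule (hypothetical model object):
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)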

File description

  • hyperparams.py includes all the hyperparameters that are needed.
  • prepro.py creates vocabulary files for the source and the target languages.
  • data_load.py contains functions for loading and batching data.
  • modules.py has all the building blocks for the encoder/decoder networks (a sketch of the core attention operation follows this list).
  • train.py has the model and the training loop.
  • eval.py is for evaluation.
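For orientation, the central building block behind these modules is scaled dot-product attention. The following is a minimal PyTorch sketch of that operation, written as an illustration rather than a copy of the code in modules.py.

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k)
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5  # (batch, heads, q_len, k_len)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))   # hide padding / future positions
    weights = F.softmax(scores, dim=-1)                         # attention distribution over keys
    return torch.matmul(weights, v)                             # weighted sum of values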

Training

  • STEP 1. Download the IWSLT 2016 German-English parallel corpus and move it to the corpora folder:
wget -qO- https://wit3.fbk.eu/archive/2016-01//texts/de/en/de-en.tgz | tar xz; mv de-en corpora
  • STEP 2. Adjust the hyperparameters in hyperparams.py if necessary.
  • STEP 3. Run prepro.py to generate vocabulary files in the preprocessed folder.
  • STEP 4. Run train.py, or download the pretrained weights, put them in the './models/' folder, and change eval_epoch in hyperparams.py to 18.
  • STEP 5. View the loss and accuracy in TensorBoard (a hypothetical logging sketch follows the command):
tensorboard --logdir runs
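The runs directory is where torch.utils.tensorboard writes its event files by default. A hypothetical sketch of how loss and accuracy could be logged during training (the actual train.py may do this differently):

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()  # event files go under ./runs by default
# Hypothetical per-step metrics; in train.py these would come from the training loop.
metrics = [(2.31, 0.42), (1.87, 0.55), (1.52, 0.63)]
for step, (loss, acc) in enumerate(metrics):
    writer.add_scalar("loss", loss, step)
    writer.add_scalar("acc", acc, step)
writer.close()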

Evaluation

  • Run eval.py.
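For reference, a corpus-level BLEU score like the one reported below can be computed with NLTK. This is only a hedged sketch of the metric, not necessarily what eval.py does internally.

from nltk.translate.bleu_score import corpus_bleu

# Each entry in references is a list of acceptable reference translations (token lists).
references = [
    [["this", "is", "a", "tree"]],
    [["thank", "you"]],
]
hypotheses = [
    ["so", "this", "is", "a", "tree"],
    ["thank", "you"],
]
print(corpus_bleu(references, hypotheses))  # value in [0, 1]; multiply by 100 for the usual scale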

Results

I got a BLEU score of 16.7 (the TensorFlow implementation gets 17.14). Recall that I trained with a small dataset and a limited vocabulary. Some of the evaluation results are as follows; details are available in the results folder.

source: Ich bin nicht sicher was ich antworten soll
expected: I'm not really sure about the answer
got: I'm not sure what I'm going to answer

source: Was macht den Unterschied aus
expected: What makes his story different
got: What makes a difference

source: Vielen Dank
expected: Thank you
got: Thank you

source: Das ist ein Baum
expected: This is a tree
got: So this is a tree
