zhaocq-nlp / Njunmt Tf

Licence: apache-2.0
An open-source neural machine translation system developed by Natural Language Processing Group, Nanjing University.

Programming Languages

python

Projects that are alternatives to or similar to Njunmt Tf

Pytorch Seq2seq
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
Stars: ✭ 3,418 (+3423.71%)
Mutual labels:  attention, neural-machine-translation, transformer
Joeynmt
Minimalist NMT for educational purposes
Stars: ✭ 420 (+332.99%)
Mutual labels:  neural-machine-translation, nmt, transformer
Nmt Keras
Neural Machine Translation with Keras
Stars: ✭ 501 (+416.49%)
Mutual labels:  neural-machine-translation, nmt, transformer
RNNSearch
An implementation of attention-based neural machine translation using Pytorch
Stars: ✭ 43 (-55.67%)
Mutual labels:  attention, neural-machine-translation, nmt
pynmt
a simple and complete pytorch implementation of neural machine translation system
Stars: ✭ 13 (-86.6%)
Mutual labels:  translation, transformer, nmt
Transformer Tensorflow
TensorFlow implementation of 'Attention Is All You Need (2017. 6)'
Stars: ✭ 319 (+228.87%)
Mutual labels:  translation, attention, transformer
Keras Transformer
Transformer implemented in Keras
Stars: ✭ 273 (+181.44%)
Mutual labels:  translation, attention, transformer
Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Stars: ✭ 990 (+920.62%)
Mutual labels:  translation, neural-machine-translation, transformer
Pytorch Original Transformer
My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing otherwise seemingly hard concepts. Currently included IWSLT pretrained models.
Stars: ✭ 411 (+323.71%)
Mutual labels:  attention, transformer
Rust Bert
Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
Stars: ✭ 510 (+425.77%)
Mutual labels:  translation, transformer
Speech Transformer
A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.
Stars: ✭ 565 (+482.47%)
Mutual labels:  attention, transformer
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+320.62%)
Mutual labels:  attention, transformer
Cell Detr
Official and maintained implementation of the paper Attention-Based Transformers for Instance Segmentation of Cells in Microstructures [BIBM 2020].
Stars: ✭ 26 (-73.2%)
Mutual labels:  attention, transformer
Deep learning nlp
Keras, PyTorch, and NumPy Implementations of Deep Learning Architectures for NLP
Stars: ✭ 407 (+319.59%)
Mutual labels:  attention, nmt
Neuralmonkey
An open-source tool for sequence learning in NLP built on TensorFlow.
Stars: ✭ 400 (+312.37%)
Mutual labels:  neural-machine-translation, nmt
Awesome Fast Attention
list of efficient attention modules
Stars: ✭ 627 (+546.39%)
Mutual labels:  attention, transformer
Nlp tensorflow project
Uses TensorFlow for several NLP projects, e.g. classification, chatbot, NER, attention, QA, etc.
Stars: ✭ 27 (-72.16%)
Mutual labels:  attention, nmt
Nematus
Open-Source Neural Machine Translation in Tensorflow
Stars: ✭ 730 (+652.58%)
Mutual labels:  neural-machine-translation, nmt
Transformer Dynet
An Implementation of Transformer (Attention Is All You Need) in DyNet
Stars: ✭ 57 (-41.24%)
Mutual labels:  neural-machine-translation, transformer
Rnn Nmt
An encoder-decoder neural machine translation model based on a bidirectional RNN with an attention mechanism
Stars: ✭ 46 (-52.58%)
Mutual labels:  neural-machine-translation, nmt

NJUNMT-tf

NJUNMT-tf is a general-purpose sequence modeling tool built on TensorFlow, with neural machine translation as its main target task.

Key features

NJUNMT-tf builds NMT models almost from scratch, without relying on high-level TensorFlow APIs that often hide the details of network components and lead to obscure code that is hard to understand and manipulate. NJUNMT-tf depends only on basic TensorFlow modules, like array_ops, math_ops and nn_ops, so each operation in the code is under control.
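
To make "basic TensorFlow modules" concrete, here is a small, hedged illustration (not code from this repository) of the style the paragraph describes: scaled dot-product attention written directly with low-level ops rather than high-level layer APIs. The function name and tensor shapes are assumptions.

# Illustrative sketch only, not the toolkit's implementation.
import tensorflow as tf

def scaled_dot_product_attention(queries, keys, values):
    """queries: [batch, q_len, d]; keys/values: [batch, k_len, d]."""
    d = tf.cast(tf.shape(queries)[-1], tf.float32)
    # attention logits via a plain matmul, scaled by sqrt(d)
    logits = tf.matmul(queries, keys, transpose_b=True) / tf.sqrt(d)
    weights = tf.nn.softmax(logits)           # normalize over key positions
    return tf.matmul(weights, values)         # weighted sum of value vectors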

NJUNMT-tf focuses on modularity and extensibility, using standard TensorFlow modules and practices to support advanced modeling capabilities:

  • arbitrarily complex encoder architectures, e.g. bidirectional RNN, unidirectional RNN and self-attention encoders.
  • arbitrarily complex decoder architectures, e.g. conditional GRU/LSTM, attention and self-attention decoders.
  • hybrid encoder-decoder models, e.g. a self-attention encoder with an RNN decoder, or vice versa.

and all of the above can be combined to train novel and complex architectures.

The code also supports:

  • model ensemble.
  • learning rate decay driven by the loss on evaluation data (see the sketch after this list).
  • model validation on evaluation data with BLEU score and an early-stop strategy.
  • monitoring with TensorBoard.
  • BPE (byte pair encoding) support.
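
A minimal sketch of how the two evaluation-driven features above (loss-based learning rate decay and BLEU-based early stopping) can be wired together; the class name, defaults and update rules below are illustrative assumptions, not NJUNMT-tf's implementation.

# Illustration only: decay the learning rate whenever evaluation loss stops
# improving, and stop training after `patience` evaluations without a BLEU gain.
class EvalDrivenSchedule(object):
    def __init__(self, init_lr, decay_factor=0.5, patience=5):
        self.lr = init_lr
        self.decay_factor = decay_factor
        self.patience = patience
        self.best_loss = float("inf")
        self.best_bleu = 0.0
        self.bad_evals = 0

    def on_evaluation(self, eval_loss, eval_bleu):
        """Update state after one evaluation; returns (new_lr, should_stop)."""
        if eval_loss < self.best_loss:
            self.best_loss = eval_loss
        else:
            self.lr *= self.decay_factor      # eval loss did not improve: decay
        if eval_bleu > self.best_bleu:
            self.best_bleu = eval_bleu
            self.bad_evals = 0
        else:
            self.bad_evals += 1               # no BLEU gain on this evaluation
        return self.lr, self.bad_evals >= self.patience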

Requirements

  • tensorflow (>=1.6)
  • pyyaml

Quickstart

Here is a minimal workflow to get started with NJUNMT-tf. The example trains a toy Chinese-English translation model with a toy configuration.

1. Build the word vocabularies:

python -m bin.generate_vocab testdata/toy.zh --max_vocab_size 100  > testdata/vocab.zh
python -m bin.generate_vocab testdata/toy.en0 --max_vocab_size 100  > testdata/vocab.en
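
If it helps to see what vocabulary generation amounts to, the stand-alone sketch below counts whitespace-separated tokens and keeps the most frequent ones. bin.generate_vocab may differ in details such as special tokens and output format, so treat this as an assumption-laden illustration.

# Frequency-based vocabulary extraction (illustration only).
import collections
import sys

def build_vocab(path, max_vocab_size):
    counter = collections.Counter()
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            counter.update(line.split())          # whitespace tokenization
    for token, count in counter.most_common(max_vocab_size):
        sys.stdout.write("{}\t{}\n".format(token, count))

if __name__ == "__main__":
    build_vocab(sys.argv[1], int(sys.argv[2]))    # e.g. testdata/toy.zh 100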

2. Train with preset sequence-to-sequence parameters:

export CUDA_VISIBLE_DEVICES=
python -m bin.train --model_dir test_model \
    --config_paths "
        ./njunmt/example_configs/toy_seq2seq.yml,
        ./njunmt/example_configs/toy_training_options.yml,
        ./default_configs/default_optimizer.yml"

3. Translate a test file with the latest checkpoint:

export CUDA_VISIBLE_DEVICES=
python -m bin.infer --model_dir test_model \
  --infer "
    beam_size: 4
    source_words_vocabulary: testdata/vocab.zh
    target_words_vocabulary: testdata/vocab.en" \
  --infer_data "
    - features_file: testdata/toy.zh
      labels_file: testdata/toy.en
      output_file: toy.trans
      output_attention: false"
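
For intuition about the beam_size option above, here is a hedged, toolkit-agnostic sketch of beam search: at each step the beam_size best partial hypotheses by cumulative log-probability are kept. step_fn, bos, eos and max_len are hypothetical stand-ins; the real decoder also applies length penalties and works on batches.

# Toy beam search (illustration only); step_fn(prefix) -> list of (token, prob).
import heapq
import math

def beam_search(step_fn, bos, eos, beam_size, max_len):
    beams = [(0.0, [bos])]                       # (cumulative log-prob, tokens)
    for _ in range(max_len):
        candidates = []
        for score, prefix in beams:
            if prefix[-1] == eos:
                candidates.append((score, prefix))   # finished hypothesis
                continue
            for token, prob in step_fn(prefix):
                candidates.append((score + math.log(prob), prefix + [token]))
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return max(beams, key=lambda c: c[0])[1]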

Note: do not expect any good translation results with this toy example. Consider training on larger parallel datasets instead.

Configuration

As you can see, there are two ways to set the hyperparameters of a run:

  • tf FLAGS
  • yaml-style config file

For example, here is a config file specifying the datasets for the training procedure.

# datasets.yml
data:
  train_features_file: testdata/toy.zh
  train_labels_file: testdata/toy.en0
  eval_features_file: testdata/toy.zh
  eval_labels_file: testdata/toy.en
  source_words_vocabulary: testdata/vocab.zh
  target_words_vocabulary: testdata/vocab.en

You can either use the command:

python -m bin.train --config_paths "datasets.yml" ...

or

python -m bin.train --data "
    train_features_file: testdata/toy.zh
    train_labels_file: testdata/toy.en0
    eval_features_file: testdata/toy.zh
    eval_labels_file: testdata/toy.en
    source_words_vocabulary: testdata/vocab.zh
    target_words_vocabulary: testdata/vocab.en" ...

The two are equivalent.

The available FLAGS (or the top levels of yaml configs) for bin.train are as follows:

  • config_paths: the paths for config files
  • model_dir: the directory for saving checkpoints
  • problem_name: The top name scope, "seq2seq" by default
  • train: training options, e.g. batch size, maximum length
  • data: training data, evaluation data, vocabulary and (optional) BPE codes
  • hooks: a list of training hooks (not available in the current version)
  • metrics: a list of evaluation metrics on evaluation data
  • model: the class name of the model
  • model_params: parameters for the model
  • optimizer_params: parameters for optimizer

The available FLAGS (or the top levels of yaml configs) for bin.infer are as follows:

  • config_paths: the paths for config files
  • model_dir: the checkpoint directory, or several checkpoint directories separated by commas for model ensemble
  • infer: inference options, e.g. beam size, length penalty rate
  • infer_data: a list of data files to be translated
  • weight_scheme: the weighting scheme for model ensemble (only "average" is available now; see the sketch after this list)
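
A hedged sketch of what an "average" weight scheme means for ensembling: at each decoding step the next-token distributions of the ensembled checkpoints are averaged before the beam picks candidates. The function below is an illustration, not the toolkit's code.

# Average the per-step probability distributions of several models.
import numpy as np

def average_ensemble(step_probs):
    """step_probs: list of [vocab_size] arrays, one per model, already normalized."""
    return np.mean(np.stack(step_probs, axis=0), axis=0)

p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.5, 0.4, 0.1])
print(average_ensemble([p1, p2]))   # [0.6 0.3 0.1]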

Note that:

  • each FLAG should be a yaml-style string
  • the hyperparameters provided by FLAGS will overwrite those in config files (sketched below)
  • illegal parameters will interrupt the program, so see sample.yml for a more detailed description of each parameter.
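
The override rule above can be pictured as a recursive dictionary merge: settings from config_paths are combined first, then each yaml-style FLAG string is merged on top, so FLAG values win on conflicting keys. The helper and file names below are a hedged sketch, not the project's actual merging code.

# Hypothetical merge helper; FLAG values override config-file values.
import yaml

def deep_merge(base, override):
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)   # recurse into sub-dicts
        else:
            merged[key] = value                            # override wins
    return merged

from_file = yaml.safe_load("data:\n  train_features_file: testdata/toy.zh\n  eval_features_file: testdata/toy.zh")
from_flag = yaml.safe_load("data:\n  eval_features_file: testdata/dev.zh")
print(deep_merge(from_file, from_flag))
# {'data': {'train_features_file': 'testdata/toy.zh', 'eval_features_file': 'testdata/dev.zh'}}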

Benchmarks

The RNN benchmarks are performed on 1 GTX 1080Ti GPU with predefined configurations:

  • default_configs/adam_loss_decay.yml
  • default_configs/default_metrics.yml
  • default_configs/default_training_options.yml
  • default_configs/seq2seq_cgru.yml

The Transformer benchmarks are performed on 1 GTX 1080Ti GPU with predefined configurations:

  • default_configs/transformer_base.yml
  • default_configs/transformer_training_options.yml

Note that for the Transformer model, we set batch_tokens_size=2500 with update_cycle=10 to realize pseudo-parallel training.
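
For readers unfamiliar with the term, pseudo-parallel training with update_cycle amounts to gradient accumulation: gradients from several small batches are averaged and applied as one update, approximating a larger effective batch. The following is a hedged, framework-agnostic sketch, not the repository's implementation.

# Gradient accumulation over `update_cycle` micro-batches (illustration only).
import numpy as np

def train_with_accumulation(params, batches, grad_fn, lr, update_cycle):
    accum = np.zeros_like(params)
    for step, batch in enumerate(batches, start=1):
        accum += grad_fn(params, batch)                   # gradient of one micro-batch
        if step % update_cycle == 0:
            params = params - lr * (accum / update_cycle) # one combined update
            accum = np.zeros_like(params)                 # reset for the next cycle
    return params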

The beam sizes for RNN and Transformer are 10 and 4, respectively.

The datasets are preprocessed using fetch_wmt2017_ende.sh and fetch_wmt2018_zhen.sh, following Edinburgh's report.

The BLEU scores are computed by the wrapper script run_mteval.sh. For EN-ZH experiments, BLEU is evaluated at the character level, while the others are evaluated at the word level.
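
For clarity, "character-level" evaluation here means the hypothesis and reference are re-tokenized into individual characters before BLEU is computed (keeping ASCII tokens whole is a common variant). The helper below is a hedged illustration; the actual scoring is done by the run_mteval.sh wrapper.

# Re-tokenize a sentence into characters for character-level BLEU (illustration).
def to_char_level(sentence):
    tokens = []
    for word in sentence.split():
        if all(ord(ch) < 128 for ch in word):
            tokens.append(word)                  # keep ASCII tokens intact
        else:
            tokens.extend(list(word))            # split CJK words into characters
    return " ".join(tokens)

print(to_char_level("今天 天气 很好"))            # 今 天 天 气 很 好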

Dataset       Model        BLEU (newstest2016, dev)   BLEU (newstest2017)
WMT17 EN-DE   RNN          29.6                       23.6
WMT17 EN-DE   Transformer  33.5                       27.0
WMT17 DE-EN   RNN          34.0                       29.6
WMT17 DE-EN   Transformer  37.6                       33.1

Dataset       Model        BLEU (newsdev2017, dev)    BLEU (newstest2017)
WMT17 ZH-EN   RNN          19.7                       21.2
WMT17 ZH-EN   Transformer  22.7                       25.0
WMT17 EN-ZH   RNN          30.0                       30.2
WMT17 EN-ZH   Transformer  34.9                       35.0

TODO

The following features remain unimplemented:

  • multi-gpu training
  • scheduled sampling
  • minimum risk training

Acknowledgments

The implementation is inspired by the following:

Contact

Any comments or suggestions are welcome.

Please email [email protected].
