
soskek / Attention_is_all_you_need

License: BSD-3-Clause
Chainer implementation of the Transformer from "Attention Is All You Need" (Vaswani et al., 2017).

Projects that are alternatives to, or similar to, Attention_is_all_you_need

Action Recognition Visual Attention
Action recognition using soft attention based deep recurrent neural networks
Stars: ✭ 350 (+15.51%)
Mutual labels:  jupyter-notebook, deep-neural-networks, attention-mechanism
Google2csv
Google2Csv, a simple Google scraper that saves the results to a csv/xlsx/jsonl file
Stars: ✭ 145 (-52.15%)
Mutual labels:  google, jupyter-notebook
Dopamine
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
Stars: ✭ 9,681 (+3095.05%)
Mutual labels:  google, jupyter-notebook
Visual-Attention-Model
Chainer implementation of Deepmind's Visual Attention Model paper
Stars: ✭ 27 (-91.09%)
Mutual labels:  chainer, attention-mechanism
Machine Learning Book
《机器学习宝典》 (Machine Learning Treasury) includes: Google's Machine Learning Crash Course (techniques), the Machine Learning Glossary (mnemonics), Rules of Machine Learning (tips), and common questions in machine learning (fundamentals). This resource is a reference for machine learning and deep learning researchers and enthusiasts!
Stars: ✭ 616 (+103.3%)
Mutual labels:  google, jupyter-notebook
Medium Article
Repo for articles in my personal blog and Medium
Stars: ✭ 28 (-90.76%)
Mutual labels:  google, jupyter-notebook
Bert Chainer
Chainer implementation of "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"
Stars: ✭ 205 (-32.34%)
Mutual labels:  google, chainer
Snap N Eat
Food detection and recommendation with deep learning
Stars: ✭ 229 (-24.42%)
Mutual labels:  jupyter-notebook, deep-neural-networks
Da Rnn
📃 Unofficial PyTorch implementation of DA-RNN (arXiv:1704.02971)
Stars: ✭ 256 (-15.51%)
Mutual labels:  jupyter-notebook, attention-mechanism
Realtime object detection
Plug and Play Real-Time Object Detection App with Tensorflow and OpenCV. No Bugs No Worries. Enjoy!
Stars: ✭ 260 (-14.19%)
Mutual labels:  google, deep-neural-networks
Dlpython course
Examples for the course "Programming Deep Neural Networks in Python"
Stars: ✭ 266 (-12.21%)
Mutual labels:  jupyter-notebook, deep-neural-networks
Mixup Generator
An implementation of "mixup: Beyond Empirical Risk Minimization"
Stars: ✭ 250 (-17.49%)
Mutual labels:  jupyter-notebook, deep-neural-networks
Mixture Density Networks For Distribution And Uncertainty Estimation
A generic Mixture Density Networks (MDN) implementation for distribution and uncertainty estimation by using Keras (TensorFlow)
Stars: ✭ 249 (-17.82%)
Mutual labels:  jupyter-notebook, deep-neural-networks
Google It Automation
Google IT Automation with Python Professional Certificate
Stars: ✭ 81 (-73.27%)
Mutual labels:  google, jupyter-notebook
Dlwpt Code
Code for the book Deep Learning with PyTorch by Eli Stevens, Luca Antiga, and Thomas Viehmann.
Stars: ✭ 3,054 (+907.92%)
Mutual labels:  jupyter-notebook, deep-neural-networks
Covid19 mobility
COVID-19 Mobility Data Aggregator. Scraper of Google, Apple, Waze and TomTom COVID-19 Mobility Reports🚶🚘🚉
Stars: ✭ 156 (-48.51%)
Mutual labels:  google, jupyter-notebook
Dab
Data Augmentation by Backtranslation (DAB) ヽ( •_-)ᕗ
Stars: ✭ 294 (-2.97%)
Mutual labels:  jupyter-notebook, deep-neural-networks
Cardio
CardIO is a library for data science research of heart signals
Stars: ✭ 218 (-28.05%)
Mutual labels:  jupyter-notebook, deep-neural-networks
Triplet Attention
Official PyTorch Implementation for "Rotate to Attend: Convolutional Triplet Attention Module." [WACV 2021]
Stars: ✭ 222 (-26.73%)
Mutual labels:  jupyter-notebook, attention-mechanism
Multi-task-Conditional-Attention-Networks
A prototype version of our submitted paper: Conversion Prediction Using Multi-task Conditional Attention Networks to Support the Creation of Effective Ad Creatives.
Stars: ✭ 21 (-93.07%)
Mutual labels:  chainer, attention-mechanism

Transformer - Attention Is All You Need

Chainer-based Python implementation of the Transformer, an attention-based seq2seq model without convolution or recurrence.
If you want to see the architecture, please see net.py.
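At the core of the model is scaled dot-product attention. The following is a minimal NumPy sketch of that operation, for illustration only; the actual Chainer implementation (including multi-head attention and masking) lives in net.py and may differ in details.

    import numpy as np

    def scaled_dot_product_attention(query, key, value):
        # query: (query_len, d_k), key: (key_len, d_k), value: (key_len, d_v)
        d_k = query.shape[-1]
        # Similarity of every query with every key, scaled by sqrt(d_k)
        scores = query @ key.T / np.sqrt(d_k)             # (query_len, key_len)
        # Softmax over the key dimension
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        # Weighted sum of values
        return weights @ value                            # (query_len, d_v)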

See "Attention Is All You Need", Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017.

This repository is partly derived from my convolutional seq2seq repo, which is also derived from Chainer's official seq2seq example.

Requirements

  • Python 3.6.0+
  • Chainer 2.0.0+
  • numpy 1.12.1+
  • cupy 1.0.0+ (if using a GPU)
  • nltk
  • progressbar
  • (You can install all of the above and their dependencies through pip; an example command follows this list.)
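For example, a command along the following lines should install them (the package names here are assumed from the list above; cupy is only needed for GPU training):

pip install chainer cupy numpy nltk progressbar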

Prepare Dataset

You can use any parallel corpus.
For example, run

sh download_wmt.sh

which downloads and decompresses the training and development datasets from WMT (Europarl) into your current directory. These files and their paths are used as the defaults in the training script train.py.

How to Run

PYTHONIOENCODING=utf-8 python -u train.py -g=0 -i DATA_DIR -o SAVE_DIR

During training, logs of loss, perplexity, word accuracy, and elapsed time are printed at a fixed interval, and validation tests (perplexity, plus BLEU for generation) are run every half epoch. A sample generation is also performed and printed so you can check training progress.

Arguments

Some of them are as follows:

  • -g: your GPU ID. To use the CPU, set -1.
  • -i DATA_DIR, -s SOURCE, -t TARGET, -svalid SVALID, -tvalid TVALID:
    The DATA_DIR directory must contain a pair of training files, SOURCE and TARGET, and a pair of validation files, SVALID and TVALID. Each pair should be a parallel corpus with line-by-line sentence alignment.
  • -o SAVE_DIR: a JSON log report file and model snapshots will be saved in the SAVE_DIR directory (if it does not exist, it will be created automatically).
  • -e: maximum number of training epochs.
  • -b: minibatch size.
  • -u: size of units and word embeddings.
  • -l: number of layers in both the encoder and the decoder.
  • --source-vocab: maximum vocabulary size of the source language
  • --target-vocab: maximum vocabulary size of the target language

Please see the other options with python train.py -h.
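For example, a full invocation combining the options above might look like the following (the values are illustrative, not recommended settings):

PYTHONIOENCODING=utf-8 python -u train.py -g=0 -i DATA_DIR -o SAVE_DIR -e 40 -b 64 -u 512 -l 6 --source-vocab 40000 --target-vocab 40000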

Note

This repository does not aim for a complete reproduction of the results in the paper, so I have not rigorously confirmed its performance. However, I expect my implementation to be largely compatible with the model described in the paper. The differences I am aware of are as follows:

  • Optimization/training strategy. Detailed information about batch size, parameter initialization, etc. is unclear in the paper. Additionally, the learning rate schedule proposed in the paper may work only with a large batch size (e.g. 4000) for deep networks. I changed warmup_step from 4000 to 32000, though there is room for improvement. I also changed ReLU to leaky ReLU in the feed-forward layers for easier gradient propagation. (A sketch of the schedule follows this list.)
  • Vocabulary set, dataset, preprocessing, and evaluation. This repo uses common word-based tokenization, whereas the paper uses byte-pair encoding, and the sizes of the token sets also differ. Evaluation (validation) is slightly unfair and not comparable with that in the paper; for example, even the validation set replaces unknown words with a single "unk" token.
  • Beam search is not used in the BLEU calculation.
  • Model size. The model setting in this repo corresponds to the "base model" in the paper, although you can modify a few lines to use the "big model".
  • This code follows some settings used in the tensor2tensor repository, which includes a Transformer model. For example, the positional encoding used in that repository seems to differ from the one in the paper; this code follows the tensor2tensor variant. (A sketch of the paper's encoding is shown below for reference.)
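As a reference for the first point above, the learning rate schedule proposed in the paper (Section 5.3) can be sketched as follows; warmup_steps is set to 32000 to match the change described above, and the function is an illustration rather than a copy of the code in this repo.

    def transformer_learning_rate(step, units=512, warmup_steps=32000):
        # lrate = units^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
        step = max(step, 1)  # avoid division by zero at step 0
        return units ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)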
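For the last point, the sinusoidal positional encoding defined in the paper (Section 3.5) is sketched below for reference only; the tensor2tensor variant that this repo follows arranges the sine and cosine channels differently.

    import numpy as np

    def sinusoidal_positional_encoding(max_length, units):
        # PE(pos, 2i)   = sin(pos / 10000^(2i / units))
        # PE(pos, 2i+1) = cos(pos / 10000^(2i / units))
        # Assumes an even number of units.
        positions = np.arange(max_length)[:, None]        # (max_length, 1)
        dims = np.arange(0, units, 2)[None, :]            # (1, units // 2)
        angles = positions / np.power(10000.0, dims / units)
        encoding = np.zeros((max_length, units), dtype=np.float32)
        encoding[:, 0::2] = np.sin(angles)
        encoding[:, 1::2] = np.cos(angles)
        return encoding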