cyk1337 / Highway-Transformer

License: Apache-2.0
[ACL'20] Highway Transformer: A Gated Transformer.

Programming Languages

  • python
  • shell

Projects that are alternatives to or similar to Highway-Transformer

Bert Pytorch
Google AI 2018 BERT pytorch implementation
Stars: ✭ 4,642 (+17753.85%)
Mutual labels:  transformer, language-model
Vietnamese Electra
Electra pre-trained model using Vietnamese corpus
Stars: ✭ 55 (+111.54%)
Mutual labels:  transformer, language-model
Nlp Paper
NLP Paper
Stars: ✭ 484 (+1761.54%)
Mutual labels:  transformer, language-model
FNet-pytorch
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
Stars: ✭ 204 (+684.62%)
Mutual labels:  transformer, language-model
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+214292.31%)
Mutual labels:  transformer, language-model
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+1469.23%)
Mutual labels:  transformer, language-model
Gpt2 French
GPT-2 French demo
Stars: ✭ 47 (+80.77%)
Mutual labels:  transformer, language-model
MinTL
MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems
Stars: ✭ 61 (+134.62%)
Mutual labels:  transformer, language-model
Pytorch Openai Transformer Lm
🐥A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI
Stars: ✭ 1,268 (+4776.92%)
Mutual labels:  transformer, language-model
Indonesian Language Models
Indonesian language models and their usage
Stars: ✭ 64 (+146.15%)
Mutual labels:  transformer, language-model
Awesome Bert Nlp
A curated list of NLP resources focused on BERT, attention mechanism, Transformer networks, and transfer learning.
Stars: ✭ 567 (+2080.77%)
Mutual labels:  transformer, language-model
Gpt Scrolls
A collaborative collection of open-source safe GPT-3 prompts that work well
Stars: ✭ 195 (+650%)
Mutual labels:  transformer, language-model
Gpt2
PyTorch Implementation of OpenAI GPT-2
Stars: ✭ 64 (+146.15%)
Mutual labels:  transformer, language-model
Tupe
Transformer with Untied Positional Encoding (TUPE). Code for the paper "Rethinking Positional Encoding in Language Pre-training". Improves existing models like BERT.
Stars: ✭ 143 (+450%)
Mutual labels:  transformer, language-model
Relational Rnn Pytorch
An implementation of DeepMind's Relational Recurrent Neural Networks in PyTorch.
Stars: ✭ 236 (+807.69%)
Mutual labels:  transformer, language-model
svelte-jest
Jest Svelte component transformer
Stars: ✭ 37 (+42.31%)
Mutual labels:  transformer
ClusterTransformer
Topic clustering library built on Transformer embeddings and cosine similarity metrics. Compatible with all BERT-base transformers from Hugging Face.
Stars: ✭ 36 (+38.46%)
Mutual labels:  transformer
alpr utils
ALPR model in unconstrained scenarios for Chinese license plates
Stars: ✭ 158 (+507.69%)
Mutual labels:  transformer
transformer
A simple TensorFlow implementation of the Transformer
Stars: ✭ 25 (-3.85%)
Mutual labels:  transformer
proc-that
proc(ess)-that - an easily extendable ETL tool for Node.js. Written in TypeScript.
Stars: ✭ 25 (-3.85%)
Mutual labels:  transformer

Highway Transformer: Self-Gating Enhanced Self-Attentive Networks


This repo contains the demo code of Transformer-XL augmented with the Self-Dependency Unit. This work is closely related to gating-enhanced Transformer variants, such as Google's Switch Transformers.

Yekun Chai et al., Highway Transformer: Self-Gating Enhanced Self-Attentive Networks (ACL 2020)
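
For intuition, here is a minimal PyTorch sketch of a highway-style self-gating unit in the spirit of the paper. The class name and the exact gating formulation are illustrative assumptions; the repo's actual Self-Dependency Unit may differ in detail.

    import torch
    import torch.nn as nn

    class SelfGatingUnit(nn.Module):
        # Hypothetical sketch: classic highway gating y = T(x) * H(x) + (1 - T(x)) * x,
        # applied position-wise to hidden states of size d_model.
        def __init__(self, d_model):
            super().__init__()
            self.transform = nn.Linear(d_model, d_model)  # H(x): candidate activation
            self.gate = nn.Linear(d_model, d_model)       # T(x): transform gate

        def forward(self, x):
            h = torch.tanh(self.transform(x))
            t = torch.sigmoid(self.gate(x))
            return t * h + (1.0 - t) * x  # gated blend of transformed and raw input

Such a gate can wrap the self-attention or feed-forward sublayers, letting each position learn how much of the transformed representation to pass through.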

Requirements

  • PyTorch >= 1.1.0
  • TensorboardX >= 1.8
  • Tensorboard >= 1.14
  • 4 GPUs, each with 8 GB of memory, for running the 12-layer Transformer-XL
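
Assuming the standard PyPI package names (torch, tensorboardX, tensorboard), the dependencies can be installed with, for example:

pip install "torch>=1.1.0" "tensorboardX>=1.8" "tensorboard>=1.14"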

Data download

bash getdata.sh

Run 6-layer Transformer-XL

cd pytorch/xl_L6_scripts && bash <script-name>.sh train --work_dir "PATH_TO_WORK_DIR"

Visualizing Your Results

cd XL-L6-results && tensorboard --logdir=.
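
TensorBoard serves on port 6006 by default, so once the command above is running you can open http://localhost:6006 in a browser to inspect the curves.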

Results

  • Line plots of different model settings, where the topmost line (in red) is the baseline model (i.e., the original Transformer-XL).
  • After adding the Self-Dependency Unit (see the bottom two curves), it is clear that Highway Transformer speeds up convergence during both training and evaluation.
[Figures: training BPC and training loss (top), eval BPC and eval loss (bottom) for the different model settings]
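
For reference, bits-per-character (BPC) is the cross-entropy loss expressed in bits rather than nats, so the BPC and loss curves show the same quantity up to a constant factor of ln 2. A minimal conversion helper (the function name is illustrative):

    import math

    def nats_to_bpc(loss_nats):
        # Cross-entropy measured in nats, divided by ln(2), gives bits-per-character.
        return loss_nats / math.log(2)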

Citation

For attribution in academic contexts, please cite this work as:

@inproceedings{chai-etal-2020-highway,
    title = "Highway Transformer: Self-Gating Enhanced Self-Attentive Networks",
    author = "Chai, Yekun  and
      Jin, Shuo  and
      Hou, Xinwen",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.616",
    pages = "6887--6900"
}