cyk1337 / Highway-Transformer

License: Apache-2.0
[ACL'20] Highway Transformer: A Gated Transformer.

Programming Languages

  • python
  • shell

Projects that are alternatives to or similar to Highway-Transformer

Bert Pytorch
Google AI 2018 BERT pytorch implementation
Stars: ✭ 4,642 (+17753.85%)
Mutual labels:  transformer, language-model
Vietnamese Electra
Electra pre-trained model using Vietnamese corpus
Stars: ✭ 55 (+111.54%)
Mutual labels:  transformer, language-model
Nlp Paper
NLP Paper
Stars: ✭ 484 (+1761.54%)
Mutual labels:  transformer, language-model
FNet-pytorch
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
Stars: ✭ 204 (+684.62%)
Mutual labels:  transformer, language-model
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+214292.31%)
Mutual labels:  transformer, language-model
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+1469.23%)
Mutual labels:  transformer, language-model
Gpt2 French
GPT-2 French demo
Stars: ✭ 47 (+80.77%)
Mutual labels:  transformer, language-model
MinTL
MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems
Stars: ✭ 61 (+134.62%)
Mutual labels:  transformer, language-model
Pytorch Openai Transformer Lm
🐥A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI
Stars: ✭ 1,268 (+4776.92%)
Mutual labels:  transformer, language-model
Indonesian Language Models
Indonesian language models and their usage
Stars: ✭ 64 (+146.15%)
Mutual labels:  transformer, language-model
Awesome Bert Nlp
A curated list of NLP resources focused on BERT, attention mechanism, Transformer networks, and transfer learning.
Stars: ✭ 567 (+2080.77%)
Mutual labels:  transformer, language-model
Gpt Scrolls
A collaborative collection of open-source safe GPT-3 prompts that work well
Stars: ✭ 195 (+650%)
Mutual labels:  transformer, language-model
Gpt2
PyTorch Implementation of OpenAI GPT-2
Stars: ✭ 64 (+146.15%)
Mutual labels:  transformer, language-model
Tupe
Transformer with Untied Positional Encoding (TUPE). Code for the paper "Rethinking Positional Encoding in Language Pre-training". Improves existing models like BERT.
Stars: ✭ 143 (+450%)
Mutual labels:  transformer, language-model
Relational Rnn Pytorch
An implementation of DeepMind's Relational Recurrent Neural Networks in PyTorch.
Stars: ✭ 236 (+807.69%)
Mutual labels:  transformer, language-model
svelte-jest
Jest Svelte component transformer
Stars: ✭ 37 (+42.31%)
Mutual labels:  transformer
ClusterTransformer
Topic clustering library built on Transformer embeddings and cosine similarity metrics. Compatible with all BERT-base transformers from Hugging Face.
Stars: ✭ 36 (+38.46%)
Mutual labels:  transformer
alpr utils
ALPR model in unconstrained scenarios for Chinese license plates
Stars: ✭ 158 (+507.69%)
Mutual labels:  transformer
transformer
A simple TensorFlow implementation of the Transformer
Stars: ✭ 25 (-3.85%)
Mutual labels:  transformer
proc-that
proc(ess)-that - an easily extendable ETL tool for Node.js. Written in TypeScript.
Stars: ✭ 25 (-3.85%)
Mutual labels:  transformer

Highway Transformer: Self-Gating Enhanced Self-Attentive Networks


This repo contains the demo code of Transformer-XL augmented with the Self-Dependency Unit. This work is closely related to gating-enhanced Transformer variants, such as Google's Switch Transformers.

Yekun Chai et al., Highway Transformer: Self-Gating Enhanced Self-Attentive Networks (ACL 2020)
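
For intuition, here is a minimal PyTorch sketch of a highway-style self-gating unit in the spirit of the paper. The class name and the exact gating formulation are illustrative assumptions; the repo's actual Self-Dependency Unit may differ in detail.

    import torch
    import torch.nn as nn

    class SelfGatingUnit(nn.Module):
        # Hypothetical sketch: classic highway gating y = T(x) * H(x) + (1 - T(x)) * x,
        # applied position-wise to hidden states of size d_model.
        def __init__(self, d_model):
            super().__init__()
            self.transform = nn.Linear(d_model, d_model)  # H(x): candidate activation
            self.gate = nn.Linear(d_model, d_model)       # T(x): transform gate

        def forward(self, x):
            h = torch.tanh(self.transform(x))
            t = torch.sigmoid(self.gate(x))
            return t * h + (1.0 - t) * x  # gated blend of transformed and raw input

Such a gate can wrap the self-attention or feed-forward sublayers, letting each position learn how much of the transformed representation to pass through.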

Requirements

  • PyTorch >= 1.1.0
  • TensorboardX >= 1.8
  • Tensorboard >= 1.14
  • 4 GPUs, each with 8 GB of memory, for running the 12-layer Transformer-XL
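
Assuming the standard PyPI package names (torch, tensorboardX, tensorboard), the dependencies can be installed with, for example:

pip install "torch>=1.1.0" "tensorboardX>=1.8" "tensorboard>=1.14"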

Data download

bash getdata.sh

Run 6-layer Transformer-XL

cd pytorch/xl_L6_scripts && bash <script-name>.sh train --work_dir "PATH_TO_WORK_DIR"

Visualizing Your Results

cd XL-L6-results && tensorboard --logdir=.
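
TensorBoard serves on port 6006 by default, so once the command above is running you can open http://localhost:6006 in a browser to inspect the curves.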

Results

  • Line plots of different model settings, where the topmost line (in red) is the baseline model (i.e., the original Transformer-XL).
  • After adding the Self-Dependency Unit (see the bottom two curves), it is clear that Highway Transformer speeds up convergence during both training and evaluation.
[Figures: training BPC and training loss (top), eval BPC and eval loss (bottom) for the different model settings]
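
For reference, bits-per-character (BPC) is the cross-entropy loss expressed in bits rather than nats, so the BPC and loss curves show the same quantity up to a constant factor of ln 2. A minimal conversion helper (the function name is illustrative):

    import math

    def nats_to_bpc(loss_nats):
        # Cross-entropy measured in nats, divided by ln(2), gives bits-per-character.
        return loss_nats / math.log(2)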

Citation

For attribution in academic contexts, please cite this work as:

@inproceedings{chai-etal-2020-highway,
    title = "Highway Transformer: Self-Gating Enhanced Self-Attentive Networks",
    author = "Chai, Yekun  and
      Jin, Shuo  and
      Hou, Xinwen",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.616",
    pages = "6887--6900"
}