
lucidrains / h-transformer-1d

License: MIT
Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning

Programming Languages

Python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to h-transformer-1d

visualization
A collection of visualization functions
Stars: ✭ 189 (+56.2%)
Mutual labels:  transformer, attention, attention-mechanism
Self Attention Cv
Implementation of various self-attention mechanisms focused on computer vision. Ongoing repository.
Stars: ✭ 209 (+72.73%)
Mutual labels:  transformer, attention, attention-mechanism
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+237.19%)
Mutual labels:  transformer, attention, attention-mechanism
CrabNet
Predict materials properties using only the composition information!
Stars: ✭ 57 (-52.89%)
Mutual labels:  transformer, attention, attention-mechanism
Pytorch Original Transformer
My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing otherwise seemingly hard concepts. Currently included IWSLT pretrained models.
Stars: ✭ 411 (+239.67%)
Mutual labels:  transformer, attention, attention-mechanism
Medical Transformer
Pytorch Code for "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation"
Stars: ✭ 153 (+26.45%)
Mutual labels:  transformer, attention
Eeg Dl
A Deep Learning library for EEG Tasks (Signals) Classification, based on TensorFlow.
Stars: ✭ 165 (+36.36%)
Mutual labels:  transformer, attention-mechanism
Graphtransformer
Graph Transformer Architecture. Source code for "A Generalization of Transformer Networks to Graphs", DLG-AAAI'21.
Stars: ✭ 187 (+54.55%)
Mutual labels:  transformer, attention
Jddc solution 4th
4th-place solution to the 2018 JDDC competition
Stars: ✭ 235 (+94.21%)
Mutual labels:  transformer, attention
Bertqa Attention On Steroids
BertQA - Attention on Steroids
Stars: ✭ 112 (-7.44%)
Mutual labels:  transformer, attention
Linear Attention Transformer
Transformer based on a variant of attention with linear complexity with respect to sequence length
Stars: ✭ 205 (+69.42%)
Mutual labels:  transformer, attention-mechanism
Pytorch Seq2seq
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
Stars: ✭ 3,418 (+2724.79%)
Mutual labels:  transformer, attention
Routing Transformer
Fully featured implementation of Routing Transformer
Stars: ✭ 149 (+23.14%)
Mutual labels:  transformer, attention-mechanism
Transformer In Generating Dialogue
An implementation of "Attention Is All You Need" with a Chinese corpus
Stars: ✭ 121 (+0%)
Mutual labels:  transformer, attention-mechanism
Transformers.jl
Julia Implementation of Transformer models
Stars: ✭ 173 (+42.98%)
Mutual labels:  transformer, attention
Sightseq
Computer vision tools for fairseq, containing PyTorch implementations of text recognition and object detection
Stars: ✭ 116 (-4.13%)
Mutual labels:  transformer, attention
TRAR-VQA
[ICCV 2021] TRAR: Routing the Attention Spans in Transformers for Visual Question Answering -- Official Implementation
Stars: ✭ 49 (-59.5%)
Mutual labels:  transformer, attention
seq2seq-pytorch
Sequence to Sequence Models in PyTorch
Stars: ✭ 41 (-66.12%)
Mutual labels:  transformer, attention
Transformers-RL
An easy PyTorch implementation of "Stabilizing Transformers for Reinforcement Learning"
Stars: ✭ 107 (-11.57%)
Mutual labels:  transformer, attention-mechanism
Im2LaTeX
An implementation of the Show, Attend and Tell paper in Tensorflow, for the OpenAI Im2LaTeX suggested problem
Stars: ✭ 16 (-86.78%)
Mutual labels:  attention, attention-mechanism

H-Transformer-1D

Implementation of H-Transformer-1D, a Transformer using hierarchical attention for sequence learning with sub-quadratic costs. The encoder (non-autoregressive) flavor of this architecture currently holds the throne for Long Range Arena, a benchmark for efficient transformers.

Open In Colab (131k tokens)
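
The hierarchy can be pictured as exact attention inside small local blocks plus coarser attention for distant positions. Below is a toy two-level sketch of that idea, not this library's actual algorithm; the function names, the average-pooling of block summaries, and the 50/50 mixing of the two levels are all illustrative assumptions.

import torch

def attend(q, k, v):
    # plain softmax attention; einsum broadcasts over any leading batch/block dims
    scale = q.shape[-1] ** -0.5
    sim = torch.einsum('...id,...jd->...ij', q, k) * scale
    return torch.einsum('...ij,...jd->...id', sim.softmax(dim = -1), v)

def two_level_attention(q, k, v, block_size = 128):
    # q, k, v: (batch, seq_len, dim), with seq_len a multiple of block_size
    b, n, d = q.shape
    nb = n // block_size

    # level 0: exact attention inside each local block - cost O(n * block_size)
    qb, kb, vb = [t.reshape(b, nb, block_size, d) for t in (q, k, v)]
    local = attend(qb, kb, vb).reshape(b, n, d)

    # level 1: every position attends to average-pooled block summaries,
    # so distant context is seen only at coarse resolution - cost O(n * n / block_size)
    coarse = attend(q, kb.mean(dim = 2), vb.mean(dim = 2))

    # naive 50/50 mix of the fine and coarse contributions (illustrative only)
    return 0.5 * (local + coarse)

q = k = v = torch.randn(1, 1024, 64)
out = two_level_attention(q, k, v)   # (1, 1024, 64)

The local term costs O(n * block_size) and the coarse term O(n^2 / block_size), which is how the overall cost drops below quadratic; the real H-Transformer-1D uses a multi-level hierarchy rather than stopping at two.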

Install

$ pip install h-transformer-1d

Usage

import torch
from h_transformer_1d import HTransformer1D

model = HTransformer1D(
    num_tokens = 256,          # vocabulary size (number of distinct tokens)
    dim = 512,                 # model dimension
    depth = 12,                # number of layers
    causal = False,            # autoregressive (causal) attention or not
    max_seq_len = 8192,        # maximum sequence length
    heads = 8,                 # number of attention heads
    dim_head = 64,             # dimension per attention head
    block_size = 128,          # block size used by the hierarchical attention
    reversible = True,         # use reversible layers, to save memory at increased depth
    shift_tokens = True        # whether to shift half the feature space by one along the sequence dimension, for faster convergence (experimental feature)
)

x = torch.randint(0, 256, (1, 8000))   # token ids; any sequence length up to max_seq_len
mask = torch.ones((1, 8000)).bool()    # boolean mask of the same length as the sequence

# the network will automatically pad the sequence to a power of 2, perform hierarchical attention, etc.

logits = model(x, mask = mask) # (1, 8000, 256)
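
For the autoregressive setting, a minimal training sketch follows. It assumes only the constructor and forward signature shown above (with causal = True and the remaining options left at their defaults); the sequence length, the shifted cross-entropy loss, and all hyperparameters are illustrative.

import torch
import torch.nn.functional as F
from h_transformer_1d import HTransformer1D

model = HTransformer1D(
    num_tokens = 256,
    dim = 512,
    depth = 12,
    causal = True,             # autoregressive: each position attends only to earlier positions
    max_seq_len = 8192,
    heads = 8,
    dim_head = 64,
    block_size = 128
)

seq = torch.randint(0, 256, (1, 4096))       # a batch of token ids
mask = torch.ones((1, 4095)).bool()

logits = model(seq[:, :-1], mask = mask)     # (1, 4095, 256): per-position predictions for the next token
loss = F.cross_entropy(
    logits.reshape(-1, 256),                 # flatten to (batch * positions, vocab)
    seq[:, 1:].reshape(-1)                   # targets are the inputs shifted by one
)
loss.backward()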

Citations

@misc{zhu2021htransformer1d,
    title   = {H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences}, 
    author  = {Zhenhai Zhu and Radu Soricut},
    year    = {2021},
    eprint  = {2107.11906},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}
@software{peng_bo_2021_5196578,
    author       = {PENG Bo},
    title        = {BlinkDL/RWKV-LM: 0.01},
    month        = {aug},
    year         = {2021},
    publisher    = {Zenodo},
    version      = {0.01},
    doi          = {10.5281/zenodo.5196578},
    url          = {https://doi.org/10.5281/zenodo.5196578}
}