
bzhangGo / lrn

License: BSD-3-Clause
Source code for "A Lightweight Recurrent Network for Sequence Modeling"

Programming Languages

  • python
  • shell

Projects that are alternatives of or similar to lrn

DSKG
No description or website provided.
Stars: ✭ 65 (+195.45%)
Mutual labels:  recurrent-neural-network
off-policy-continuous-control
[DeepRL Workshop, NeurIPS-21] Recurrent Off-policy Baselines for Memory-based Continuous Control (RDPG, RTD3 and RSAC)
Stars: ✭ 29 (+31.82%)
Mutual labels:  recurrent-neural-network
char-rnnlm-tensorflow
Char RNN Language Model based on Tensorflow
Stars: ✭ 14 (-36.36%)
Mutual labels:  recurrent-neural-network
deep-pmsm
Estimate intrinsic Permanent Magnet Synchronous Motor temperatures with deep recurrent and convolutional neural networks.
Stars: ✭ 29 (+31.82%)
Mutual labels:  recurrent-neural-network
char-rnn-text-generation
Character Embeddings Recurrent Neural Network Text Generation Models
Stars: ✭ 64 (+190.91%)
Mutual labels:  recurrent-neural-network
PolyphonicPianoTranscription
Recurrent Neural Network for generating piano MIDI-files from audio (MP3, WAV, etc.)
Stars: ✭ 146 (+563.64%)
Mutual labels:  recurrent-neural-network
deep-explanation-penalization
Code for using CDEP from the paper "Interpretations are useful: penalizing explanations to align neural networks with prior knowledge" https://arxiv.org/abs/1909.13584
Stars: ✭ 110 (+400%)
Mutual labels:  recurrent-neural-network
Natural-Language-Processing
Contains various architectures and novel paper implementations for Natural Language Processing tasks like Sequence Modelling and Neural Machine Translation.
Stars: ✭ 48 (+118.18%)
Mutual labels:  sequence-modeling

lrn

Source code for "A Lightweight Recurrent Network for Sequence Modeling"

Model Architecture

In our new paper, we propose the lightweight recurrent network (LRN), which combines the strengths of ATR and SRU.

  • ATR reduces model parameters and, through its twin-gate mechanism, avoids additional free parameters for the gate computation (see the sketch after this list).
  • SRU follows QRNN and moves all matrix computations outside the recurrence.
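
For reference, a sketch of ATR's twin-gate mechanism as described by Zhang et al. (2018), in which both gates are built from the same two terms and differ only in sign:

```latex
\begin{aligned}
p_t &= W x_t, & q_t &= U h_{t-1} \\
i_t &= \sigma(p_t + q_t), & f_t &= \sigma(p_t - q_t) \\
h_t &= i_t \odot p_t + f_t \odot h_{t-1}
\end{aligned}
```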

Based on the above units, we propose LRN:
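
A sketch of the recurrence in LaTeX, following the formulation in the paper (q_t, k_t and v_t denote the query, key and value projections of the input x_t; treat the exact sign placement as a best-effort reading of the paper):

```latex
\begin{aligned}
q_t,\ k_t,\ v_t &= W_q x_t,\ W_k x_t,\ W_v x_t \\
i_t &= \sigma(k_t + h_{t-1}), \qquad f_t = \sigma(q_t - h_{t-1}) \\
h_t &= g\left(i_t \odot v_t + f_t \odot h_{t-1}\right)
\end{aligned}
```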

where g(·) is an activation function, tanh or identity, and Wq, Wk and Wv are model parameters. The matrix computations (as well as any layer normalization) can be shifted outside the recurrence, so the whole model runs fast.

When the twin-gate mechanism is applied, the values in ht can suffer from explosion and grow towards infinity. This is why we add the activation function. An alternative solution is layer normalization, which keeps the activation values stable.
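
As a minimal sketch of how this recurrence can be computed with the matrix work hoisted out of the loop (NumPy, illustrative only; this is not the repository's TensorFlow implementation, and names such as lrn_forward are made up here):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lrn_forward(x, Wq, Wk, Wv, g=np.tanh):
    """Minimal LRN sketch.

    x: [T, d_in] input sequence; Wq, Wk, Wv: [d_in, d_hid] parameters.
    g: activation (tanh or identity).
    """
    # heavy matrix computation, shifted outside the recurrence
    q, k, v = x @ Wq, x @ Wk, x @ Wv          # each [T, d_hid]

    h = np.zeros(Wq.shape[1])
    states = []
    for t in range(x.shape[0]):
        i_t = sigmoid(k[t] + h)               # input gate (twin gate, "+")
        f_t = sigmoid(q[t] - h)               # forget gate (twin gate, "-")
        h = g(i_t * v[t] + f_t * h)           # element-wise recurrence only
        states.append(h)
    return np.stack(states)
```

Note that the only operations left inside the loop are element-wise, which is what makes the recurrence cheap.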

Structure Analysis

One way to understand the model is to unfold the LRN structure along input tokens:
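
Assuming an identity activation g and h_0 = 0, unrolling the recurrence above gives (a sketch):

```latex
h_t = \sum_{k=1}^{t} \Big( \prod_{l=k+1}^{t} f_l \Big) \odot i_k \odot v_k
```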

The above structure, which has also been observed by Zhang et al., Lee et al., and others, endows the RNN model with multiple interpretations. We provide two of them below:

  • Relation with Self Attention Networks

Informally, LRN assembles the forget gates from step t down to step k+1 in order to query the key (the input gate). The resulting weight is assigned to the corresponding value representation and contributes to the final hidden representation.

Do the learned weights make sense? We ran a classification experiment on the AmaPolar task with a unidirectional linear-LRN, feeding the final hidden state into the classifier. The example below shows the learned weights: the term great gains a large weight, which decays slowly and contributes to the final positive decision.
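
As a rough sketch of how such per-token weights could be read off a linear (identity-activation) LRN (again NumPy and illustrative; lrn_attention_weights is a hypothetical helper, not part of this repository):

```python
import numpy as np

def lrn_attention_weights(x, Wq, Wk, Wv):
    """Per-token weights on the final state of a linear (identity) LRN.

    Returns a [T] array: the mean over hidden units of the weight that
    token k's value v_k receives in h_T, i.e. i_k * prod_{l>k} f_l.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    T, d = v.shape

    h = np.zeros(d)
    i_gates, f_gates = [], []
    for t in range(T):
        i_t = sigmoid(k[t] + h)
        f_t = sigmoid(q[t] - h)
        i_gates.append(i_t)
        f_gates.append(f_t)
        h = i_t * v[t] + f_t * h              # identity activation

    # accumulate forget-gate products backwards from the last step
    weights = np.ones(d)
    out = np.zeros(T)
    for t in range(T - 1, -1, -1):
        out[t] = (weights * i_gates[t]).mean()
        weights = weights * f_gates[t]
    return out
```

Plotting the returned weights against the input tokens would then show which tokens (e.g. great) dominate the final decision.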

  • Long-term and Short-term Memory

Another view of the unfolded structure is that the different gates form different memory mechanisms. The input gate acts as a short-term memory and indicates how much information from the current token is activated. The forget gates form a forget chain that controls how meaningless past information is erased.

Experiments

We ran experiments on six different tasks.

Citation

Please cite the following paper:

Biao Zhang; Rico Sennrich (2019). A Lightweight Recurrent Network for Sequence Modeling. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy.

@inproceedings{zhang-sennrich:2019:ACL,
  address = "Florence, Italy",
  author = "Zhang, Biao and Sennrich, Rico",
  booktitle = "{Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics}",
  publisher = "Association for Computational Linguistics",
  title = "{A Lightweight Recurrent Network for Sequence Modeling}",
  year = "2019"
}

Contact

For any further comments or questions about LRN, please email Biao Zhang.
