soskek / efficient_softmax

Licence: other
BlackOut and Adaptive Softmax for language models by Chainer

Programming Languages

  • python
  • shell

Projects that are alternatives to or similar to efficient_softmax

chainer-param-monitor
Monitor parameter and gradient statistics during neural network training with Chainer
Stars: ✭ 13 (+8.33%)
Mutual labels:  chainer
chainer-ClariNet
A Chainer implementation of ClariNet.
Stars: ✭ 45 (+275%)
Mutual labels:  chainer
chainer-Fast-WaveNet
A Chainer implementation of Fast WaveNet (mel-spectrogram vocoder).
Stars: ✭ 33 (+175%)
Mutual labels:  chainer
chainer-sort
Simple, Online, Realtime Tracking of Multiple Objects (SORT) implementation for Chainer and ChainerCV.
Stars: ✭ 20 (+66.67%)
Mutual labels:  chainer
tutorials
Introduction to Deep Learning: Chainer Tutorials
Stars: ✭ 68 (+466.67%)
Mutual labels:  chainer
Deep-Learning-Mahjong---
Reinforcement learning (RL) implementation of the imperfect-information game Mahjong, using Markov decision processes to predict future game states
Stars: ✭ 45 (+275%)
Mutual labels:  softmax
chainer-fcis
[This project has moved to ChainerCV] Chainer Implementation of Fully Convolutional Instance-aware Semantic Segmentation
Stars: ✭ 45 (+275%)
Mutual labels:  chainer
lda2vec
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec from this paper https://arxiv.org/abs/1605.02019
Stars: ✭ 27 (+125%)
Mutual labels:  chainer
Multi-task-Conditional-Attention-Networks
A prototype version of our submitted paper: Conversion Prediction Using Multi-task Conditional Attention Networks to Support the Creation of Effective Ad Creatives.
Stars: ✭ 21 (+75%)
Mutual labels:  chainer
deep-learning-platforms
Deep-learning platforms, frameworks, and resources
Stars: ✭ 17 (+41.67%)
Mutual labels:  chainer
chainer-graph-cnn
Chainer implementation of 'Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering' (https://arxiv.org/abs/1606.09375)
Stars: ✭ 67 (+458.33%)
Mutual labels:  chainer
chainer-ResDrop
Deep Networks with Stochastic Depth implementation by Chainer
Stars: ✭ 40 (+233.33%)
Mutual labels:  chainer
chainer-LSGAN
Least Squares Generative Adversarial Network implemented in Chainer
Stars: ✭ 16 (+33.33%)
Mutual labels:  chainer
rocgan
Chainer implementation of the paper Robust Conditional Generative Adversarial Networks
Stars: ✭ 15 (+25%)
Mutual labels:  chainer
chainer-notebooks
Jupyter notebooks for Chainer hands-on
Stars: ✭ 23 (+91.67%)
Mutual labels:  chainer
char-rnnlm-tensorflow
Char RNN language model based on TensorFlow
Stars: ✭ 14 (+16.67%)
Mutual labels:  rnn-language-model
sp2cp
Imageboard bot with a recurrent neural network (RNN, GRU)
Stars: ✭ 23 (+91.67%)
Mutual labels:  rnn-language-model
kaggle-champs-scalar-coupling
19th place solution in "Predicting Molecular Properties"
Stars: ✭ 26 (+116.67%)
Mutual labels:  chainer
NCE-loss
TensorFlow NCE loss in Keras
Stars: ✭ 30 (+150%)
Mutual labels:  softmax
char-rnn-text-generation
Character Embeddings Recurrent Neural Network Text Generation Models
Stars: ✭ 64 (+433.33%)
Mutual labels:  chainer

Efficient Softmax Approximation

Implementations of BlackOut and adaptive softmax for efficiently computing word distributions in language models with very large vocabularies.

The LSTM language models are derived from rnnlm_chainer.

Available output layers are as follows:

  • Linear + softmax with cross-entropy loss: the usual output layer.
  • --share-embedding: a variant whose output layer shares the word embedding matrix with the input layer.
  • --adaptive-softmax: adaptive softmax.
  • --blackout: BlackOut. (Note that BlackOut is not faster on GPU.)
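The first two options compute the same softmax cross-entropy loss and differ only in which matrix projects the hidden state to vocabulary logits. A minimal NumPy sketch of the distinction (all names and sizes here are illustrative, not the repository's actual code):

```python
import numpy as np

def softmax_cross_entropy(logits, target):
    # Numerically stable log-softmax, then negative log-likelihood of the target id.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target]

rng = np.random.default_rng(0)
vocab, dim = 10, 4
embed = rng.standard_normal((vocab, dim))   # input embedding matrix
W_out = rng.standard_normal((vocab, dim))   # separate output projection
h = rng.standard_normal(dim)                # hidden state from the LSTM

# Usual output layer: its own projection matrix.
loss_full = softmax_cross_entropy(W_out @ h, target=3)

# Shared-embedding variant: reuse the input embedding as the output projection.
loss_tied = softmax_cross_entropy(embed @ h, target=3)
```

Tying the matrices saves one vocab × dim parameter block, which dominates model size at large vocabularies.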

Adaptive Softmax

  • Efficient softmax approximation for GPUs
  • Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou, ICML 2017
  • paper
  • authors' Lua code
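Adaptive softmax exploits the skewed word distribution: frequent "head" words get a full softmax, while rare words are folded into tail clusters that are evaluated only when needed, with a reduced projection dimension. A minimal two-cluster NumPy sketch of the idea (split sizes and matrix names are illustrative, not taken from the paper's or the repository's code):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
head_size, tail_size = 6, 14          # frequent vs rare words (illustrative split)

# The head predicts the frequent words plus one extra "tail cluster" logit.
W_head = rng.standard_normal((head_size + 1, dim))
# The tail cluster projects to a smaller dimension to save compute.
P_tail = rng.standard_normal((dim // 2, dim))
W_tail = rng.standard_normal((tail_size, dim // 2))

def log_softmax(x):
    z = x - x.max()
    return z - np.log(np.exp(z).sum())

def log_prob(h, word):
    head_lp = log_softmax(W_head @ h)
    if word < head_size:
        return head_lp[word]
    # P(word) = P(tail cluster) * P(word | tail cluster)
    tail_lp = log_softmax(W_tail @ (P_tail @ h))
    return head_lp[head_size] + tail_lp[word - head_size]

h = rng.standard_normal(dim)
probs = np.exp([log_prob(h, w) for w in range(head_size + tail_size)])
```

Because the cluster probability multiplies a softmax over the tail, the factorized model is still a proper distribution over the full vocabulary.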

BlackOut

  • BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies
  • Shihao Ji, S. V. N. Vishwanathan, Nadathur Satish, Michael J. Anderson, Pradeep Dubey, ICLR 2016
  • paper
  • authors' C++ code
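BlackOut trains the output layer discriminatively on the target word plus a few negatives sampled from a proposal distribution, with importance weighting to correct for the sampling. A rough NumPy sketch of this idea (the uniform proposal and all names are illustrative; the paper uses a power-raised unigram proposal):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, K = 20, 8, 5

W = rng.standard_normal((vocab, dim))   # output word vectors
h = rng.standard_normal(dim)            # hidden state from the LSTM
q = np.full(vocab, 1.0 / vocab)         # proposal distribution (uniform here)

target = 3
samples = rng.choice(vocab, size=K, replace=False, p=q)
samples = samples[samples != target]

idx = np.concatenate(([target], samples))
scores = np.exp(W[idx] @ h) / q[idx]    # importance-weighted scores
p = scores / scores.sum()               # weighted softmax over target + negatives only

# Discriminative objective: push up the target, push down the sampled negatives.
loss = -(np.log(p[0]) + np.log(1.0 - p[1:]).sum())
```

Only K + 1 rows of W are touched per step instead of all `vocab` rows, which is where the speedup on CPU comes from; on GPU the full matrix multiply is already fast, consistent with the note above.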

How to Run

python -u train.py -g 0

To use one of the approximations, add the corresponding flag, e.g. python -u train.py -g 0 --adaptive-softmax.

Datasets

  • PennTreeBank
  • Wikitext-2
  • Wikitext-103

For the WikiText datasets, run prepare_wikitext.sh to download them.
