
jiali-ms / Jlm

License: MIT
A fast LSTM language model for large-vocabulary languages like Japanese and Chinese

Programming Languages

python

Projects that are alternatives of or similar to Jlm

Easy Deep Learning With Keras
Keras tutorial for beginners (using TF backend)
Stars: ✭ 367 (+249.52%)
Mutual labels:  deep-neural-networks, lstm
Deep Learning Time Series
List of papers, code and experiments using deep learning for time series forecasting
Stars: ✭ 796 (+658.1%)
Mutual labels:  deep-neural-networks, lstm
Flow Forecast
Deep learning PyTorch library for time series forecasting, classification, and anomaly detection (originally for flood forecasting).
Stars: ✭ 368 (+250.48%)
Mutual labels:  deep-neural-networks, lstm
Poetry Seq2seq
Chinese Poetry Generation
Stars: ✭ 159 (+51.43%)
Mutual labels:  beam-search, lstm
Gdax Orderbook Ml
Application of machine learning to the Coinbase (GDAX) orderbook
Stars: ✭ 60 (-42.86%)
Mutual labels:  deep-neural-networks, lstm
Image Captioning
Image Captioning using InceptionV3 and beam search
Stars: ✭ 290 (+176.19%)
Mutual labels:  beam-search, lstm
Ner Lstm
Named Entity Recognition using multilayered bidirectional LSTM
Stars: ✭ 532 (+406.67%)
Mutual labels:  deep-neural-networks, lstm
Speech Emotion Recognition
Speaker independent emotion recognition
Stars: ✭ 169 (+60.95%)
Mutual labels:  deep-neural-networks, lstm
Time Attention
Implementation of RNN for Time Series prediction from the paper https://arxiv.org/abs/1704.02971
Stars: ✭ 52 (-50.48%)
Mutual labels:  deep-neural-networks, lstm
Deepseqslam
The Official Deep Learning Framework for Route-based Place Recognition
Stars: ✭ 49 (-53.33%)
Mutual labels:  deep-neural-networks, lstm
Image Caption Generator
A neural network to generate captions for an image using CNN and RNN with BEAM Search.
Stars: ✭ 126 (+20%)
Mutual labels:  beam-search, lstm
Bitcoin Price Prediction Using Lstm
Bitcoin price Prediction ( Time Series ) using LSTM Recurrent neural network
Stars: ✭ 67 (-36.19%)
Mutual labels:  deep-neural-networks, lstm
Video Classification Cnn And Lstm
To classify video into various classes using keras library with tensorflow as back-end.
Stars: ✭ 218 (+107.62%)
Mutual labels:  deep-neural-networks, lstm
Predictive Maintenance Using Lstm
Example of Multiple Multivariate Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras.
Stars: ✭ 352 (+235.24%)
Mutual labels:  deep-neural-networks, lstm
Chameleon recsys
Source code of CHAMELEON - A Deep Learning Meta-Architecture for News Recommender Systems
Stars: ✭ 202 (+92.38%)
Mutual labels:  deep-neural-networks, lstm
Ctcdecode
PyTorch CTC Decoder bindings
Stars: ✭ 442 (+320.95%)
Mutual labels:  beam-search, decoder
Pytorch convlstm
convolutional lstm implementation in pytorch
Stars: ✭ 126 (+20%)
Mutual labels:  deep-neural-networks, lstm
Pytorch Kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Stars: ✭ 2,097 (+1897.14%)
Mutual labels:  deep-neural-networks, lstm
Sangita
A Natural Language Toolkit for Indian Languages
Stars: ✭ 43 (-59.05%)
Mutual labels:  deep-neural-networks, lstm
Irm Based Speech Enhancement Using Lstm
Ideal Ratio Mask (IRM) Estimation based Speech Enhancement using LSTM
Stars: ✭ 66 (-37.14%)
Mutual labels:  deep-neural-networks, lstm

JLM

A fast LSTM language model for large-vocabulary languages like Japanese and Chinese.

Faster and smaller without accuracy loss

JLM focuses on accelerating inference and reducing model size to fit the requirements of real-time applications, especially on the client side. It is 85% smaller and 50x faster than a standard LSTM solution with a full softmax. See the paper JLM - Fast RNN Language Model with Large Vocabulary for performance details.

No dependency on the training framework

Training is done with TensorFlow. Instead of depending on TF's big dynamic library to run in a client app, we dump the trained weights out. Inference and decoding can then be done in Python with numpy or in C++ with Eigen. There is no black box.
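To give a feel for what framework-free inference looks like, here is a minimal numpy sketch of one LSTM step followed by a softmax over the vocabulary. The pickle name and the weight keys are assumptions for illustration, not the exact names the dump script produces.

import pickle
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, kernel, bias):
    # One LSTM cell step using TF's gate layout [i, j, f, o] with forget_bias = 1.0.
    z = np.concatenate([x, h]).dot(kernel) + bias
    i, j, f, o = np.split(z, 4)
    c_new = sigmoid(f + 1.0) * c + sigmoid(i) * np.tanh(j)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

# Load the dumped parameters (file name and dictionary keys are hypothetical).
with open("weights.pkl", "rb") as f:
    w = pickle.load(f)

embedding = w["embedding"]                       # (vocab_size, embed_size)
kernel, bias = w["lstm_kernel"], w["lstm_bias"]
softmax_w, softmax_b = w["softmax_w"], w["softmax_b"]

h = np.zeros(bias.shape[0] // 4)
c = np.zeros_like(h)
h, c = lstm_step(embedding[42], h, c, kernel, bias)  # feed word id 42
logits = h.dot(softmax_w) + softmax_b                # scores for the next word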

Language model

We implemented the standard LSTM, tied embedding, D-softmax, and D-softmax* in both the training and numpy inference stages for comparison. In practice, just use D-softmax* for the best results.
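Roughly speaking, the differentiated softmax family assigns wide embeddings to frequent words and narrow ones to rare words, so most of the output computation goes to a small, frequent slice of the vocabulary. Below is a rough numpy illustration of that segment-by-segment scoring; the shapes, random weights, and the 50,000-word cutoff are made up, and it is not the project's exact formulation.

import numpy as np

# Frequency-ordered vocabulary split into segments of (embedding width, start, end).
embedding_seg = [(200, 0, 12000), (100, 12000, 30000), (50, 30000, 50000)]
hidden_size = 512
rng = np.random.default_rng(0)

# Each segment owns a narrow embedding table and its own projection from the hidden state.
segments = [{
    "embed": rng.standard_normal((end - start, width)) * 0.01,
    "proj": rng.standard_normal((hidden_size, width)) * 0.01,
} for width, start, end in embedding_seg]

def segmented_logits(h):
    # Score the whole vocabulary segment by segment; frequent words get the wide projection.
    parts = [(h @ seg["proj"]) @ seg["embed"].T for seg in segments]
    return np.concatenate(parts)

h = rng.standard_normal(hidden_size)
logits = segmented_logits(h)                 # shape (50000,)
probs = np.exp(logits - logits.max())
probs /= probs.sum()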

Decoder

A standard Viterbi decoder with beam search is implemented. It batches predictions to save decoding time. We also implemented Enabling Real-time Neural IME with Incremental Vocabulary Selection (NAACL 2019), which further reduces the softmax cost during decoding by ~95%, reaching real-time speed on a commodity CPU.
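The real decoder ships with this project; the toy sketch below only shows the beam search idea of keeping the top few hypotheses at each step. The lm_logprob scorer and the candidate lists are stand-ins, and a real decoder would batch these LM calls as described above.

import heapq

def beam_search(candidate_words, lm_logprob, beam_size=5):
    # candidate_words: a list of candidate word lists, one per position in the input.
    # lm_logprob(history, word): language model log probability (stand-in signature).
    beams = [((), 0.0)]                        # (word history, cumulative log score)
    for candidates in candidate_words:
        scored = []
        for history, score in beams:
            for word in candidates:
                scored.append((history + (word,), score + lm_logprob(history, word)))
        beams = heapq.nlargest(beam_size, scored, key=lambda b: b[1])  # prune to the beam
    return beams

# Usage with a dummy uniform scorer; real candidates come from the lexicon lookup.
best = beam_search([["品川", "品"], ["に", "二"]], lambda history, word: -1.0)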

How to use

Corpus preparation

We use the BCCWJ Japanese corpus as an example.

Make sure you have a txt file with one whitespace-segmented sentence per line.

If not, use tools like MeCab to get the job done first.

In JLM, each word unit is in the format display/reading/POS, for example "品川/シナガワ/名詞-固有名詞-地名-一般".
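As a quick sanity check, a segmented line can be split into (display, reading, POS) triples as below; the extra tokens in the sample line are invented for illustration.

line = "品川/シナガワ/名詞-固有名詞-地名-一般 に/ニ/助詞-格助詞-一般 行く/イク/動詞-自立"

# Each whitespace-separated unit is display/reading/POS.
units = [tuple(token.split("/", 2)) for token in line.split()]
for display, reading, pos in units:
    print(display, reading, pos)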

Run data preparation script

python data.py -v 50000

Make sure you put your corpus.txt in the data folder first. The script will generate an encoded corpus and a bunch of pickles for later use. The lexicon is also generated at this stage; words in the lexicon are ordered by frequency to make vocabulary segmentation convenient.
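data.py takes care of all of this; the snippet below is only a sketch of the frequency-ordering idea, assuming corpus.txt holds one segmented sentence per line.

from collections import Counter

# Count word units in the segmented corpus (one sentence per line).
counts = Counter()
with open("data/corpus.txt", encoding="utf-8") as f:
    for line in f:
        counts.update(line.split())

# Keep the top 50,000 units ordered by frequency and assign ids in that order,
# so a vocabulary segment is simply a contiguous id range.
lexicon = [word for word, _ in counts.most_common(50000)]
word_to_id = {word: i for i, word in enumerate(lexicon)}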

Run training script

  • Configure the experiment folder path (point the config at the corpus with the vocabulary size you would like to use)

data_path = os.path.abspath(os.path.join(root_path, "data/corpus_50000"))

  • Enter the train folder and edit the training script train.py
parameters = {
    "is_debug": False,
    "batch_size": 128 * 3,
    "embed_size": 256,
    "hidden_size": 512,
    "num_steps": 20,
    "max_epochs": 10,
    "early_stopping": 1,
    "dropout": 0.9,
    "lr": 0.001,
    "share_embedding": True,  # tie input and output embeddings
    "gpu_id": 0,
    "tf_random_seed": 101,
    "D_softmax": False,       # plain D-softmax (not used here)
    "V_table": True,          # D-softmax*, see the note below
    "embedding_seg": [(200, 0, 12000), (100, 12000, 30000), (50, 30000, None)]  # (embedding width, start, end) per segment
}

V_table is the D-softmax* option here. embedding_seg specifies how you want to segment your vocabulary. Tune the hyperparameters to get your own best result. The results are saved to the "experiments" folder with an ID; the Sacred framework takes care of all the experiment indexing.

A small hint: use up your GPU memory by setting a bigger num_steps and batch_size to reduce training time.

Verify the LM is correct

Run test.py in the train folder; it will automatically generate a random sentence with the trained model. If the sentence makes no sense at all given your knowledge of the language, one of the stages is not set up correctly. Here is an example of a randomly generated sentence.

自宅 に テレビ 局 で 電話 し たら 、 で も 、 すべて の 通話 が 実現 。

Dump the TF trained weights

Bringing a trained model into clients such as mobile devices or a Windows environment is a common problem. We recommend ONNX for that purpose, but for this example we want full control and want to cut off the dependency on TF. Run weights.py in the train folder to get a pickle of all the trainable parameters in numpy format. You can use the following command to also dump a txt format for other uses.

python weights.py -e 1 -v True
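To inspect the dump, something along these lines works; the file name and the assumption that it is a dict of named numpy arrays are illustrative, since the exact layout depends on weights.py and the experiment id.

import pickle
import numpy as np

# Load the dumped trainable parameters (hypothetical file name).
with open("weights.pkl", "rb") as f:
    weights = pickle.load(f)

# Print shapes and re-export each array as plain text for non-Python consumers.
for name, value in weights.items():
    arr = np.asarray(value)
    print(name, arr.shape)
    np.savetxt(name.replace("/", "_") + ".txt", arr.reshape(arr.shape[0], -1))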

Evaluation

Run eval.py to see the conversion accuracy of your trained model with the offline (dumped) model.

python eval.py

Compression

We also implemented the k-means quantization described in the ICLR 2016 paper Deep Compression. The code is in comp.py; run the script with the experiment id, and it will generate the codes and codebook. In our experiments with 8 bits, there is almost no accuracy loss on the conversion task.
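comp.py does the real work; the snippet below is only a sketch of 8-bit k-means weight sharing in the spirit of Deep Compression, using scikit-learn for the clustering (an assumption, the project's own implementation may differ).

import numpy as np
from sklearn.cluster import KMeans

def quantize(weight, bits=8):
    # Cluster the weight values into 2**bits shared centroids (the codebook)
    # and store one small integer code per weight.
    flat = weight.reshape(-1, 1)
    kmeans = KMeans(n_clusters=2 ** bits, n_init=1, random_state=0).fit(flat)
    codebook = kmeans.cluster_centers_.flatten()
    codes = kmeans.labels_.astype(np.uint8).reshape(weight.shape)
    return codebook, codes

def dequantize(codebook, codes):
    return codebook[codes]           # reconstruct the matrix used at inference time

w = np.random.randn(256, 64).astype(np.float32)
codebook, codes = quantize(w, bits=8)
w_hat = dequantize(codebook, codes)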

Want to try more examples of the model?

Go to http://nlpfun.com/ime
