
IsaacChanghau / Dense_BiLSTM

License: MIT License
Tensorflow Implementation of Densely Connected Bidirectional LSTM with Applications to Sentence Classification

Programming Languages

python
shell

Projects that are alternatives to or similar to Dense BiLSTM

sequence-rnn-py
Sequence analyzing using Recurrent Neural Networks (RNN) based on Keras
Stars: ✭ 28 (-41.67%)
Mutual labels:  lstm, rnn
deep-improvisation
Easy-to-use deep LSTM neural network to generate song-like sounds containing improvisation.
Stars: ✭ 53 (+10.42%)
Mutual labels:  lstm, rnn
lstm-electric-load-forecast
Electric load forecast using Long-Short-Term-Memory (LSTM) recurrent neural network
Stars: ✭ 56 (+16.67%)
Mutual labels:  lstm, rnn
medical-diagnosis-cnn-rnn-rcnn
Disease diagnosis from patient descriptions, implemented with RNN, CNN, and RCNN models respectively.
Stars: ✭ 39 (-18.75%)
Mutual labels:  lstm, rnn
Base-On-Relation-Method-Extract-News-DA-RNN-Model-For-Stock-Prediction--Pytorch
A dual-stage attention mechanism model, based on a relation-based news extraction method, for stock prediction.
Stars: ✭ 33 (-31.25%)
Mutual labels:  lstm, rnn
air writing
Online Hand Writing Recognition using BLSTM
Stars: ✭ 26 (-45.83%)
Mutual labels:  lstm, rnn
Speech-Recognition
End-to-end Automatic Speech Recognition for Mandarin and English in Tensorflow
Stars: ✭ 21 (-56.25%)
Mutual labels:  lstm, rnn
ArrayLSTM
GPU/CPU (CUDA) Implementation of "Recurrent Memory Array Structures", Simple RNN, LSTM, Array LSTM..
Stars: ✭ 21 (-56.25%)
Mutual labels:  lstm, rnn
myDL
Deep Learning
Stars: ✭ 18 (-62.5%)
Mutual labels:  lstm, rnn
ConvLSTM-PyTorch
ConvLSTM/ConvGRU (Encoder-Decoder) with PyTorch on Moving-MNIST
Stars: ✭ 202 (+320.83%)
Mutual labels:  lstm, rnn
novel writer
Train an LSTM to write a novel (Hong Lou Meng here) in PyTorch.
Stars: ✭ 14 (-70.83%)
Mutual labels:  lstm, rnn
rnn2d
CPU and GPU implementations of some 2D RNN layers
Stars: ✭ 26 (-45.83%)
Mutual labels:  lstm, rnn
Customer-Feedback-Analysis
Multi-class text (feedback) classification using CNN and GRU networks with pre-trained Word2Vec word embeddings in TensorFlow.
Stars: ✭ 18 (-62.5%)
Mutual labels:  rnn, sentence-classification
SpeakerDiarization RNN CNN LSTM
Speaker diarization is the problem of separating speakers in an audio recording. There could be any number of speakers, and the final result should state when each speaker starts and ends. In this project, we analyze a given audio file with 2 channels and 2 speakers (on separate channels).
Stars: ✭ 56 (+16.67%)
Mutual labels:  lstm, rnn
Paper-Implementation-DSTP-RNN-For-Stock-Prediction-Based-On-DA-RNN
A trial implementation of the DSTP-RNN paper, based on DA-RNN, for stock prediction (Ver 1.0).
Stars: ✭ 62 (+29.17%)
Mutual labels:  lstm, rnn
theano-recurrence
Recurrent Neural Networks (RNN, GRU, LSTM) and their Bidirectional versions (BiRNN, BiGRU, BiLSTM) for word & character level language modelling in Theano
Stars: ✭ 40 (-16.67%)
Mutual labels:  lstm, rnn
5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8
RNN-LSTM that learns passwords from a starting list
Stars: ✭ 35 (-27.08%)
Mutual labels:  lstm, rnn
EBIM-NLI
Enhanced BiLSTM Inference Model for Natural Language Inference
Stars: ✭ 24 (-50%)
Mutual labels:  lstm, rnn
automatic-personality-prediction
[AAAI 2020] Modeling Personality with Attentive Networks and Contextual Embeddings
Stars: ✭ 43 (-10.42%)
Mutual labels:  lstm, rnn
Sequence-Models-coursera
Sequence Models by Andrew Ng on Coursera. Programming Assignments and Quiz Solutions.
Stars: ✭ 53 (+10.42%)
Mutual labels:  lstm, rnn

Densely Connected Bidirectional LSTM

Author
TensorFlow implementation of Densely Connected Bidirectional LSTM with Applications to Sentence Classification, [arXiv:1802.00889].

Densely Connected Bidirectional LSTM (DC-Bi-LSTM) Overview

model_graph_1

The architecture of DC-Bi-LSTM. The first-layer reading memory is obtained from the original input sequence; the second-layer reading memory is based on the position-aligned concatenation of the original input sequence and the first-layer reading memory, and so on. Finally, the n-th-layer reading memory is taken as the final feature representation for classification.

model_graph_2

Illustration of (a) Deep Stacked Bi-LSTM and (b) DC-Bi-LSTM. Each black node denotes an input layer. Purple, green, and yellow nodes denote hidden layers. Orange nodes denote the average pooling of forward or backward hidden layers. Each red node denotes a class. An ellipse represents the concatenation of its internal nodes. Solid lines denote connections between two layers, and dotted lines indicate the operation of copying.
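
The stacking idea can be sketched compactly with tf.keras layers. This is only an illustrative sketch, not the repository's actual implementation (the real model lives under models/ and differs in layer count, cell configuration, and other details); the layer sizes below are illustrative placeholders.

import tensorflow as tf

def dc_bilstm_features(inputs, num_layers=5, units=13, last_units=100):
    """inputs: word embeddings of shape [batch, time, emb_dim]."""
    x = inputs
    for _ in range(num_layers):
        # Each layer produces a new "reading memory" from the position-aligned
        # concatenation of the original input and all previous reading memories.
        h = tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(units, return_sequences=True))(x)
        x = tf.keras.layers.Concatenate(axis=-1)([x, h])
    # The last-layer reading memory (2 * last_units wide) is average-pooled
    # over time and used as the final feature representation for classification.
    last = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(last_units, return_sequences=True))(x)
    return tf.keras.layers.GlobalAveragePooling1D()(last)

A dense softmax layer on top of these pooled features then produces the per-class logits.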

Dataset Overview

More details of the datasets are given here: [dataset/raw/README.md]

Dataset | Classes | Avg. sentence length | Dataset size | Vocab size | Words present in word2vec | Test size
------- | ------- | -------------------- | ------------ | ---------- | ------------------------- | ---------
MR      | 2       | 20                   | 10662        | 18765      | 16448                     | CV
SST1    | 5       | 18                   | 11855        | 17836      | 16262                     | 2210
SST2    | 2       | 19                   | 9613         | 16185      | 14838                     | 1821
Subj    | 2       | 23                   | 10000        | 21323      | 17913                     | CV
TREC    | 6       | 10                   | 5952         | 9592       | 9125                      | 500
CR      | 2       | 19                   | 3775         | 5340       | 5046                      | CV
MPQA    | 2       | 3                    | 10606        | 6246       | 6083                      | CV

CV means cross-validation.
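
For reference, statistics like those in the table can be recomputed from a raw dataset with a short script. This is a generic sketch, not part of this repository, and it assumes one whitespace-tokenized sentence per line in the input file.

from collections import Counter

def dataset_stats(path):
    """Compute dataset size, average sentence length, and vocab size."""
    lengths, vocab = [], Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = line.strip().split()
            if tokens:
                lengths.append(len(tokens))
                vocab.update(tokens)
    return {
        "dataset_size": len(lengths),
        "avg_sentence_length": round(sum(lengths) / len(lengths)),
        "vocab_size": len(vocab),
    }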

Usage

Configuration: all parameters and configurations are stored in models/config.py.
The first step is to prepare the required data (pre-trained word embeddings and raw datasets). The raw datasets are already included in this repository under dataset/raw/. The word embeddings used in the paper, the 300-dimensional GloVe vectors trained on 42 billion tokens, can be obtained by

$ cd dataset
$ ./download_emb.sh

After downloading the pre-trained word embeddings, run the following to build the training, development, and testing datasets from all raw datasets; the built datasets will be stored in the dataset/data/ directory.

$ cd dataset
$ python3 prepro.py
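
As a rough, hypothetical illustration of what this kind of preprocessing typically involves (the actual prepro.py may differ), the downloaded GloVe file can be filtered down to each dataset's vocabulary to build an embedding matrix. The path and vocabulary mapping below are placeholders.

import numpy as np

def build_embedding_matrix(glove_path, vocab, dim=300):
    """vocab: dict mapping word -> integer id (0 reserved for padding)."""
    # Unseen words get small random vectors; the padding row stays zero.
    matrix = np.random.uniform(-0.25, 0.25, (len(vocab) + 1, dim)).astype(np.float32)
    matrix[0] = 0.0
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vec = parts[0], parts[1:]
            if word in vocab and len(vec) == dim:
                matrix[vocab[word]] = np.asarray(vec, dtype=np.float32)
    return matrix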

Then train the model on a specific dataset via

$ python3 train_model.py --task <str> --resume_training <bool> --has_devset <bool>
# eg:
$ python3 train_model.py --task subj --resume_training True --has_devset False

If everything goes properly, the training process will be launched:

...
word embedding shape: [None, None, 350]
dense bi-lstm outputs shape: [None, None, 200]
average pooling outputs shape: [None, 200]
logits shape: [None, 2]
params number: 1443400
No checkpoint found in directory ./ckpt/subj/, cannot resume training. Do you want to start a new training session?
(y)es | (n)o: y
Start training...
Epoch  1/30:
45/45 [==============================] - 567s - train loss: 0.5043     
Testing model over TEST dataset: accuracy - 91.100
 -- new BEST score on TEST dataset: 91.100
...
Epoch  4/30:
45/45 [==============================] - 519s - train loss: 0.1998     
Testing model over TEST dataset: accuracy - 94.200
 -- new BEST score on TEST dataset: 94.200
Epoch  6/30:
45/45 [==============================] - 505s - train loss: 0.1534     
Testing model over TEST dataset: accuracy - 94.500
 -- new BEST score on TEST dataset: 94.500
Epoch  7/30:
45/45 [==============================] - 530s - train loss: 0.1415     
Testing model over TEST dataset: accuracy - 94.000
...
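
The logged shapes can be sanity-checked with a little arithmetic. Note that the 50-dimensional character-level component below is an assumption inferred from the 350-dimensional embedding (300-d GloVe plus extra features); it is not confirmed by this log.

# Assumption-based sanity check of the logged tensor shapes for the subj task.
word_dim, char_dim = 300, 50      # GloVe 300-d plus assumed 50-d character features
hidden_units = 100                # assumed units per direction in the final Bi-LSTM
num_classes = 2                   # subj is a binary classification task

print(word_dim + char_dim)        # 350 -> word embedding shape [None, None, 350]
print(2 * hidden_units)           # 200 -> dense bi-lstm / average pooling width
print(num_classes)                # 2   -> logits shape [None, 2]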

Results

Here the model is only tested on several datasets for a few epochs, to validate that it works properly.

Experiments were run on a MacBook Pro (13-inch, 2017) with a 3.1 GHz Intel Core i5 CPU and 16 GB 2133 MHz LPDDR3 RAM.

Dataset | Train epochs                | Batch size | Dev  | Test
------- | --------------------------- | ---------- | ---- | ----
MR      | 11 (w/o cross-validation)   | 200        | N.A. | 82.4
SST1    | 20                          | 200        | 50.9 | 51.2
SST2    | 13                          | 200        | 84.7 | 88.1
Subj    | 6 (w/o cross-validation)    | 200        | N.A. | 94.5
TREC    | 15                          | 128        | N.A. | 94.7
CR      | 10 (w/o cross-validation)   | 64         | N.A. | 82.9
MPQA    | 5 (w/o cross-validation)    | 64         | N.A. | 87.3

The evaluation results on some datasets are slightly lower than those reported in the paper, which may be caused by differences in data processing, parameter settings (e.g., batch size, learning rate, learning rate decay, gradient clipping, character embeddings), or the absence of cross-validation.
