butsugiri / Chainer Rnn Ner

Named Entity Recognition with RNN, implemented by Chainer

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Chainer Rnn Ner

DSTC6-End-to-End-Conversation-Modeling
DSTC6: End-to-End Conversation Modeling Track
Stars: ✭ 56 (+194.74%)
Mutual labels:  chainer, lstm
Deep Learning Time Series
List of papers, code and experiments using deep learning for time series forecasting
Stars: ✭ 796 (+4089.47%)
Mutual labels:  lstm, recurrent-neural-networks
CS231n
PyTorch/Tensorflow solutions for Stanford's CS231n: "CNNs for Visual Recognition"
Stars: ✭ 47 (+147.37%)
Mutual labels:  recurrent-neural-networks, lstm
Machine Learning Curriculum
💻 Make machines learn so that you don't have to struggle to program them; The ultimate list
Stars: ✭ 761 (+3905.26%)
Mutual labels:  recurrent-neural-networks, chainer
Rnnsharp
RNNSharp is a toolkit for deep recurrent neural networks that is widely used for many different kinds of tasks, such as sequence labeling and sequence-to-sequence learning. It is written in C# and based on .NET Framework 4.6 or above. RNNSharp supports many different types of networks, such as forward and bi-directional networks and sequence-to-sequence networks, and different types of layers, such as LSTM, softmax, and sampled softmax.
Stars: ✭ 277 (+1357.89%)
Mutual labels:  lstm, recurrent-neural-networks
sequence-rnn-py
Sequence analyzing using Recurrent Neural Networks (RNN) based on Keras
Stars: ✭ 28 (+47.37%)
Mutual labels:  recurrent-neural-networks, lstm
tiny-rnn
Lightweight C++11 library for building deep recurrent neural networks
Stars: ✭ 41 (+115.79%)
Mutual labels:  recurrent-neural-networks, lstm
LSTM-Time-Series-Analysis
Using LSTM network for time series forecasting
Stars: ✭ 41 (+115.79%)
Mutual labels:  recurrent-neural-networks, lstm
Lstm Human Activity Recognition
Human Activity Recognition example using TensorFlow on smartphone sensors dataset and an LSTM RNN. Classifying the type of movement amongst six activity categories - Guillaume Chevalier
Stars: ✭ 2,943 (+15389.47%)
Mutual labels:  lstm, recurrent-neural-networks
Carrot
🥕 Evolutionary Neural Networks in JavaScript
Stars: ✭ 261 (+1273.68%)
Mutual labels:  lstm, recurrent-neural-networks
SpeakerDiarization RNN CNN LSTM
Speaker diarization is the problem of separating speakers in an audio recording. There can be any number of speakers, and the final result should state when each speaker starts and stops. In this project, we analyze a given audio file with 2 channels and 2 speakers (on separate channels).
Stars: ✭ 56 (+194.74%)
Mutual labels:  recurrent-neural-networks, lstm
Tensorflow Lstm Regression
Sequence prediction using recurrent neural networks(LSTM) with TensorFlow
Stars: ✭ 433 (+2178.95%)
Mutual labels:  lstm, recurrent-neural-networks
keras-malicious-url-detector
Malicious URL detector using keras recurrent networks and scikit-learn classifiers
Stars: ✭ 24 (+26.32%)
Mutual labels:  recurrent-neural-networks, lstm
automatic-personality-prediction
[AAAI 2020] Modeling Personality with Attentive Networks and Contextual Embeddings
Stars: ✭ 43 (+126.32%)
Mutual labels:  recurrent-neural-networks, lstm
datastories-semeval2017-task6
Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".
Stars: ✭ 20 (+5.26%)
Mutual labels:  recurrent-neural-networks, lstm
dts
A Keras library for multi-step time-series forecasting.
Stars: ✭ 130 (+584.21%)
Mutual labels:  recurrent-neural-networks, lstm
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+16789.47%)
Mutual labels:  lstm, recurrent-neural-networks
Visual-Attention-Model
Chainer implementation of Deepmind's Visual Attention Model paper
Stars: ✭ 27 (+42.11%)
Mutual labels:  chainer, recurrent-neural-networks
sgrnn
Tensorflow implementation of Synthetic Gradient for RNN (LSTM)
Stars: ✭ 40 (+110.53%)
Mutual labels:  recurrent-neural-networks, lstm
Keras Anomaly Detection
Anomaly detection implemented in Keras
Stars: ✭ 335 (+1663.16%)
Mutual labels:  lstm, recurrent-neural-networks

Note: This repository is part of an assignment given in the Information Communication Theory (情報伝達学) lecture at Tohoku University.

Students were actually expected to do some feature engineering with CRFsuite, but I personally preferred implementing an RNN.

About

This is an implementation of a Named Entity Recognition (NER) model based on Recurrent Neural Networks (RNNs). The model is heavily inspired by the following papers:

  • Jason P. C. Chiu and Eric Nichols. "Named Entity Recognition with Bidirectional LSTM-CNNs." Transactions of the Association for Computational Linguistics 4 (2016): 357-370.
  • James Hammerton. "Named Entity Recognition with Long Short-Term Memory." Proceedings of CoNLL-2003 (HLT-NAACL 2003), pages 172-175.
  • Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. "Neural Architectures for Named Entity Recognition." Proceedings of NAACL-HLT 2016, pages 260-270.

Note that this repo is not a re-implementation of these models.

The purpose of implementing these models is to see how performance improves as the architecture becomes more complex (e.g. LSTM --> Bidirectional LSTM --> Bidirectional LSTM with Character-Encoding).

I expect that this model can also be applied to other sequence-labeling tasks, but I have not tried that yet.

Model Details

The following models are implemented in Chainer.

Models with Cross Entropy as Loss Function

  • LSTM (Model.py/NERTagger)
  • Bi-directional LSTM (Model.py/BiNERTagger)
  • Bi-directional LSTM with Character-based encoding (Model.py/BiCharNERTagger)
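
To make the cross-entropy setup concrete, here is a minimal sketch of such a tagger in the Chainer 1.x style. It is not the code from Model.py; the class name, the single-layer architecture, and the per-timestep batching are illustrative only.

  import chainer
  import chainer.functions as F
  import chainer.links as L

  class SimpleNERTagger(chainer.Chain):
      """Illustrative uni-directional LSTM tagger (cf. Model.py/NERTagger)."""

      def __init__(self, n_vocab, n_tags, n_units):
          super(SimpleNERTagger, self).__init__(
              embed=L.EmbedID(n_vocab, n_units),   # word id -> embedding
              lstm=L.LSTM(n_units, n_units),       # stateful LSTM layer
              out=L.Linear(n_units, n_tags),       # hidden state -> tag scores
          )

      def __call__(self, xs, ts):
          """xs / ts: lists over timesteps of int32 word-id / tag-id batches."""
          self.lstm.reset_state()
          loss = 0
          for x, t in zip(xs, ts):
              h = self.lstm(self.embed(x))                      # update recurrent state
              loss += F.softmax_cross_entropy(self.out(h), t)   # per-token loss
          return loss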

Models with CRF Layer as Loss Function

This loss function performs considerably better than simple cross entropy because it implicitly takes the transition constraints on BIO tags into account (a sketch of the CRF variant follows the list below).

  • LSTM (CRFModel.py/CRFNERTagger)
  • Bi-directional LSTM (CRFModel.py/CRFBiNERTagger)
  • Bi-directional LSTM with Character-based encoding (CRFModel.py/CRFBiCharNERTagger)
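
As a rough sketch of how the CRF variants differ, the per-token softmax loss can be replaced by Chainer's chainer.links.CRF1d layer, which scores whole tag sequences and learns tag-transition weights. The class below is again illustrative and does not reproduce the code in CRFModel.py.

  import chainer
  import chainer.links as L

  class SimpleCRFTagger(chainer.Chain):
      """Illustrative LSTM + CRF tagger (cf. CRFModel.py/CRFNERTagger)."""

      def __init__(self, n_vocab, n_tags, n_units):
          super(SimpleCRFTagger, self).__init__(
              embed=L.EmbedID(n_vocab, n_units),
              lstm=L.LSTM(n_units, n_units),
              hidden2tag=L.Linear(n_units, n_tags),
              crf=L.CRF1d(n_tags),                 # learns tag-transition scores
          )

      def _features(self, xs):
          self.lstm.reset_state()
          # Per-timestep unnormalised tag scores fed into the CRF layer.
          return [self.hidden2tag(self.lstm(self.embed(x))) for x in xs]

      def __call__(self, xs, ts):
          # The CRF loss is computed over the whole sequence, so implausible
          # BIO transitions (e.g. O followed by I-PER) are penalised jointly.
          return self.crf(self._features(xs), ts)

      def predict(self, xs):
          _, path = self.crf.argmax(self._features(xs))  # Viterbi decoding
          return path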

Requirements

Software

  • Python 3.*
  • Chainer 1.19 (or higher)

Resources

  • Pretrained Word Vector (e.g. GloVe)
    • The script will still work (and learn) without pretrained vectors, but performance deteriorates significantly (see the papers above for details); a loading sketch follows this list.
  • CoNLL 2003 Dataset
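
To illustrate how pretrained vectors are typically plugged in, the sketch below reads a GloVe text file into an embedding matrix indexed by a word-to-id vocabulary. The load_glove helper, file name, and dimensionality are hypothetical and not part of this repository.

  import numpy as np

  def load_glove(path, vocab, n_units=100):
      """Initialise an embedding matrix from a GloVe text file.

      Words absent from the GloVe file keep a small random initialisation.
      """
      W = np.random.uniform(-0.1, 0.1, (len(vocab), n_units)).astype(np.float32)
      with open(path, encoding='utf-8') as f:
          for line in f:
              fields = line.rstrip().split(' ')
              word, values = fields[0], fields[1:]
              if word in vocab and len(values) == n_units:
                  W[vocab[word]] = np.asarray(values, dtype=np.float32)
      return W

  # e.g. copy into the tagger's embedding layer before training:
  # model.embed.W.data[:] = load_glove('data/glove.6B.100d.txt', vocab)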

Usage

Preprocessing

Place the CoNLL train, dev, and test datasets in data/ and run preprocess.sh. This converts the raw datasets into a model-readable JSON format.

Then run generate_vocab.py and generate_char_vocab.py to generate vocabulary files.
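
For reference, the conversion amounts to something like the sketch below. This is only an approximation: the actual preprocess.sh and the JSON schema it produces may differ, and the conll_to_json helper is hypothetical.

  import json

  def conll_to_json(src, dst):
      """Turn CoNLL 2003 files (one token per line, blank line between
      sentences, NER tag in the last column) into one JSON object per line."""
      sentences, words, tags = [], [], []
      with open(src, encoding='utf-8') as f:
          for line in f:
              line = line.strip()
              if not line or line.startswith('-DOCSTART-'):
                  if words:
                      sentences.append({'words': words, 'tags': tags})
                      words, tags = [], []
                  continue
              cols = line.split()
              words.append(cols[0])    # surface token
              tags.append(cols[-1])    # BIO named-entity tag
      if words:
          sentences.append({'words': words, 'tags': tags})
      with open(dst, 'w', encoding='utf-8') as f:
          for sent in sentences:
              f.write(json.dumps(sent) + '\n')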

Training

  • Training the model without CRF layer: train_model.py
  • Training the model with CRF layer: train_crf_model.py

Both scripts accept exactly the same options:

  usage: train_model.py [-h] [--batchsize BATCHSIZE] [--epoch EPOCH] [--gpu GPU]
                        [--out OUT] [--resume RESUME] [--test] [--unit UNIT]
                        [--glove GLOVE] [--dropout] --model-type MODEL_TYPE
                        [--final-layer FINAL_LAYER]

  optional arguments:
    -h, --help            show this help message and exit
    --batchsize BATCHSIZE, -b BATCHSIZE
                          Number of examples in each mini-batch
    --epoch EPOCH, -e EPOCH
                          Number of sweeps over the dataset to train
    --gpu GPU, -g GPU     GPU ID (negative value indicates CPU)
    --out OUT, -o OUT     Directory to output the result
    --resume RESUME, -r RESUME
                          Resume the training from snapshot
    --test                Use tiny datasets for quick tests
    --unit UNIT, -u UNIT  Number of LSTM units in each layer
    --glove GLOVE         path to glove vector
    --dropout             use dropout?
    --model-type MODEL_TYPE
                          bilstm / lstm / charlstm
    --final-layer FINAL_LAYER
                          loss function
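
A typical training invocation might look like the following; the GloVe path and the hyperparameter values are illustrative, not recommendations from the author.

  python train_model.py --gpu 0 --epoch 20 --batchsize 32 --unit 100 \
      --glove data/glove.6B.100d.txt --model-type bilstm --dropout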

Testing

  • Testing the model without CRF layer: predict.py
  • Testing the model with CRF layer: crf_predict.py

Options:

optional arguments:
  -h, --help            show this help message and exit
  --unit UNIT, -u UNIT  Number of LSTM units in each layer
  --glove GLOVE         path to glove vector
  --model-type MODEL_TYPE
                        bilstm / lstm / charlstm
  --model MODEL         path to model file
  --dev                 If true, use validation data

Do not forget to specify --model-type and --model (the path to the trained model file).
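
For example, testing a bidirectional model trained with the CRF layer might look like this (the snapshot path under result/ is a placeholder):

  python crf_predict.py --unit 100 --glove data/glove.6B.100d.txt \
      --model-type bilstm --model result/your_trained_model --dev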

The performance (precision/recall/F-score) can be evaluated with conlleval.pl (not included in this repo).
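
conlleval.pl reads whitespace-separated lines whose last two columns are the gold and predicted tags. Assuming the prediction script writes its output in that layout (the file name below is hypothetical), evaluation reduces to:

  perl conlleval.pl < predictions.txt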

Results

  Model          CRF?   Precision   Recall   F-Score
  LSTM           No         70.87    65.38     68.01
  Bi-LSTM        No         76.41    74.39     75.39
  Bi-Char-LSTM   No         84.93    81.65     83.26
  LSTM           Yes        75.49    77.17     76.32
  Bi-LSTM        Yes        79.71    81.49     80.59
  Bi-Char-LSTM   Yes        84.17    83.80     83.98

Learning Curves

The repository includes four learning-curve plots (not reproduced here):

  • Epoch vs. training data loss
  • Epoch vs. validation data loss
  • Epoch vs. training data accuracy
  • Epoch vs. validation data accuracy
