
NathanDuran / Probabilistic-RNN-DA-Classifier

License: GPL-3.0
Probabilistic Dialogue Act Classification for the Switchboard Corpus using an LSTM model

Programming Languages

Python
139,335 projects; #7 most used programming language

Projects that are alternatives to, or similar to, Probabilistic-RNN-DA-Classifier

automatic-personality-prediction
[AAAI 2020] Modeling Personality with Attentive Networks and Contextual Embeddings
Stars: ✭ 43 (+95.45%)
Mutual labels:  recurrent-neural-networks, rnn, dialogue-data
EdgarAllanPoetry
Computer-generated poetry
Stars: ✭ 22 (+0%)
Mutual labels:  corpus, recurrent-neural-networks, rnn
Rnn ctc
Recurrent Neural Network and Long Short Term Memory (LSTM) with Connectionist Temporal Classification implemented in Theano. Includes a toy training example.
Stars: ✭ 220 (+900%)
Mutual labels:  recurrent-neural-networks, rnn
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+14486.36%)
Mutual labels:  recurrent-neural-networks, rnn
Seq2seq Chatbot
Chatbot in 200 lines of code using TensorLayer
Stars: ✭ 777 (+3431.82%)
Mutual labels:  corpus, rnn
Pytorch Kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Stars: ✭ 2,097 (+9431.82%)
Mutual labels:  recurrent-neural-networks, rnn
Deep Learning With Python
Deep learning codes and projects using Python
Stars: ✭ 195 (+786.36%)
Mutual labels:  recurrent-neural-networks, rnn
Awesome Persian Nlp Ir
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Stars: ✭ 460 (+1990.91%)
Mutual labels:  corpus, embeddings
Pytorch Pos Tagging
A tutorial on how to implement models for part-of-speech tagging using PyTorch and TorchText.
Stars: ✭ 96 (+336.36%)
Mutual labels:  recurrent-neural-networks, rnn
TV4Dialog
No description or website provided.
Stars: ✭ 33 (+50%)
Mutual labels:  dialogue, corpus
open2ch-dialogue-corpus
A dialogue corpus created by crawling おーぷん2ちゃんねる (Open 2channel)
Stars: ✭ 65 (+195.45%)
Mutual labels:  dialogue, corpus
dialogue-datasets
collect the open dialog corpus and some useful data processing utils.
Stars: ✭ 24 (+9.09%)
Mutual labels:  dialogue, corpus
Rnn From Scratch
Use tensorflow's tf.scan to build vanilla, GRU and LSTM RNNs
Stars: ✭ 123 (+459.09%)
Mutual labels:  recurrent-neural-networks, rnn
Linear Attention Recurrent Neural Network
A recurrent attention module consisting of an LSTM cell which can query its own past cell states by the means of windowed multi-head attention. The formulas are derived from the BN-LSTM and the Transformer Network. The LARNN cell with attention can be easily used inside a loop on the cell state, just like any other RNN. (LARNN)
Stars: ✭ 119 (+440.91%)
Mutual labels:  recurrent-neural-networks, rnn
Iseebetter
iSeeBetter: Spatio-Temporal Video Super Resolution using Recurrent-Generative Back-Projection Networks | Python3 | PyTorch | GANs | CNNs | ResNets | RNNs | Published in Springer Journal of Computational Visual Media, September 2020, Tsinghua University Press
Stars: ✭ 202 (+818.18%)
Mutual labels:  recurrent-neural-networks, rnn
Pytorch Learners Tutorial
PyTorch tutorial for learners
Stars: ✭ 97 (+340.91%)
Mutual labels:  recurrent-neural-networks, rnn
Dialogue-Corpus
No description or website provided.
Stars: ✭ 27 (+22.73%)
Mutual labels:  dialogue, corpus
Gru Svm
[ICMLC 2018] A Neural Network Architecture Combining Gated Recurrent Unit (GRU) and Support Vector Machine (SVM) for Intrusion Detection
Stars: ✭ 76 (+245.45%)
Mutual labels:  recurrent-neural-networks, rnn
Easyesn
Python library for Reservoir Computing using Echo State Networks
Stars: ✭ 93 (+322.73%)
Mutual labels:  recurrent-neural-networks, rnn
Chatbot-Training-Corpus
A collection of text corpora that can be used for chatbot training, covering both Chinese and English
Stars: ✭ 117 (+431.82%)
Mutual labels:  dialogue, corpus

Probabilistic-RNN-DA-Classifier

Overview

An LSTM for Dialogue Act (DA) classification on the Switchboard Dialogue Act Corpus. This is the implementation for the paper "Probabilistic Word Association for Dialogue Act Classification with Recurrent Neural Networks". The repository contains two LSTM models implemented in Keras: da_lstm.py uses utterance representations generated from pre-trained Word2Vec and GloVe word embeddings, while probabilistic_lstm.py uses utterance representations generated from keywords selected for their frequency association with certain DAs.

Both models use the same architecture: the output of the LSTM at each timestep is combined by a max-pooling layer before a final feed-forward layer outputs the probability distribution over all DA labels for that utterance.
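
For concreteness, here is a minimal Keras sketch of that architecture. The layer sizes, sequence length, and label count below are illustrative assumptions, not the hyperparameters used in da_lstm.py or probabilistic_lstm.py.

    from keras.models import Sequential
    from keras.layers import Embedding, LSTM, GlobalMaxPooling1D, Dense

    # Illustrative sizes only; the repository scripts define the real values.
    vocabulary_size = 10000   # taken from metadata.pkl in practice
    embedding_dim = 300       # Word2Vec/GloVe vector dimensionality
    max_utterance_len = 50    # padded utterance length (assumption)
    num_labels = 41           # assumed size of the DA tag set

    model = Sequential()
    model.add(Embedding(vocabulary_size, embedding_dim, input_length=max_utterance_len))
    model.add(LSTM(128, return_sequences=True))          # an output at every timestep
    model.add(GlobalMaxPooling1D())                      # max-pool over the timesteps
    model.add(Dense(num_labels, activation='softmax'))   # distribution over DA labels
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])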

Datasets

The data directory contains pre-processed Switchboard DA Corpus data in raw-text (.txt) and .pkl format. The same training and test splits as used by Stolcke et al. (2000) are included, along with an additional validation set. The development set is a subset of the training set, intended to speed up development and testing.

Dataset       # Transcripts   # Utterances
Training      1,115           192,768
Development   300             51,611
Test          19              4,088
Validation    21              3,196

Metadata

words.txt and labels.txt contain full lists of the vocabulary and labels along with how frequently they occur. metadata.pkl contains useful pre-processed data such as the vocabulary and vocabulary size, DA label-to-index conversion dictionaries, and the maximum utterance length (a loading sketch follows the list below).

  • num_utterances = Total number of utterances in the full corpus.
  • max_utterance_len = Number of words in the longest utterance in the corpus.
  • vocabulary = List of tuples (word, word frequency).
  • vocabulary_size = Number of words in the vocabulary.
  • index_to_word = Dictionary mapping vocabulary index to word.
  • word_to_index = Dictionary mapping vocabulary word to index.
  • labels = List of tuples (label, label frequency).
  • num_labels = Number of labels used from the Switchboard data.
  • label_to_index = Dictionary mapping label to index.
  • index_to_label = Dictionary mapping index to label.
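
A minimal sketch of loading these fields; the data/metadata.pkl path is an assumption, so adjust it to your checkout.

    import pickle

    # Load the pre-processed metadata (path assumed).
    with open('data/metadata.pkl', 'rb') as file:
        metadata = pickle.load(file)

    print(metadata['num_utterances'], 'utterances in the full corpus')
    print(metadata['max_utterance_len'], 'words in the longest utterance')

    # Round-trip a word through the two conversion dictionaries.
    word = metadata['index_to_word'][0]
    assert metadata['word_to_index'][word] == 0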

Usage

Traditional Word Embeddings

To run da_lstm.py, an embedding matrix must first be created from pre-trained embeddings such as Word2Vec or GloVe. In the paper the model was tested on GloVe embeddings trained on Wikipedia data and Word2Vec embeddings trained on Google News. The Word2Vec embeddings trained on the Switchboard corpus are included with this repository. To generate the matrix, simply run generate_embeddings.py after specifying the embeddings filename and directory (default = 'embeddings'). Then run da_lstm.py after specifying the name of the .pkl embeddings file generated by generate_embeddings.py.
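
As a rough picture of what that step produces, the hypothetical sketch below builds an embedding matrix from a GloVe-format text file. The file path, the toy word_to_index mapping, and the loop details are assumptions for illustration, not the code in generate_embeddings.py.

    import numpy as np

    # Toy stand-in for the word_to_index dictionary from metadata.pkl.
    word_to_index = {'yeah': 0, 'okay': 1, 'uh-huh': 2}
    embedding_dim = 300

    # Read pre-trained vectors from a GloVe-format text file (path assumed).
    vectors = {}
    with open('embeddings/glove.6B.300d.txt', encoding='utf-8') as file:
        for line in file:
            parts = line.rstrip().split(' ')
            vectors[parts[0]] = np.asarray(parts[1:], dtype='float32')

    # One row per vocabulary word; words without a pre-trained vector stay zero.
    embedding_matrix = np.zeros((len(word_to_index), embedding_dim))
    for word, index in word_to_index.items():
        if word in vectors:
            embedding_matrix[index] = vectors[word]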

Probabilistic Word Embeddings

To run probabilistic_lstm.py, a probability matrix must first be created from the raw Switchboard data. Run generate_word_frequencies.py, specifying the frequency threshold (freq_thresh), i.e. the minimum number of times a word must appear in the corpus to be considered (default = 2). Then run probabilistic_lstm.py, specifying the same word frequency (word_frequency) parameter.
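
The sketch below illustrates, on toy data, the kind of word/DA frequency association involved; the actual computation in generate_word_frequencies.py (and in the paper) may differ in detail.

    from collections import Counter, defaultdict

    # Toy (words, DA label) pairs standing in for the Switchboard utterances.
    utterances = [
        (['yeah', 'right'], 'aa'),
        (['yeah'], 'b'),
        (['what', 'time'], 'qw'),
    ]
    freq_thresh = 2  # minimum corpus frequency for a word to be considered

    word_counts = Counter(word for words, _ in utterances for word in words)
    label_counts = defaultdict(Counter)
    for words, label in utterances:
        for word in words:
            label_counts[word][label] += 1

    # For each frequent-enough word, its frequency distribution over DA labels.
    word_da_probs = {
        word: {label: count / word_counts[word] for label, count in counts.items()}
        for word, counts in label_counts.items()
        if word_counts[word] >= freq_thresh
    }
    print(word_da_probs)  # {'yeah': {'aa': 0.5, 'b': 0.5}}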

Utility Files

  • process_all_swbd_data.py - processes the entire corpus into raw-text and generates the metadata.pkl file.
  • process_batch_swbd_data.py - processes only a specified list of transcripts from a text file, e.g. test_split.txt.
  • utilities.py - contains utility functions for saving and loading data and models as well as processing data for use at runtime.
  • swda.py - contains utility functions for loading and iterating the Switchboard transcripts and utterances in .csv format. This file is part of the swda repository developed by Christopher Potts, available at https://github.com/cgpotts/swda.