
NathanDuran / Probabilistic-RNN-DA-Classifier

License: GPL-3.0
Probabilistic Dialogue Act Classification for the Switchboard Corpus using an LSTM model

Programming Languages

Python
139,335 projects; #7 most used programming language

Projects that are alternatives to, or similar to, Probabilistic-RNN-DA-Classifier

automatic-personality-prediction
[AAAI 2020] Modeling Personality with Attentive Networks and Contextual Embeddings
Stars: ✭ 43 (+95.45%)
Mutual labels:  recurrent-neural-networks, rnn, dialogue-data
EdgarAllanPoetry
Computer-generated poetry
Stars: ✭ 22 (+0%)
Mutual labels:  corpus, recurrent-neural-networks, rnn
Rnn ctc
Recurrent Neural Network and Long Short Term Memory (LSTM) with Connectionist Temporal Classification implemented in Theano. Includes a toy training example.
Stars: ✭ 220 (+900%)
Mutual labels:  recurrent-neural-networks, rnn
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+14486.36%)
Mutual labels:  recurrent-neural-networks, rnn
Seq2seq Chatbot
Chatbot in 200 lines of code using TensorLayer
Stars: ✭ 777 (+3431.82%)
Mutual labels:  corpus, rnn
Pytorch Kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Stars: ✭ 2,097 (+9431.82%)
Mutual labels:  recurrent-neural-networks, rnn
Deep Learning With Python
Deep learning codes and projects using Python
Stars: ✭ 195 (+786.36%)
Mutual labels:  recurrent-neural-networks, rnn
Awesome Persian Nlp Ir
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Stars: ✭ 460 (+1990.91%)
Mutual labels:  corpus, embeddings
Pytorch Pos Tagging
A tutorial on how to implement models for part-of-speech tagging using PyTorch and TorchText.
Stars: ✭ 96 (+336.36%)
Mutual labels:  recurrent-neural-networks, rnn
TV4Dialog
No description or website provided.
Stars: ✭ 33 (+50%)
Mutual labels:  dialogue, corpus
open2ch-dialogue-corpus
A dialogue corpus created by crawling おーぷん2ちゃんねる (Open 2channel)
Stars: ✭ 65 (+195.45%)
Mutual labels:  dialogue, corpus
dialogue-datasets
collect the open dialog corpus and some useful data processing utils.
Stars: ✭ 24 (+9.09%)
Mutual labels:  dialogue, corpus
Rnn From Scratch
Use tensorflow's tf.scan to build vanilla, GRU and LSTM RNNs
Stars: ✭ 123 (+459.09%)
Mutual labels:  recurrent-neural-networks, rnn
Linear Attention Recurrent Neural Network
A recurrent attention module consisting of an LSTM cell which can query its own past cell states by the means of windowed multi-head attention. The formulas are derived from the BN-LSTM and the Transformer Network. The LARNN cell with attention can be easily used inside a loop on the cell state, just like any other RNN. (LARNN)
Stars: ✭ 119 (+440.91%)
Mutual labels:  recurrent-neural-networks, rnn
Iseebetter
iSeeBetter: Spatio-Temporal Video Super Resolution using Recurrent-Generative Back-Projection Networks | Python3 | PyTorch | GANs | CNNs | ResNets | RNNs | Published in Springer Journal of Computational Visual Media, September 2020, Tsinghua University Press
Stars: ✭ 202 (+818.18%)
Mutual labels:  recurrent-neural-networks, rnn
Pytorch Learners Tutorial
PyTorch tutorial for learners
Stars: ✭ 97 (+340.91%)
Mutual labels:  recurrent-neural-networks, rnn
Dialogue-Corpus
No description or website provided.
Stars: ✭ 27 (+22.73%)
Mutual labels:  dialogue, corpus
Gru Svm
[ICMLC 2018] A Neural Network Architecture Combining Gated Recurrent Unit (GRU) and Support Vector Machine (SVM) for Intrusion Detection
Stars: ✭ 76 (+245.45%)
Mutual labels:  recurrent-neural-networks, rnn
Easyesn
Python library for Reservoir Computing using Echo State Networks
Stars: ✭ 93 (+322.73%)
Mutual labels:  recurrent-neural-networks, rnn
Chatbot-Training-Corpus
A collection of text corpora that can be used for chatbot training, covering both Chinese and English
Stars: ✭ 117 (+431.82%)
Mutual labels:  dialogue, corpus

Probabilistic-RNN-DA-Classifier

Overview

An LSTM for Dialogue Act (DA) classification on the Switchboard Dialogue Act Corpus. This is the implementation for the paper "Probabilistic Word Association for Dialogue Act Classification with Recurrent Neural Networks". The repository contains two LSTM models implemented in Keras: da_lstm.py uses utterance representations generated from pre-trained Word2Vec and GloVe word embeddings, while probabilistic_lstm.py uses utterance representations generated from keywords selected for their frequency association with certain DAs.

Both models use the same architecture: the output of the LSTM at each timestep is combined by a max-pooling layer before a final feed-forward layer outputs the probability distribution over all DA labels for that utterance.
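
For concreteness, here is a minimal Keras sketch of that architecture. The layer sizes, sequence length, and label count below are illustrative assumptions, not the hyperparameters used in da_lstm.py or probabilistic_lstm.py.

    from keras.models import Sequential
    from keras.layers import Embedding, LSTM, GlobalMaxPooling1D, Dense

    # Illustrative sizes only; the repository scripts define the real values.
    vocabulary_size = 10000   # taken from metadata.pkl in practice
    embedding_dim = 300       # Word2Vec/GloVe vector dimensionality
    max_utterance_len = 50    # padded utterance length (assumption)
    num_labels = 41           # assumed size of the DA tag set

    model = Sequential()
    model.add(Embedding(vocabulary_size, embedding_dim, input_length=max_utterance_len))
    model.add(LSTM(128, return_sequences=True))          # an output at every timestep
    model.add(GlobalMaxPooling1D())                      # max-pool over the timesteps
    model.add(Dense(num_labels, activation='softmax'))   # distribution over DA labels
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])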

Datasets

The data directory contains pre-processed Switchboard DA Corpus data in raw-text (.txt) and .pkl format. The same training and test splits as used by Stolcke et al. (2000) are included, along with an additional validation set. The development set is a subset of the training set, intended to speed up development and testing.

Dataset       # Transcripts   # Utterances
Training      1,115           192,768
Development   300             51,611
Test          19              4,088
Validation    21              3,196

Metadata

words.txt and labels.txt contain full lists of the vocabulary and labels along with how frequently they occur. metadata.pkl contains useful pre-processed data such as the vocabulary and vocabulary size, DA label-to-index conversion dictionaries, and the maximum utterance length (a loading sketch follows the list below).

  • num_utterances = Total number of utterances in the full corpus.
  • max_utterance_len = Number of words in the longest utterance in the corpus.
  • vocabulary = List of tuples (word, word frequency).
  • vocabulary_size = Number of words in the vocabulary.
  • index_to_word = Dictionary mapping vocabulary index to word.
  • word_to_index = Dictionary mapping vocabulary word to index.
  • labels = List of tuples (label, label frequency).
  • num_labels = Number of labels used from the Switchboard data.
  • label_to_index = Dictionary mapping label to index.
  • index_to_label = Dictionary mapping index to label.
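
A minimal sketch of loading these fields; the data/metadata.pkl path is an assumption, so adjust it to your checkout.

    import pickle

    # Load the pre-processed metadata (path assumed).
    with open('data/metadata.pkl', 'rb') as file:
        metadata = pickle.load(file)

    print(metadata['num_utterances'], 'utterances in the full corpus')
    print(metadata['max_utterance_len'], 'words in the longest utterance')

    # Round-trip a word through the two conversion dictionaries.
    word = metadata['index_to_word'][0]
    assert metadata['word_to_index'][word] == 0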

Usage

Traditional Word Embeddings

To run da_lstm.py, an embedding matrix must first be created from pre-trained embeddings such as Word2Vec or GloVe. In the paper the model was tested on GloVe embeddings trained on Wikipedia data and Word2Vec embeddings trained on Google News. The Word2Vec embeddings trained on the Switchboard corpus are included with this repository. To generate the matrix, simply run generate_embeddings.py after specifying the embeddings filename and directory (default = 'embeddings'). Then run da_lstm.py after specifying the name of the .pkl embeddings file generated by generate_embeddings.py.
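
As a rough picture of what that step produces, the hypothetical sketch below builds an embedding matrix from a GloVe-format text file. The file path, the toy word_to_index mapping, and the loop details are assumptions for illustration, not the code in generate_embeddings.py.

    import numpy as np

    # Toy stand-in for the word_to_index dictionary from metadata.pkl.
    word_to_index = {'yeah': 0, 'okay': 1, 'uh-huh': 2}
    embedding_dim = 300

    # Read pre-trained vectors from a GloVe-format text file (path assumed).
    vectors = {}
    with open('embeddings/glove.6B.300d.txt', encoding='utf-8') as file:
        for line in file:
            parts = line.rstrip().split(' ')
            vectors[parts[0]] = np.asarray(parts[1:], dtype='float32')

    # One row per vocabulary word; words without a pre-trained vector stay zero.
    embedding_matrix = np.zeros((len(word_to_index), embedding_dim))
    for word, index in word_to_index.items():
        if word in vectors:
            embedding_matrix[index] = vectors[word]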

Probabilistic Word Embeddings

To run probabilistic_lstm.py, a probability matrix must first be created from the raw Switchboard data. Run generate_word_frequencies.py, specifying the frequency threshold (freq_thresh), i.e. the minimum number of times a word must appear in the corpus to be considered (default = 2). Then run probabilistic_lstm.py, specifying the same word frequency (word_frequency) parameter.
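
The sketch below illustrates, on toy data, the kind of word/DA frequency association involved; the actual computation in generate_word_frequencies.py (and in the paper) may differ in detail.

    from collections import Counter, defaultdict

    # Toy (words, DA label) pairs standing in for the Switchboard utterances.
    utterances = [
        (['yeah', 'right'], 'aa'),
        (['yeah'], 'b'),
        (['what', 'time'], 'qw'),
    ]
    freq_thresh = 2  # minimum corpus frequency for a word to be considered

    word_counts = Counter(word for words, _ in utterances for word in words)
    label_counts = defaultdict(Counter)
    for words, label in utterances:
        for word in words:
            label_counts[word][label] += 1

    # For each frequent-enough word, its frequency distribution over DA labels.
    word_da_probs = {
        word: {label: count / word_counts[word] for label, count in counts.items()}
        for word, counts in label_counts.items()
        if word_counts[word] >= freq_thresh
    }
    print(word_da_probs)  # {'yeah': {'aa': 0.5, 'b': 0.5}}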

Utility Files

  • process_all_swbd_data.py - processes the entire corpus into raw-text and generates the metadata.pkl file.
  • process_batch_swbd_data.py - processes only a specified list of transcripts from a text file, e.g. test_split.txt.
  • utilities.py - contains utility functions for saving and loading data and models as well as processing data for use at runtime.
  • swda.py - contains utility functions for loading and iterating the Switchboard transcripts and utterances in .csv format. This file is part of the swda repository developed by Christopher Potts, available at https://github.com/cgpotts/swda.