
andi611 / Conditional-SeqGAN-Tensorflow

License: MIT
Conditional Sequence Generative Adversarial Network trained with policy gradient, implemented in TensorFlow

Programming Languages

  • Python
  • Shell

Projects that are alternatives of or similar to Conditional-SeqGAN-Tensorflow

Chatbot
A Chinese chatbot that you can train yourself: train the chatbot you want on your own corpus, suitable for scenarios such as intelligent customer service, online Q&A, and smart chat. Currently includes seq2seq, seqGAN, TF 2.0, and PyTorch versions.
Stars: ✭ 2,441 (+5093.62%)
Mutual labels:  nlp-machine-learning, seqgan
Naive-Bayes-Evening-Workshop
Companion code for Introduction to Python for Data Science: Coding the Naive Bayes Algorithm evening workshop
Stars: ✭ 23 (-51.06%)
Mutual labels:  nlp-machine-learning
empythy
Automated NLP sentiment predictions - batteries included, or use your own data
Stars: ✭ 17 (-63.83%)
Mutual labels:  nlp-machine-learning
Pix2Pix
Image to Image Translation using Conditional GANs (Pix2Pix) implemented using Tensorflow 2.0
Stars: ✭ 29 (-38.3%)
Mutual labels:  conditional-gan
DeepLearningReading
Deep Learning and Machine Learning mini-projects. Current Project: Deepmind Attentive Reader (rc-data)
Stars: ✭ 78 (+65.96%)
Mutual labels:  nlp-machine-learning
Very-deep-cnn-tensorflow
Very deep CNN for text classification
Stars: ✭ 18 (-61.7%)
Mutual labels:  nlp-machine-learning
topic modelling financial news
Topic modelling on financial news with Natural Language Processing
Stars: ✭ 51 (+8.51%)
Mutual labels:  nlp-machine-learning
lidtk
Language Identification Toolkit
Stars: ✭ 17 (-63.83%)
Mutual labels:  nlp-machine-learning
brand-sentiment-analysis
Scripts utilizing Heartex platform to build brand sentiment analysis from the news
Stars: ✭ 21 (-55.32%)
Mutual labels:  nlp-machine-learning
AI-Sentiment-Analysis-on-IMDB-Dataset
Sentiment Analysis using Stochastic Gradient Descent on 50,000 Movie Reviews Compiled from the IMDB Dataset
Stars: ✭ 55 (+17.02%)
Mutual labels:  nlp-machine-learning
Engine
The Centrifuge processes, filters, and saves the relevant documents as recommendations for the relevant users
Stars: ✭ 20 (-57.45%)
Mutual labels:  nlp-machine-learning
Machine-Learning-Models
In this repository I implemented machine learning methods ranging from simple to complex, aiming for template-style code.
Stars: ✭ 30 (-36.17%)
Mutual labels:  nlp-machine-learning
anuvada
Interpretable Models for NLP using PyTorch
Stars: ✭ 102 (+117.02%)
Mutual labels:  nlp-machine-learning
Entity Embedding
Reference implementation of the paper "Word Embeddings for Entity-annotated Texts"
Stars: ✭ 19 (-59.57%)
Mutual labels:  nlp-machine-learning
kex
Kex is a python library for unsupervised keyword extraction from a document, providing an easy interface and benchmarks on 15 public datasets.
Stars: ✭ 46 (-2.13%)
Mutual labels:  nlp-machine-learning
gans-2.0
Generative Adversarial Networks in TensorFlow 2.0
Stars: ✭ 76 (+61.7%)
Mutual labels:  conditional-gan
Deception-Detection-on-Amazon-reviews-dataset
An SVM model that classifies reviews as real or fake. Both the review text and the additional features in the data set were used to build a model that achieved over 85% accuracy without any deep learning techniques.
Stars: ✭ 42 (-10.64%)
Mutual labels:  nlp-machine-learning
Quora QuestionPairs DL
Kaggle competition: using deep learning to solve Quora's question pairs problem
Stars: ✭ 54 (+14.89%)
Mutual labels:  nlp-machine-learning
coursera-gan-specialization
Programming assignments and quizzes from all courses within the GANs specialization offered by deeplearning.ai
Stars: ✭ 277 (+489.36%)
Mutual labels:  conditional-gan
ShortText-Fasttext
ShortText classification
Stars: ✭ 12 (-74.47%)
Mutual labels:  nlp-machine-learning

Daisy: Dialog Analogous Intellectual System

Machine Learning: Chatbot

Conditional Sequence Generative Adversarial Network Trained with Policy Gradient, Implemented in TensorFlow

Requirements:

  • TensorFlow r1.5.1
  • Python 3.6
  • NumPy 1.13.3
  • tempfile (Optional)
  • gtts (Optional)
  • pygame (Optional)

Introduction

Apply policy gradient to generative adversarial nets to improve the quality of chatbot dialog generation.

  • Generator: a seq2seq model with attention and embedding
  • Discriminator: a hierarchical encoder with a two-class classifier
  • Policy gradient update: the reward computed by the discriminator is used to update the generator (see the sketch after this list)
  • Monte Carlo rollout: computes a reward for every generation step
  • Teacher forcing: adds maximum likelihood estimation training to the GAN training
  • Sampled softmax: reduces the computational cost of the regular softmax
  • Model checkpointing: supports early stopping and saving the model with the best loss
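
Below is a minimal, TensorFlow 1.x-style sketch of the reward-weighted policy gradient update described above. The tensor names, toy sizes, and stand-in output projection are illustrative assumptions, not the repository's actual graph; the real generator is the attention seq2seq model in the source files.

import numpy as np
import tensorflow as tf

# Toy sizes; the real values come from the project's configuration.
BATCH, STEPS, VOCAB, HIDDEN = 4, 5, 50, 16

# Hypothetical inputs: decoder hidden states, the tokens the generator sampled at each
# step, and the per-step rewards estimated by the discriminator via Monte Carlo rollout.
decoder_states = tf.placeholder(tf.float32, [BATCH, STEPS, HIDDEN])
sampled_tokens = tf.placeholder(tf.int32, [BATCH, STEPS])
rewards = tf.placeholder(tf.float32, [BATCH, STEPS])

# Stand-in output projection (the real generator is a seq2seq model with attention).
output_proj = tf.get_variable('output_proj', [HIDDEN, VOCAB])
logits = tf.tensordot(decoder_states, output_proj, axes=[[2], [0]])  # [BATCH, STEPS, VOCAB]

# Policy gradient objective: reward-weighted negative log-likelihood of the sampled tokens.
neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sampled_tokens, logits=logits)
pg_loss = tf.reduce_mean(tf.reduce_sum(neg_log_prob * rewards, axis=1))
train_op = tf.train.AdamOptimizer(1e-4).minimize(pg_loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op, feed_dict={
        decoder_states: np.random.randn(BATCH, STEPS, HIDDEN).astype(np.float32),
        sampled_tokens: np.random.randint(VOCAB, size=(BATCH, STEPS)),
        rewards: np.random.rand(BATCH, STEPS).astype(np.float32),
    })

Each sampled token's log-probability is scaled by the reward its rollout received, so replies the discriminator judges realistic are reinforced while unrealistic ones are suppressed.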

Preparation

  1. Download the corpus from:
1) https://www.cs.cornell.edu/%7Ecristian/Cornell_Movie-Dialogs_Corpus.html (official site)
2) https://drive.google.com/file/d/13TvKXXrKVg9X7IEayu7eYpRbxfZ5GDE0/view?usp=sharing (backup link)
  2. Extract the .zip file and save its contents to the path below:
config.corpus_path = '../cornell_movie_dialog_corpus'

This default path can be modified by changing the '--corpus_dir' option in 'config.py'.

  3. Run preprocessing on the raw corpus file:
python3 preprocess_data.py

This saves the pre-processed files to the path below:

config.data_dir = '../data'

This default path can be modified by changing the '--data_dir' option in 'config.py'.

  4. After pre-processing, 5 files will be created (a loading sketch follows this list):
1. processed_corpus.txt : encoder input for training
2. word2idx.pkl : word-to-index dictionary mapping saved with pickle
3. idx2word.pkl : index-to-word dictionary mapping saved with pickle
4. train_encode.pkl : model-ready training data: tokenized and index-labeled text ready to be fed to the encoder
5. train_decode.pkl : model-ready training data: tokenized and index-labeled text ready to be fed to the decoder
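
The snippet below is a hedged sketch of loading these outputs for inspection. The file names come from the list above; the exact Python objects stored inside each pickle are an assumption based on their descriptions.

import os
import pickle

data_dir = '../data'  # config.data_dir

# Vocabulary mappings produced by preprocess_data.py.
with open(os.path.join(data_dir, 'word2idx.pkl'), 'rb') as f:
    word2idx = pickle.load(f)
with open(os.path.join(data_dir, 'idx2word.pkl'), 'rb') as f:
    idx2word = pickle.load(f)

# Model-ready encoder/decoder training data (assumed to be sequences of word indices).
with open(os.path.join(data_dir, 'train_encode.pkl'), 'rb') as f:
    train_encode = pickle.load(f)
with open(os.path.join(data_dir, 'train_decode.pkl'), 'rb') as f:
    train_decode = pickle.load(f)

print('vocabulary size:', len(word2idx))
print('training pairs:', len(train_encode), len(train_decode))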

Training

  1. To run MLE pre-training on the generator model:
python3 run.py --pre_train --force_save

(Optional: '--force_save' saves the model after every epoch; otherwise the default model checkpointing is used and only the model with the minimum loss is saved.)

  2. To run generative adversarial training with policy gradient:
python3 run.py --gan_train --force_save

(Optional: '--force_save' saves the model after every epoch; otherwise the default model checkpointing is used and only the model with the minimum loss is saved; see the checkpointing sketch below.)
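
For reference, the snippet below sketches the "save only on a new minimum loss" checkpoint pattern that the default behaviour describes. The variable, path, and loss values are placeholders; the repository's real logic lives in its training code.

import os
import tensorflow as tf

dummy_var = tf.get_variable('dummy', shape=[1])  # stand-in for the real model variables
saver = tf.train.Saver(max_to_keep=1)
os.makedirs('./model', exist_ok=True)

force_save = False        # corresponds to the '--force_save' flag
best_loss = float('inf')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch, epoch_loss in enumerate([1.2, 0.9, 1.1, 0.8]):  # fake per-epoch losses
        if force_save or epoch_loss < best_loss:
            best_loss = min(best_loss, epoch_loss)
            saver.save(sess, './model/chatbot.ckpt', global_step=epoch)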

Inference

  1. To chat with the pre-trained model:
python3 run.py --pre_chat --speak

(Optional: '--speak' enables an audio response.)

  2. To chat with the GAN-trained model:
python3 run.py --gan_chat --speak

(Optional: '--speak' enables an audio response; see the audio sketch below.)
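
The sketch below shows one way the optional gtts / pygame / tempfile dependencies listed in the requirements could produce such an audio response; it is illustrative and may differ from the repository's own '--speak' implementation.

import tempfile
import time

from gtts import gTTS
import pygame

def speak(text):
    # Synthesize speech with Google Text-to-Speech and play it back with pygame.
    tts = gTTS(text=text, lang='en')
    tmp = tempfile.NamedTemporaryFile(suffix='.mp3', delete=False)
    tmp.close()
    tts.save(tmp.name)
    pygame.mixer.init()
    pygame.mixer.music.load(tmp.name)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        time.sleep(0.1)

speak('Hello, I am Daisy.')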

  3. To evaluate the BLEU score of the pre-trained model:
python3 run.py --pre_evaluate

  4. To evaluate the BLEU score of the GAN-trained model (an illustrative BLEU computation follows these commands):
python3 run.py --gan_evaluate
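
As a point of reference, the snippet below computes a sentence-level BLEU score with NLTK (not listed in the requirements above); the repository's own evaluation may use a different BLEU formulation.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1
reference = [['i', 'am', 'fine', 'thank', 'you']]   # tokenized ground-truth reply
candidate = ['i', 'am', 'fine', 'thanks']           # tokenized model reply
print('BLEU:', sentence_bleu(reference, candidate, smoothing_function=smooth))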

Trained Models

  1. To use existing models, place the model files under the 'model' directory. Trained models (pre-trained / GAN-trained) can be downloaded from:
https://drive.google.com/drive/folders/1C0A53qqQpdGyKIEB7HVb6BqlBD5gvEJy?usp=sharing

Other

  1. Source code:
1. configuration.py
2. data_loader.py
3. discriminator.py
4. generator.py
5. train.py
6. test.py
7. run.py
8. tf_seq2seq_model.py