
pochih / RL-Chatbot

License: MIT
🤖 Deep Reinforcement Learning Chatbot

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to RL-Chatbot

Mlds2018spring
Machine Learning and having it Deep and Structured (MLDS) in 2018 spring
Stars: ✭ 124 (-65.27%)
Mutual labels:  chatbot, reinforcement-learning
Tensorlayer
Deep Learning and Reinforcement Learning Library for Scientists and Engineers 🔥
Stars: ✭ 6,796 (+1803.64%)
Mutual labels:  chatbot, reinforcement-learning
Neuraldialogpapers
Summary of deep learning models for dialog systems (Tiancheng Zhao LTI, CMU)
Stars: ✭ 641 (+79.55%)
Mutual labels:  chatbot, reinforcement-learning
Awesome Ai
A curated list of artificial intelligence resources (Courses, Tools, App, Open Source Project)
Stars: ✭ 161 (-54.9%)
Mutual labels:  chatbot, reinforcement-learning
Curl
CURL: Contrastive Unsupervised Representation Learning for Sample-Efficient Reinforcement Learning
Stars: ✭ 346 (-3.08%)
Mutual labels:  reinforcement-learning
Metacar
A reinforcement learning environment for self-driving cars in the browser.
Stars: ✭ 337 (-5.6%)
Mutual labels:  reinforcement-learning
Irl Imitation
Implementation of Inverse Reinforcement Learning (IRL) algorithms in python/Tensorflow. Deep MaxEnt, MaxEnt, LPIRL
Stars: ✭ 333 (-6.72%)
Mutual labels:  reinforcement-learning
Wechaty Getting Started
A Starter Project Template for Wechaty works out-of-the-box
Stars: ✭ 330 (-7.56%)
Mutual labels:  chatbot
Pytorch Cpp Rl
PyTorch C++ Reinforcement Learning
Stars: ✭ 353 (-1.12%)
Mutual labels:  reinforcement-learning
Rivescript Js
A RiveScript interpreter for JavaScript. RiveScript is a scripting language for chatterbots.
Stars: ✭ 350 (-1.96%)
Mutual labels:  chatbot
Arxivtimes
repository to research & share the machine learning articles
Stars: ✭ 3,651 (+922.69%)
Mutual labels:  reinforcement-learning
Gym Miniworld
Simple 3D interior simulator for RL & robotics research
Stars: ✭ 338 (-5.32%)
Mutual labels:  reinforcement-learning
Tf2rl
TensorFlow2 Reinforcement Learning
Stars: ✭ 353 (-1.12%)
Mutual labels:  reinforcement-learning
Pytorch Chatbot
Pytorch seq2seq chatbot
Stars: ✭ 336 (-5.88%)
Mutual labels:  chatbot
Cleanrl
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features
Stars: ✭ 349 (-2.24%)
Mutual labels:  reinforcement-learning
Chatbot
An AI Based Chatbot [DEPRECATED]
Stars: ✭ 332 (-7%)
Mutual labels:  chatbot
Intelligo
🤖 Chatbot Framework for Node.js.
Stars: ✭ 347 (-2.8%)
Mutual labels:  chatbot
Awesome Self Supervised Learning
A curated list of awesome self-supervised methods
Stars: ✭ 4,492 (+1158.26%)
Mutual labels:  reinforcement-learning
Pomdps.jl
MDPs and POMDPs in Julia - An interface for defining, solving, and simulating fully and partially observable Markov decision processes on discrete and continuous spaces.
Stars: ✭ 338 (-5.32%)
Mutual labels:  reinforcement-learning
Trpo
Trust Region Policy Optimization with TensorFlow and OpenAI Gym
Stars: ✭ 343 (-3.92%)
Mutual labels:  reinforcement-learning


Intro

This is a chatbot trained with seq2seq and reinforcement learning.

  • seq2seq

Seq2seq is a classic model for structured learning: both its input and output are sequences.

The vanilla seq2seq model is described in the NIPS '14 paper Sequence to Sequence Learning with Neural Networks; its encoder and decoder are separate networks.

The seq2seq model in this repository is built from two LSTMs, similar to the one described in the ICCV '15 paper Sequence to Sequence -- Video to Text, where the encoder and the decoder share the same weights.
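
For intuition, here is a minimal sketch of a seq2seq model whose encoder and decoder share one LSTM. It is written in PyTorch for brevity; it is not the repository's actual code, and all names and sizes are illustrative:

import torch
import torch.nn as nn

class SharedSeq2Seq(nn.Module):
    """Encoder and decoder share one LSTM (S2VT-style weight sharing)."""
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)  # used twice
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # encoding pass: read the source, keep only the final hidden state
        _, state = self.lstm(self.embed(src_ids))
        # decoding pass: the *same* LSTM, seeded with the encoder's state
        dec_out, _ = self.lstm(self.embed(tgt_ids), state)
        return self.out(dec_out)  # per-step logits over the vocabulary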

  • RL

After training the chatbot for enough epochs, I apply an RL technique called policy gradient to improve it further.

By doing this, the chatbot can generate more interesting responses with respect to the reward function.

My reward function is similar to the one described in the EMNLP '16 paper Deep Reinforcement Learning for Dialogue Generation.
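
For intuition, a hedged sketch of the policy-gradient step: sample a reply, score it with the reward, and weight the reply's log-likelihood by that reward (REINFORCE). In the paper the reward combines ease of answering, information flow, and semantic coherence; the snippet below is not the repo's exact implementation:

import torch

def pg_loss(log_probs, rewards):
    # log_probs: (batch, seq_len) -- log p(token | context) of each sampled reply
    # rewards:   (batch,)         -- scalar reward for each sampled reply
    seq_log_prob = log_probs.sum(dim=1)       # log-probability of the whole reply
    return -(rewards * seq_log_prob).mean()   # REINFORCE: minimize -E[R * log p]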

Chatbot results

Here are some of the chatbot's results:

  • Result 1 (A & B are both trained chatbots)

A: Can you do me a favor?

B: Anything.

A: I want you to bend a father.

B: Of course.

A: I'm so sorry I didn't mean to be rude but.

B: Don't get it at least it's about as much as I can.

  • Result 2 (A & B are both trained chatbots)

A: I didn't do it! I haven't got a gun!

B: You liar. You still want to get your money back.

A: Something wrong with the truth.

B: I don't want any money just a little I can't take just out of this.

A: What's wrong.

B: Listen to me. I'm scared now.

A: Ah you'll need all the money I need your help.

  • Result 3 (A & B are both trained chatbots)

A: I forgot to get the Coca-Cola.

B: I got something stuck in the head.

A: It's all right I guess it's pretty common in the lot of shit.

B: I mean we've been all this together since the kid.

A: All the more reason.

B: It's not him it's his fault and he's blind because of god.

A: And is a strange story.

You can find more of the chatbot's results in this directory.

The neural-network configuration behind each result is described in its filename.

Simulate dialogs by pre-trained model

Are you amazed by the chatbot's results? :D

You can generate those amazing dialogs on your own!

Just follow the instructions below:

pip install -r requirements.txt
./script/download.sh
(Note: script/download.sh uses the -nc flag, which skips files that already exist, so make sure the download is never interrupted; a partial file would not be re-downloaded.)
./script/simulate.sh <PATH TO MODEL> <SIMULATE TYPE> <INPUT FILE> <OUTPUT FILE>
  • <PATH TO MODEL>

to generate a seq2seq dialog, use "model/Seq2Seq/model-77"

to generate an RL dialog, use "model/RL/model-56-3000"

  • <SIMULATE TYPE>

can be 1 or 2

the number of previous sentences the chatbot considers

if you choose 1, the chatbot only considers the last sentence

if you choose 2, the chatbot considers the last two sentences (one from the user, one from itself)

  • <INPUT FILE>

Take a look at result/sample_input_new.txt

This is the chatbot's input format: each line is the opening sentence of a dialog.

You can just use the example file for convenience.

  • <OUTPUT FILE>

the output file; use any filename you want
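
For example, to simulate RL dialogs with a two-sentence context from the sample input (the output path here is just an example):

./script/simulate.sh model/RL/model-56-3000 2 result/sample_input_new.txt result/rl_dialog.txt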

Generate responses by pre-trained model

If you want the chatbot to generate only a single response for each question, follow the instructions below:

pip install -r requirements.txt
./script/download.sh
(Note: script/download.sh uses the -nc flag, which skips files that already exist, so make sure the download is never interrupted.)
./script/run.sh <TYPE> <INPUT FILE> <OUTPUT FILE>
  • <TYPE>

to generate a seq2seq response, use "S2S"

to generate a reinforcement-learning response, use "RL"

  • <INPUT FILE>

Take a look at result/sample_input_new.txt

This is the chatbot's input format: each line is the opening sentence of a dialog.

You can just use the example file for convenience.

  • <OUTPUT FILE>

the output file; use any filename you want
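
For example, to generate one RL response per line of the sample input (the output path here is just an example):

./script/run.sh RL result/sample_input_new.txt result/rl_response.txt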

Train chatbot from scratch

I trained my chatbot with Python 2.7.

If you want to train the chatbot from scratch, follow the instructions below:

Step0: training configs

Take a look at python/config.py; all training configurations are defined there.

You can change some training hyper-parameters, or just keep the original ones.
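
For illustration, the config might look something like the following; only training_type and its two values are documented in this README, and the other names and values are hypothetical placeholders:

# python/config.py -- illustrative sketch; check the real file for actual names
training_type = 'normal'   # 'normal' = seq2seq cross-entropy, 'pg' = policy gradient (see Step5)
batch_size = 64            # hypothetical value
learning_rate = 0.001      # hypothetical value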

Step1: download data & libraries

I use the Cornell Movie-Dialogs Corpus.

Download it, unzip it, and move all *.txt files into the data/ directory.
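
After unzipping, data/ should contain the corpus's text files; the standard Cornell distribution includes, for example:

data/
├── movie_conversations.txt
├── movie_lines.txt
├── movie_titles_metadata.txt
├── movie_characters_metadata.txt
└── raw_script_urls.txt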

Then download some libraries with pip:

pip install -r requirements.txt

Step2: parse data

(this step uses Python 3)
./script/parse.sh

Step3: train a Seq2Seq model

./script/train.sh

Step4-1: test a Seq2Seq model

Let's look at some results from the seq2seq model :)

./script/test.sh <PATH TO MODEL> <INPUT FILE> <OUTPUT FILE>
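
For example, reusing the pre-trained checkpoint name from earlier (your own checkpoint path may differ, and the output path is just an example):

./script/test.sh model/Seq2Seq/model-77 result/sample_input_new.txt result/s2s_test.txt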

Step4-2: simulate a dialog

And show some dialog results from the seq2seq model!

./script/simulate.sh <PATH TO MODEL> <SIMULATE TYPE> <INPUT FILE> <OUTPUT FILE>
  • <SIMULATE TYPE>

can be 1 or 2

the number of previous sentences the chatbot considers

if you choose 1, the chatbot only considers the user's utterance

if you choose 2, the chatbot considers the user's utterance and its own last utterance

Step5: train a RL model

You need to change the training_type parameter in python/config.py:

'normal' for seq2seq training, 'pg' for policy gradient

First train with 'normal' for some epochs until the model is stable (at least 30 epochs is highly recommended),

then switch to 'pg' to optimize the reward function.
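
In python/config.py this switch is that one parameter, e.g.:

training_type = 'pg'   # switch from 'normal' once the seq2seq model is stable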

./script/train_RL.sh

When training with policy gradient (pg), you may need a reversed model.

The reversed model is also trained on the Cornell Movie-Dialogs Corpus, but with source and target reversed.

You can download the pre-trained reversed model with:

./script/download_reversed.sh

or train it yourself.

If you use the pre-trained reversed model, you don't need to change any of its settings.

Step6-1: test a RL model

Let's generate some results with the RL model and see how they differ from the seq2seq model's :)

./script/test_RL.sh <PATH TO MODEL> <INPUT FILE> <OUTPUT FILE>
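
For example, with the pre-trained RL checkpoint name from earlier (your own path may differ, and the output path is just an example):

./script/test_RL.sh model/RL/model-56-3000 result/sample_input_new.txt result/rl_test.txt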

Step6-2: generate a dialog

And show some dialog results from the RL model!

./script/simulate.sh <PATH TO MODEL> <SIMULATE TYPE> <INPUT FILE> <OUTPUT FILE>
  • <SIMULATE TYPE>

can be 1 or 2

the number of previous sentences the chatbot considers

if you choose 1, the chatbot only considers the last sentence

if you choose 2, the chatbot considers the last two sentences (one from the user, one from itself)

Environment

  • OS: CentOS Linux release 7.3.1611 (Core)
  • CPU: Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz
  • GPU: GeForce GTX 1070 8GB
  • Memory: 16GB DDR3
  • Python 3 (for data_parser.py) & Python 2.7 (for everything else)

Author

Po-Chih Huang / @pochih
