
louaaron / GAN-Q-Learning

Licence: other
Unofficial Implementation of GAN Q Learning https://arxiv.org/abs/1805.04874

Programming Languages

python

Projects that are alternatives of or similar to GAN-Q-Learning

Reinforcement Learning
Learn Deep Reinforcement Learning in 60 days! Lectures & Code in Python. Reinforcement Learning + Deep Learning
Stars: ✭ 3,329 (+7826.19%)
Mutual labels:  qlearning
Deep reinforcement learning course
Implementations from the free course Deep Reinforcement Learning with Tensorflow and PyTorch
Stars: ✭ 3,232 (+7595.24%)
Mutual labels:  qlearning
DOM-Q-NET
Graph-based Deep Q Network for Web Navigation
Stars: ✭ 30 (-28.57%)
Mutual labels:  qlearning
reinforced-race
A model car learns driving along a track using reinforcement learning
Stars: ✭ 37 (-11.9%)
Mutual labels:  qlearning
Reinforcement-Learning-An-Introduction
Kotlin implementation of algorithms, examples, and exercises from Sutton and Barto's Reinforcement Learning: An Introduction (2nd Edition)
Stars: ✭ 28 (-33.33%)
Mutual labels:  qlearning
Deep-QLearning-Demo-csharp
This demo is a C# port of ConvNetJS RLDemo (https://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html) by Andrej Karpathy
Stars: ✭ 34 (-19.05%)
Mutual labels:  qlearning
reinforcement-learning-flappybird
In-browser reinforcement learning for flappy bird 🐦
Stars: ✭ 41 (-2.38%)
Mutual labels:  qlearning
Q-learning-conv-net
A Q-learning AI bot that perceives its environment with a CNN
Stars: ✭ 14 (-66.67%)
Mutual labels:  qlearning
cartpole-rl-remote
CartPole game by Reinforcement Learning, a journey from training to inference
Stars: ✭ 24 (-42.86%)
Mutual labels:  qlearning
ReinforcementLearning Sutton-Barto Solutions
Solutions and figures for problems from Reinforcement Learning: An Introduction by Sutton & Barto
Stars: ✭ 20 (-52.38%)
Mutual labels:  qlearning

This code implements the "GAN Q-Learning" algorithm found in https://arxiv.org/abs/1805.04874.

Modifications From Paper

  • The published algorithm has a typo in the discriminator loss; for reference, the standard (corrected) form is sketched below this list.

  • Currently, the discriminator eventually learns to discriminate perfectly against the generator (even before the generator has learned the actual distribution) on the CartPole environment. I've experimented with different hyperparameters, but the problem persists: for example, even when I update the generator 10 times per discriminator update, the training graph still looks as follows.

(training graph)

Final Results

In the end, I was unable to reproduce the results given in the paper, as I couldn't sweep enough hyperparameters on my hardware. After verifying that the algorithm was implemented correctly, I found that the classic problems of GAN training arose: in particular, the discriminator easily overfit the reward distribution, so the generator got stuck and the reward function was never learned. Even with significant architecture modifications, these problems persisted.
