DongjunLee / dqn-tensorflow Licence: other
Deep Q Network implemented in TensorFlow
Programming language: Python
Projects that are alternatives to or similar to dqn-tensorflow
Keras Rl2 Reinforcement learning with tensorflow 2 keras
Stars : ✭ 134 (+436%)
Mutual labels: deep , dqn
Protobuf-Dreamer A tiled DeepDream project for creating any size of image, on both CPU and GPU
Stars : ✭ 39 (+56%)
Mutual labels: deep
Differentia.js No longer being supported or maintained. A Graph Theory & Data Structure Library for JavaScript.
Stars : ✭ 13 (-48%)
Mutual labels: deep
DQN-Atari Deep Q-Learning (DQN) implementation for Atari pong.
Stars : ✭ 53 (+112%)
Mutual labels: dqn
IJCAI2018 SSDH Semantic Structure-based Unsupervised Deep Hashing IJCAI2018
Stars : ✭ 38 (+52%)
Mutual labels: deep
pytorch-rl Pytorch Implementation of RL algorithms
Stars : ✭ 15 (-40%)
Mutual labels: dqn
iextrading4j-hist IEX Trading library to parse TOPS and DEEP multicast packets
Stars : ✭ 20 (-20%)
Mutual labels: deep
Explorer Explorer is a PyTorch reinforcement learning framework for exploring new ideas.
Stars : ✭ 54 (+116%)
Mutual labels: dqn
distributedRL A framework for easy prototyping of distributed reinforcement learning algorithms
Stars : ✭ 93 (+272%)
Mutual labels: dqn
cca zoo Canonical Correlation Analysis Zoo: A collection of Regularized, Deep Learning based, Kernel, and Probabilistic methods in a scikit-learn style framework
Stars : ✭ 103 (+312%)
Mutual labels: deep
DQN-pytorch A PyTorch implementation of Human-Level Control through Deep Reinforcement Learning
Stars : ✭ 23 (-8%)
Mutual labels: dqn
introspected Introspection for serializable arrays and JSON friendly objects.
Stars : ✭ 75 (+200%)
Mutual labels: deep
defaults-deep Like `extend` but recursively copies only the missing properties/values to the target object.
Stars : ✭ 26 (+4%)
Mutual labels: deep
rl Reinforcement learning algorithms implemented using Keras and OpenAI Gym
Stars : ✭ 14 (-44%)
Mutual labels: deep
chainer-notebooks Jupyter notebooks for Chainer hands-on
Stars : ✭ 23 (-8%)
Mutual labels: dqn
Deep Q Network
Paper
TODO
Test: Atari
More complex ConvNet model
Use TensorBoard
  average loss
  average Q
  average reward (over 100 consecutive episodes)
  episode reward
Config
python main.py -h
--discount_rate DISCOUNT_RATE
Initial discount rate.
--replay_memory_length REPLAY_MEMORY_LENGTH
Replay memory length (number of episodes).
--target_update_count TARGET_UPDATE_COUNT
DQN Target Network update count.
--max_episode_count MAX_EPISODE_COUNT
Number of maximum episodes.
--batch_size BATCH_SIZE
Batch size. (Must divide evenly into the dataset
sizes)
--frame_size FRAME_SIZE
Frame size. (Stack the env's observations from T-n to T)
--model_name MODEL_NAME
Deep learning network model name (MLPv1, ConvNetv1, ConvNetv2)
--learning_rate LEARNING_RATE
Learning rate.
--gym_result_dir GYM_RESULT_DIR
Directory to put the gym results.
--gym_env GYM_ENV Name of the OpenAI Gym environment. (CartPole-v0,
CartPole-v1, MountainCar-v0)
--step_verbose [STEP_VERBOSE]
Print verbose output every STEP_VERBOSE_COUNT steps.
--step_verbose_count STEP_VERBOSE_COUNT
Number of steps between verbose outputs.
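The flags above correspond to the usual pieces of a DQN training loop. The Python sketch below shows one way they could fit together; it is an illustration under stated assumptions, not this repository's code. `FrameStack`, `ReplayMemory`, `td_targets` and `should_sync_target` are hypothetical names, observations are assumed to be plain lists of numbers, each replay entry is assumed to be one `(state, action, reward, next_state, done)` transition (the help text describes `--replay_memory_length` in episodes, so the repository may count differently), and the target network is stood in for by any function that returns a list of Q-values.

```python
import random
from collections import deque


class FrameStack:
    """Keeps the last `frame_size` observations (T-n .. T) as one state."""

    def __init__(self, frame_size):
        self.frames = deque(maxlen=frame_size)

    def reset(self, first_obs):
        # Fill the stack with copies of the episode's first observation.
        for _ in range(self.frames.maxlen):
            self.frames.append(first_obs)
        return self.state()

    def push(self, obs):
        # The oldest frame drops out automatically once maxlen is reached.
        self.frames.append(obs)
        return self.state()

    def state(self):
        # Flatten the stacked observations, oldest first.
        return [x for frame in self.frames for x in frame]


class ReplayMemory:
    """Bounded FIFO buffer of (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch, drawn without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, discount_rate):
    """y = r + discount_rate * max_a' Q_target(s', a'), or y = r if done."""
    targets = []
    for _, _, reward, next_state, done in batch:
        if done:
            targets.append(reward)
        else:
            targets.append(reward + discount_rate * max(target_q(next_state)))
    return targets


def should_sync_target(step, target_update_count):
    # Assumption: copy online weights into the target network every N steps.
    return step > 0 and step % target_update_count == 0


if __name__ == "__main__":
    stack = FrameStack(frame_size=2)
    s = stack.reset([0.0, 0.0])
    s2 = stack.push([1.0, 1.0])
    memory = ReplayMemory(capacity=1000)
    memory.add(s, 0, 1.0, s2, False)
    memory.add(s2, 1, 1.0, s2, True)
    batch = memory.sample(2)
    # A dummy "target network" that returns fixed Q-values for two actions.
    print(sorted(td_targets(batch, lambda st: [0.5, 2.0], discount_rate=0.99)))
```

The target follows the standard DQN rule: a terminal transition uses the reward alone, while any other transition adds the discounted maximum Q-value predicted by a separate target network. Refreshing that network only every `target_update_count` updates (counted in steps in this sketch) keeps the regression targets from shifting on every gradient step.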
Model
1. MLPv1
Hidden layers (16, 64, 32)
AdamOptimizer
2. ConvNetv1
3 Conv + MaxPool Layers (kernel_size [3, 3, 3], filters [32, 64, 128])
2 Fully Connected Layers (hidden_size [128, 32])
AdamOptimizer
3. ConvNetv2
5 Conv + MaxPool Layers (kernel_size [7, 5, 3, 3, 3], filters [126, 256, 512, 512, 512])
2 Fully Connected Layers (hidden_size [1024, 256])
AdamOptimizer
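The README gives only layer sizes for the models. For MLPv1, the sketch below assumes fully connected layers with ReLU on the hidden layers, a linear output layer with one Q-value per action, and an input made of `frame_size` flattened observations. The activation, the initialization, and the function names (`mlp_layer_sizes`, `count_parameters`, `init_mlp`, `q_values`) are hypothetical. The example uses CartPole's 4-value observation and 2 actions with a single frame.

```python
import random


def mlp_layer_sizes(input_dim, output_dim, hidden=(16, 64, 32)):
    # Input -> MLPv1 hidden layers -> one output per action (Q-values).
    return [input_dim, *hidden, output_dim]


def count_parameters(sizes):
    # Each dense layer has fan_in * fan_out weights plus fan_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def init_mlp(sizes, seed=0):
    # Small Gaussian weights and zero biases (an assumed initialization).
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def q_values(layers, state):
    # Dense layers with ReLU on hidden layers and a linear output layer.
    activation = list(state)
    for i, (weights, biases) in enumerate(layers):
        out = [sum(w * a for w, a in zip(row, activation)) + b
               for row, b in zip(weights, biases)]
        is_output = i == len(layers) - 1
        activation = out if is_output else [max(0.0, v) for v in out]
    return activation


if __name__ == "__main__":
    frame_size = 1  # hypothetical: a single CartPole observation per state
    sizes = mlp_layer_sizes(input_dim=4 * frame_size, output_dim=2)
    print(sizes)                    # [4, 16, 64, 32, 2]
    print(count_parameters(sizes))  # 3314
    layers = init_mlp(sizes)
    q = q_values(layers, [0.01, -0.02, 0.03, 0.04])
    print(len(q), q.index(max(q)))  # number of Q-values, greedy action index
```

With these assumptions the network is small (3314 parameters for one CartPole frame); stacking more frames only widens the first layer.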
Experiments
Classic control
CartPole-v0
Defines "solving" as getting an average reward of 195.0 over 100 consecutive trials.
Model : MLPv1
Clear : after 177 episodes
CartPole-v1
Defines "solving" as getting an average reward of 475.0 over 100 consecutive trials.
Model : MLPv1
Clear : after 791 episodes
MountainCar-v0
Defines "solving" as getting an average reward of -110.0 over 100 consecutive trials.
Model : MLPv1
Clear : after 1182 episodes
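Each classic-control result is judged by a trailing 100-episode average reward. A minimal sketch of that check follows, using the thresholds listed above. `first_solved_episode` is a hypothetical helper, and the README does not say whether its "Clear" counts refer to the last episode of the first qualifying window (as in this sketch) or to the episodes before it.

```python
from collections import deque

# Thresholds quoted in the results above (average over 100 consecutive trials).
SOLVE_THRESHOLDS = {
    "CartPole-v0": 195.0,
    "CartPole-v1": 475.0,
    "MountainCar-v0": -110.0,
}


def first_solved_episode(episode_rewards, threshold, window=100):
    """Return the 1-based episode at which the trailing `window`-episode
    average first reaches `threshold`, or None if it never does."""
    recent = deque(maxlen=window)
    total = 0.0
    for episode, reward in enumerate(episode_rewards, start=1):
        if len(recent) == window:
            total -= recent[0]  # this reward is evicted by the next append
        recent.append(reward)
        total += reward
        if len(recent) == window and total / window >= threshold:
            return episode
    return None


if __name__ == "__main__":
    rewards = [0.0] * 50 + [200.0] * 150  # made-up rewards, not results from this repo
    print(first_solved_episode(rewards, SOLVE_THRESHOLDS["CartPole-v0"]))  # 148
```

Keeping a running total makes the check O(1) per episode instead of re-summing the whole window each time.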
Atari
Assault-ram-v0
Maximize your score
Model : ConvNetv2
Score : 421.12 (average over 100 consecutive trials)
After 2000 episodes (learned something... but still stupid)
Breakout-ram-v0
Maximize your score
Model : ConvNetv1
Score : 9.69 (average over 100 consecutive trials)
Reference