
gameofdimension / policy-gradient-pong

Licence: other
TensorFlow implementation of Andrej Karpathy's blog post about reinforcement learning: http://karpathy.github.io/2016/05/31/rl/

Programming Languages

python

Projects that are alternatives of or similar to policy-gradient-pong

rpg
Ranking Policy Gradient
Stars: ✭ 22 (-24.14%)
Mutual labels:  policy-gradient
Explorer
Explorer is a PyTorch reinforcement learning framework for exploring new ideas.
Stars: ✭ 54 (+86.21%)
Mutual labels:  policy-gradient
Paddle-RLBooks
Paddle-RLBooks is a reinforcement learning code study guide based on pure PaddlePaddle.
Stars: ✭ 113 (+289.66%)
Mutual labels:  policy-gradient
DRL in CV
A course on Deep Reinforcement Learning in Computer Vision.
Stars: ✭ 59 (+103.45%)
Mutual labels:  policy-gradient
imitation learning
PyTorch implementation of some reinforcement learning algorithms: A2C, PPO, Behavioral Cloning from Observation (BCO), GAIL.
Stars: ✭ 93 (+220.69%)
Mutual labels:  policy-gradient
Reinforcement Learning
Deep Reinforcement Learning Algorithms implemented with Tensorflow 2.3
Stars: ✭ 61 (+110.34%)
Mutual labels:  policy-gradient
yarll
Combining deep learning and reinforcement learning.
Stars: ✭ 84 (+189.66%)
Mutual labels:  policy-gradient
rl implementations
No description or website provided.
Stars: ✭ 40 (+37.93%)
Mutual labels:  policy-gradient
connect4
Solving board games like Connect4 using Deep Reinforcement Learning
Stars: ✭ 33 (+13.79%)
Mutual labels:  policy-gradient
TRPO-TensorFlow
Trust Region Policy Optimization (TRPO) in pure TensorFlow
Stars: ✭ 17 (-41.38%)
Mutual labels:  policy-gradient
TAA-PG
Usage of policy gradient reinforcement learning to solve portfolio optimization problems (Tactical Asset Allocation).
Stars: ✭ 26 (-10.34%)
Mutual labels:  policy-gradient
RL
A set of RL experiments. Currently including: (1) the MDP rank experiment, based on policy gradient algorithm
Stars: ✭ 22 (-24.14%)
Mutual labels:  policy-gradient
Deep-Reinforcement-Learning-CS285-Pytorch
Solutions of assignments of Deep Reinforcement Learning course presented by the University of California, Berkeley (CS285) in Pytorch framework
Stars: ✭ 104 (+258.62%)
Mutual labels:  policy-gradient
deep rl acrobot
TensorFlow A2C to solve Acrobot, with synchronized parallel environments
Stars: ✭ 32 (+10.34%)
Mutual labels:  policy-gradient
Deep-rl-mxnet
Mxnet implementation of Deep Reinforcement Learning papers, such as DQN, PG, DDPG, PPO
Stars: ✭ 26 (-10.34%)
Mutual labels:  policy-gradient
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
Stars: ✭ 222 (+665.52%)
Mutual labels:  policy-gradient
HandyRL
HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.
Stars: ✭ 228 (+686.21%)
Mutual labels:  policy-gradient
ADL2019
Applied Deep Learning (2019 Spring) @ NTU
Stars: ✭ 20 (-31.03%)
Mutual labels:  policy-gradient
SeqGAN-PyTorch
Implementation of Sequence Generative Adversarial Nets with Policy Gradient in PyTorch
Stars: ✭ 40 (+37.93%)
Mutual labels:  policy-gradient
td-reg
TD-Regularized Actor-Critic Methods
Stars: ✭ 28 (-3.45%)
Mutual labels:  policy-gradient

policy-gradient-pong

A reinforcement learning approach to winning the Atari game Pong.
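The core of the policy-gradient approach is turning sparse game rewards into a per-action learning signal. The sketch below follows the NumPy recipe from Karpathy's blog post rather than this repository's TensorFlow code, so the function names `discount_rewards` and `normalized_advantages` are hypothetical. In Pong a non-zero reward means a point was scored, so the running discounted sum is reset there and actions are only credited for the rally they belonged to.

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Discounted returns for one Pong episode (sketch after Karpathy's blog).

    A non-zero reward marks the end of a rally, so the running sum is
    reset there: actions are not credited across point boundaries.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    discounted = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0  # rally boundary (someone scored)
        running = running * gamma + rewards[t]
        discounted[t] = running
    return discounted

def normalized_advantages(rewards, gamma=0.99):
    """Standardize discounted returns to zero mean and unit variance."""
    adv = discount_rewards(rewards, gamma)
    adv -= adv.mean()
    std = adv.std()
    return adv / std if std > 0 else adv
```

For example, `discount_rewards([0, 0, 1, 0, -1], gamma=0.5)` returns `[0.25, 0.5, 1.0, -0.5, -1.0]`: the first rally ends in a won point and the second in a lost one. Standardizing the returns roughly half-encourages and half-discourages the actions in a batch, which is a common variance-reduction trick.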

A TensorFlow implementation of Andrej Karpathy's original NumPy version.
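For reference, the original NumPy version preprocesses each frame and feeds it to a small two-layer policy network that outputs the probability of moving the paddle up. The sketch below mirrors that design as described in the blog post (crop and downsample to 80x80, 200 ReLU hidden units, sigmoid output, action ids 2 = UP and 3 = DOWN); it is not taken from this repository, and `preprocess`, `init_model`, `policy_forward` and `sample_action` are hypothetical names.

```python
import numpy as np

UP, DOWN = 2, 3  # Pong action ids used in Karpathy's blog post

def preprocess(frame):
    """210x160x3 uint8 Atari frame -> flat 6400-dim float vector (80x80)."""
    img = frame[35:195]              # crop to the playing field
    img = img[::2, ::2, 0].copy()    # downsample by 2, keep one colour channel
    img[img == 144] = 0              # erase background (type 1)
    img[img == 109] = 0              # erase background (type 2)
    img[img != 0] = 1                # paddles and ball become 1
    return img.astype(np.float64).ravel()

def init_model(hidden=200, input_dim=80 * 80, seed=0):
    """Randomly initialized 2-layer policy, scaled by 1/sqrt(fan_in)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.standard_normal((hidden, input_dim)) / np.sqrt(input_dim),
        "W2": rng.standard_normal(hidden) / np.sqrt(hidden),
    }

def policy_forward(model, x):
    """Return P(UP | x) and the hidden activations (needed for backprop)."""
    h = np.maximum(0.0, model["W1"] @ x)   # ReLU hidden layer
    logit = model["W2"] @ h
    p_up = 1.0 / (1.0 + np.exp(-logit))    # sigmoid
    return p_up, h

def sample_action(p_up, rng):
    """Sample UP/DOWN and return d log pi(action) / d logit."""
    action = UP if rng.uniform() < p_up else DOWN
    # The sampled action acts as a "fake label"; the policy gradient later
    # multiplies this term by the action's (normalized) advantage.
    dlogit = (1.0 if action == UP else 0.0) - p_up
    return action, dlogit
```

In the blog post the network input is the difference between the current and previous preprocessed frames, so the policy can perceive the ball's motion.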

dependencies

  • TensorFlow
  • NumPy
  • OpenAI Gym

usage

train:

python policy_gradient_pong.py

demo:

python policy_gradient_pong_demo.py <checkpoint path>

We provide trained weights in the weight/ folder; they beat the computer opponent with high probability.

notice

Training takes a very long time: about 90 hours to beat the computer by 5 points, on a Dell RX 730 with an Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz (8 cores), 16 GB of RAM and a GTX 1080 Ti GPU.

Training takes much less time on a 2016 MacBook Pro without a GPU, so I think most of the time is spent on simulation rather than on the network's forward and backward passes.
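One way to check this reasoning is to estimate the network's cost per frame in isolation. The sketch below assumes the architecture from Karpathy's blog post (80x80 input, 200 hidden units, one output), which may differ from this repository's model. A forward pass is then about 6400 × 200 + 200 = 1,280,200 multiply-adds per frame. The timing helper `time_forward_backward` is hypothetical; comparing its per-frame figure with the time of one environment step on the same machine would show where the training time actually goes.

```python
import time
import numpy as np

INPUT_DIM, HIDDEN = 80 * 80, 200  # assumed architecture (Karpathy's blog)

def multiply_adds_per_frame(input_dim=INPUT_DIM, hidden=HIDDEN):
    """Multiply-adds for one forward pass of the 2-layer policy."""
    return input_dim * hidden + hidden

def time_forward_backward(n_frames=1000, seed=0):
    """Average wall-clock seconds per frame for a batched forward+backward pass."""
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((HIDDEN, INPUT_DIM)) / np.sqrt(INPUT_DIM)
    W2 = rng.standard_normal(HIDDEN) / np.sqrt(HIDDEN)
    xs = rng.standard_normal((n_frames, INPUT_DIM))
    start = time.perf_counter()
    h = np.maximum(0.0, xs @ W1.T)                 # forward: hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))            # forward: P(UP)
    labels = (rng.uniform(size=n_frames) < p).astype(np.float64)
    dlogit = labels - p                            # fake label minus probability
    dW2 = h.T @ dlogit                             # backward: output layer
    dh = np.outer(dlogit, W2)
    dh[h <= 0] = 0.0                               # backward through ReLU
    dW1 = dh.T @ xs                                # backward: hidden layer
    elapsed = time.perf_counter() - start
    return elapsed / n_frames, dW1.shape, dW2.shape
```

Because the whole episode can be processed as one batch during the backward pass, the network work vectorizes well, whereas emulator steps run one frame at a time.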

training progress
