gameofdimension / policy-gradient-pong

Licence: other

TensorFlow implementation of Andrej Karpathy's blog post on reinforcement learning: http://karpathy.github.io/2016/05/31/rl/

Labels

reinforcement-learning tensorflow policy-gradient

policy-gradient-pong

A reinforcement learning approach to winning the Atari game Pong.

TensorFlow implementation of Andrej Karpathy's original NumPy version.
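
For readers unfamiliar with the referenced blog post, the core of the policy gradient method it describes is weighting each action by the discounted reward that followed it. The snippet below is a minimal NumPy sketch of that step, not code taken from this repository. The function names, the default gamma=0.99, and the reset of the running sum whenever a point is scored (a non-zero reward in Pong) are assumptions modeled on the blog post; the TensorFlow code here may organize this differently.

```python
import numpy as np


def discount_rewards(rewards, gamma=0.99):
    """Return the discounted reward-to-go for every step of an episode.

    Assumption: as in Pong, a non-zero reward marks the end of a rally,
    so the running sum is reset there and rallies do not leak into each other.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    discounted = np.zeros_like(rewards)
    running = 0.0
    # Walk backwards so each step accumulates the rewards that come after it.
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0  # rally boundary: start a fresh sum
        running = running * gamma + rewards[t]
        discounted[t] = running
    return discounted


def standardize(values, eps=1e-8):
    """Shift to zero mean and scale to unit variance (helps gradient stability)."""
    values = np.asarray(values, dtype=np.float64)
    return (values - values.mean()) / (values.std() + eps)


if __name__ == "__main__":
    # Hypothetical rally: the agent scores on step 2, then loses on step 4.
    episode_rewards = [0.0, 1.0, 0.0, -1.0]
    print(standardize(discount_rewards(episode_rewards)))
```

The standardized values would then multiply the log-probability gradient of each sampled action, so actions in winning rallies become more likely and actions in losing rallies less likely.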

dependencies

tensorflow
numpy
openai gym

usage

train:

python policy_gradient_pong.py

demo:

python policy_gradient_pong_demo.py <checkpoint path>

We provide trained weights in the folder weight/, which can beat the computer opponent with high probability.

notice

Training takes a very long time: about 90 hours to beat the computer by 5 points on a Dell RX 730 with an 8-core Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz, 16 GB of RAM and a GTX 1080 Ti GPU.

Training takes much less time on a 2016 MacBook Pro without a GPU, so I think most of the time is spent on simulation rather than on the network's forward and backward passes.
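
One way to check the guess that simulation dominates the run time is to time the two phases separately. The helper below is a hypothetical sketch and is not part of this repository: `step_env` and `run_network` are stand-in callables that you would replace with a real environment step and a real forward/backward pass.

```python
import time


def time_phases(step_env, run_network, n_iters=100):
    """Accumulate wall-clock time spent in simulation vs. in the network.

    step_env:    hypothetical callable that advances the game and returns an observation
    run_network: hypothetical callable that consumes the observation
    """
    env_seconds = 0.0
    net_seconds = 0.0
    for _ in range(n_iters):
        t0 = time.perf_counter()
        observation = step_env()
        t1 = time.perf_counter()
        run_network(observation)
        t2 = time.perf_counter()
        env_seconds += t1 - t0
        net_seconds += t2 - t1
    total = env_seconds + net_seconds
    return {
        "env_seconds": env_seconds,
        "net_seconds": net_seconds,
        "env_fraction": env_seconds / total if total > 0 else 0.0,
    }


if __name__ == "__main__":
    # Stand-ins: a slow "simulator" and a trivial "network".
    report = time_phases(lambda: time.sleep(0.001), lambda obs: None, n_iters=50)
    print(report)
```

If `env_fraction` stays close to 1 with the real environment and model plugged in, a faster GPU will not help much, which would be consistent with the observation above.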

training progress
