
danijar / Dreamerv2

License: MIT
Mastering Atari with Discrete World Models

Programming Languages

Python

Projects that are alternatives to or similar to Dreamerv2

Habitat Lab
A modular high-level library to train embodied AI agents across a variety of tasks, environments, and simulators.
Stars: ✭ 587 (+104.53%)
Mutual labels:  robotics, research, reinforcement-learning
Mindpark
Testbed for deep reinforcement learning
Stars: ✭ 163 (-43.21%)
Mutual labels:  artificial-intelligence, research, reinforcement-learning
GibsonEnv
Gibson Environments: Real-World Perception for Embodied Agents
Stars: ✭ 666 (+132.06%)
Mutual labels:  robotics, research, reinforcement-learning
Holodeck Engine
High Fidelity Simulator for Reinforcement Learning and Robotics Research.
Stars: ✭ 48 (-83.28%)
Mutual labels:  robotics, research, reinforcement-learning
Awesome Decision Making Reinforcement Learning
A selection of state-of-the-art research materials on decision making and motion planning.
Stars: ✭ 68 (-76.31%)
Mutual labels:  artificial-intelligence, robotics, reinforcement-learning
Free Ai Resources
🚀 FREE AI Resources - 🎓 Courses, 👷 Jobs, 📝 Blogs, 🔬 AI Research, and many more - for everyone!
Stars: ✭ 192 (-33.1%)
Mutual labels:  artificial-intelligence, research, reinforcement-learning
YARP
YARP - Yet Another Robot Platform
Stars: ✭ 358 (+24.74%)
Mutual labels:  artificial-intelligence, robotics, research
Holodeck
High Fidelity Simulator for Reinforcement Learning and Robotics Research.
Stars: ✭ 513 (+78.75%)
Mutual labels:  robotics, research, reinforcement-learning
PyGame Learning Environment
PyGame Learning Environment (PLE) -- Reinforcement Learning Environment in Python.
Stars: ✭ 828 (+188.5%)
Mutual labels:  artificial-intelligence, research, reinforcement-learning
Rex Gym
OpenAI Gym environments for an open-source quadruped robot (SpotMicro)
Stars: ✭ 684 (+138.33%)
Mutual labels:  artificial-intelligence, robotics, reinforcement-learning
Lagom
lagom: A PyTorch infrastructure for rapid prototyping of reinforcement learning algorithms.
Stars: ✭ 364 (+26.83%)
Mutual labels:  artificial-intelligence, research, reinforcement-learning
Dreamer
Dream to Control: Learning Behaviors by Latent Imagination
Stars: ✭ 269 (-6.27%)
Mutual labels:  artificial-intelligence, robotics, reinforcement-learning
ToyCarIRL
Implementation of Inverse Reinforcement Learning Algorithm on a toy car in a 2D world problem, (Apprenticeship Learning via Inverse Reinforcement Learning Abbeel & Ng, 2004)
Stars: ✭ 128 (-55.4%)
Mutual labels:  artificial-intelligence, robotics, reinforcement-learning
Ravens
Train robotic agents to learn pick and place with deep learning for vision-based manipulation in PyBullet. Transporter Nets, CoRL 2020.
Stars: ✭ 133 (-53.66%)
Mutual labels:  artificial-intelligence, robotics, reinforcement-learning
Atari
AI research environment for the Atari 2600 games 🤖.
Stars: ✭ 174 (-39.37%)
Mutual labels:  artificial-intelligence, reinforcement-learning
ELF
An End-To-End, Lightweight and Flexible Platform for Game Research
Stars: ✭ 2,057 (+616.72%)
Mutual labels:  artificial-intelligence, reinforcement-learning
adeptRL
Reinforcement learning framework to accelerate research
Stars: ✭ 173 (-39.72%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Awesome Ml Courses
Awesome free machine learning and AI courses with video lectures.
Stars: ✭ 2,145 (+647.39%)
Mutual labels:  artificial-intelligence, reinforcement-learning
MLComp
Distributed DAG (Directed acyclic graph) framework for machine learning with UI
Stars: ✭ 183 (-36.24%)
Mutual labels:  artificial-intelligence, research
dm_control
DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.
Stars: ✭ 2,592 (+803.14%)
Mutual labels:  artificial-intelligence, reinforcement-learning

Mastering Atari with Discrete World Models

Implementation of the DreamerV2 agent in TensorFlow 2. Training curves for all 55 games are included.

If you find this code useful, please reference it in your paper:

@article{hafner2020dreamerv2,
  title={Mastering Atari with Discrete World Models},
  author={Hafner, Danijar and Lillicrap, Timothy and Norouzi, Mohammad and Ba, Jimmy},
  journal={arXiv preprint arXiv:2010.02193},
  year={2020}
}

Method

DreamerV2 is the first world model agent that achieves human-level performance on the Atari benchmark. DreamerV2 also outperforms the final performance of the top model-free agents Rainbow and IQN using the same amount of experience and computation. The implementation in this repository runs on a single GPU and alternates between training the world model, training the policy, and collecting experience.
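
This alternation can be sketched at a high level as follows. This is an illustrative schematic only; every name in it (world_model, agent, replay, env and their methods) is a hypothetical stand-in rather than the repository's API.

def training_loop(world_model, agent, replay, env, total_steps):
    # Hypothetical names throughout; this mirrors the alternation described
    # above, not the actual code structure of the repository.
    for step in range(total_steps):
        batch = replay.sample()           # sequences of stored experience
        world_model.train(batch)          # 1. fit the world model
        agent.train(world_model, batch)   # 2. improve actor and critic in imagination
        replay.add(env.collect(agent))    # 3. gather fresh experience with the policy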

World Model Learning

DreamerV2 learns a model of the environment directly from high-dimensional input images. For this, it predicts ahead using compact learned states. The states consist of a deterministic part and several categorical variables that are sampled. The prior over these categoricals is learned through a KL loss. The world model is learned end-to-end via straight-through gradients, meaning that the gradient of the discrete sample is set to the gradient of the probability density.
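
As a minimal sketch of the straight-through trick for the categorical latents (illustrative TensorFlow code assuming 2-D logits, not the repository's exact implementation):

import tensorflow as tf

def straight_through_sample(logits):
    # logits: [batch, classes]. Draw a hard one-hot sample.
    indices = tf.random.categorical(logits, num_samples=1)[:, 0]
    sample = tf.one_hot(indices, depth=logits.shape[-1], dtype=logits.dtype)
    # Forward pass returns the discrete sample; backward pass routes
    # gradients through the softmax probabilities instead.
    probs = tf.nn.softmax(logits, axis=-1)
    return sample + probs - tf.stop_gradient(probs)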

Actor Critic Learning

DreamerV2 learns actor and critic networks from imagined trajectories of latent states. The trajectories start at encoded states of previously encountered sequences. The world model then predicts ahead using the selected actions and its learned state prior. The critic is trained using temporal difference learning, and the actor is trained to maximize the value function via REINFORCE and straight-through gradients.
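
For the critic targets, the paper uses λ-returns computed over the imagined trajectory. A minimal sketch, with shapes and default hyperparameters assumed for illustration rather than taken from the repository:

import tensorflow as tf

def lambda_returns(rewards, values, bootstrap, discount=0.99, lam=0.95):
    # rewards, values: [horizon, batch] from an imagined trajectory;
    # bootstrap: [batch] value estimate for the state after the horizon.
    # Mixes n-step returns recursively, as in TD(lambda).
    next_values = tf.concat([values[1:], bootstrap[None]], axis=0)
    returns, last = [], bootstrap
    for t in reversed(range(rewards.shape[0])):
        last = rewards[t] + discount * ((1 - lam) * next_values[t] + lam * last)
        returns.append(last)
    return tf.stack(returns[::-1], axis=0)  # [horizon, batch] critic targets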

For more information, see the research paper: https://arxiv.org/abs/2010.02193

Instructions

Get dependencies:

pip3 install --user tensorflow==2.3.1
pip3 install --user tensorflow_probability==0.11.1
pip3 install --user pandas
pip3 install --user matplotlib
pip3 install --user ruamel.yaml
pip3 install --user 'gym[atari]'

Train the agent:

python3 dreamer.py --logdir ~/logdir/atari_pong/dreamerv2/1 \
    --configs defaults atari --task atari_pong

Monitor results:

tensorboard --logdir ~/logdir

Generate plots:

python3 plotting.py --indir ~/logdir --outdir ~/plots --xaxis step --yaxis eval_return --bins 1e6

Tips:

  • Efficient debugging. You can use the debug config as in --configs defaults atari debug. This reduces the batch size, increases the evaluation frequency, and disables tf.function graph compilation for easy line-by-line debugging.

  • Infinite gradient norms. This is normal and described under loss scaling in the mixed precision guide. You can disable mixed precision by passing --precision 32 to the training script. Mixed precision is faster but can in principle cause numerical instabilities.

  • Accessing logged metrics. The metrics are stored in both TensorBoard and JSON lines format. You can directly load them using pandas.read_json(), as in the sketch after this list. The plotting script also stores the binned and aggregated metrics of multiple runs into a single JSON lines file for easy manual plotting.
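
For example, assuming the run directory from the training command above and a metrics file in JSON lines format (the exact filename is an assumption; check your logdir):

import pandas as pd

# Hypothetical path; point this at the metrics file inside your run directory.
df = pd.read_json('~/logdir/atari_pong/dreamerv2/1/metrics.jsonl', lines=True)
# Not every row logs every metric, so drop rows without an eval_return.
print(df[['step', 'eval_return']].dropna().tail())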
