DeepCubeA

This is the code for DeepCubeA, implemented in Python 3 with PyTorch. The original Python 2, TensorFlow code can be found on CodeOcean.

This repository currently contains code for using DeepCubeA to solve the Rubik's cube, the 15-puzzle, 24-puzzle, 35-puzzle, and 48-puzzle, and Lights Out. You can also adapt this code to apply DeepCubeA to new problems you might be working on.

For any issues, please contact Forest Agostinelli ([email protected]).

Setup

For the required Python packages, please see requirements.txt. You should be able to install these packages with pip or conda.

Python version used: 3.7.2

IMPORTANT! Before running anything, please execute source setup.sh in the DeepCubeA directory to add the current directory to your Python path (PYTHONPATH).

Training and A* Search

train.sh contains the commands to train the cost-to-go function as well as commands for using it with A* search. Note that some of the hyperparameters may differ slightly from those in the paper, as they were later found to give slightly better results.

There are pre-trained models in the saved_models/ directory as well as output.txt files to let you know what output to expect.

These models were trained with 1-4 GPUs and 20-30 CPUs. The exact numbers varied throughout training because training was often stopped and restarted to make room for other processes.

There are pre-computed results of A* search in the results/ directory.

Commands to train DeepCubeA to solve the 15-puzzle.

Train cost-to-go function

python ctg_approx/avi.py --env puzzle15 --states_per_update 50000000 --batch_size 10000 --nnet_name puzzle15 --max_itrs 1000000 --loss_thresh 0.1 --back_max 500 --num_update_procs 30

Solve with A* search; use --verbose for more information

python search_methods/astar.py --states data/puzzle15/test/data_0.pkl --model saved_models/puzzle15/current/ --env puzzle15 --weight 0.8 --batch_size 20000 --results_dir results/puzzle15/ --language cpp --nnet_batch_size 10000

Compare to shortest path

python scripts/compare_solutions.py --soln1 data/puzzle15/test/data_0.pkl --soln2 results/puzzle15/results.pkl
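
If you want to inspect the solution files yourself, they are standard Python pickles. The snippet below is only a rough sketch: the dictionary key names (assumed here to be 'solutions') are assumptions, so check scripts/compare_solutions.py for the actual format.

import pickle

# Rough sketch for inspecting solution files; the 'solutions' key is an
# assumption - verify the real keys against scripts/compare_solutions.py.
with open("data/puzzle15/test/data_0.pkl", "rb") as f:
    shortest = pickle.load(f)
with open("results/puzzle15/results.pkl", "rb") as f:
    found = pickle.load(f)

avg_shortest = sum(len(s) for s in shortest["solutions"]) / len(shortest["solutions"])
avg_found = sum(len(s) for s in found["solutions"]) / len(found["solutions"])
print("Average shortest solution length: %.2f" % avg_shortest)
print("Average found solution length:    %.2f" % avg_found)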

Improving Results

During approximate value iteration (AVI), one can get better results by increasing the batch size (--batch_size) and the number of states per update (--states_per_update). Decreasing the loss threshold at which the target network is updated (--loss_thresh) can also help.

One can also add additional states to the training set by doing greedy best-first search (GBFS) during the update stage and adding the states encountered during GBFS to the states used for approximate value iteration (--max_update_steps). Setting --max_update_steps to 1 is equivalent to plain approximate value iteration.
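
To make the roles of these flags concrete, here is a simplified sketch of how targets are formed during one approximate value iteration update. This is not the repository's implementation (see ctg_approx/avi.py for that); cost_to_go, expand, and is_solved are illustrative placeholders.

import numpy as np

def avi_targets(states, cost_to_go, expand, is_solved):
    # Sketch of one AVI update: the target for each state is the cheapest
    # one-step lookahead, min over moves of (move cost + estimated cost-to-go
    # of the resulting state). Solved states get a target of zero.
    targets = []
    for state in states:
        if is_solved(state):
            targets.append(0.0)
            continue
        next_states, move_costs = expand(state)      # children and move costs
        values = cost_to_go(next_states)             # target network estimates
        targets.append(float(np.min(np.asarray(move_costs) + np.asarray(values))))
    return np.array(targets)

In these terms, --states_per_update controls how many states receive targets like this per update, --batch_size how many (state, target) pairs go into each gradient step, and --loss_thresh how small the training loss must be before the target network behind cost_to_go is refreshed.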

During A* search, increasing the weight on the path cost (--weight, range should be [0,1]) and the batch size (--batch_size) generally improves results.
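
For reference, the node priority used by the weighted A* search is simply a weighted combination of path cost and the DNN's cost-to-go estimate; a minimal sketch (path_cost and heuristic are illustrative names):

def node_priority(path_cost, heuristic, weight):
    # Weighted A*: weight scales the path cost g(n); the DNN's cost-to-go
    # estimate serves as the heuristic h(n). A weight of 1.0 is standard A*,
    # while smaller weights behave more greedily.
    return weight * path_cost + heuristic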

These improvements often come at the expense of time.

Using DeepCubeA to Solve New Problems

Create your own environment by implementing the abstract methods in environments/environment_abstract.py. See the implementations in environments/ for examples.
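
As a rough illustration, a new environment looks something like the sketch below. The class names, method names, and signatures here are placeholders (the real abstract methods are defined in environments/environment_abstract.py), and the toy "counter" puzzle exists only for this example.

# Sketch only: check environments/environment_abstract.py for the actual
# abstract class and method signatures.
from environments.environment_abstract import Environment, State

class CounterState(State):
    # Toy state: a single integer; the puzzle is solved when it reaches 0.
    def __init__(self, value: int):
        self.value = value

class Counter(Environment):
    def get_num_moves(self):
        return 2                                    # increment or decrement

    def next_state(self, states, action):
        deltas = [1, -1]
        next_states = [CounterState(s.value + deltas[action]) for s in states]
        move_costs = [1.0 for _ in states]          # unit cost per move
        return next_states, move_costs

    def is_solved(self, states):
        return [s.value == 0 for s in states]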

After implementing your environment, edit utils/env_utils.py to return your environment object given your chosen keyword.
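
Registering the new environment is then a small change along these lines (the actual function name and existing branches in utils/env_utils.py may differ):

# Sketch of the keyword -> environment mapping; adapt to the real function
# in utils/env_utils.py.
def get_environment(env_name: str):
    if env_name == "puzzle15":
        ...                                         # existing environments stay as they are
    elif env_name == "counter":                     # your new keyword
        return Counter()
    raise ValueError("No environment: %s" % env_name)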

Use tests/timing_test.py to make sure basic aspects of your implementation are working correctly.

Parallelism

Training and solving can be easily parallelized across multiple CPUs and GPUs.

When training with ctg_approx/avi.py, set the number of workers for approximate value iteration with --num_update_procs. During the update process, a copy of the target DNN is spawned on each available GPU, and they work in parallel during the update step.

The number of GPUs used can be controlled by setting the CUDA_VISIBLE_DEVICES environment variable.

For example: export CUDA_VISIBLE_DEVICES="0,1,2,3"
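
PyTorch respects this variable, so a quick way to confirm which devices the update step will see (an illustrative snippet, not part of the repository) is:

import torch

# CUDA_VISIBLE_DEVICES determines which GPUs PyTorch can see.
num_gpus = torch.cuda.device_count()
devices = [torch.device("cuda:%i" % i) for i in range(num_gpus)]
if not devices:
    devices = [torch.device("cpu")]
print("%i device(s) available: %s" % (len(devices), devices))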

Memory

When obtaining training data with approximate value iteration and solving using A* search, the batch size of the data given to the DNN can be controlled with --update_nnet_batch_size for the avi.py file and --nnet_batch_size for the astar.py file. Reduce this value if your GPUs are running out of memory during approximate value iteration or during A* search.
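
Conceptually, limiting the DNN batch size just means feeding states to the network in fixed-size chunks, roughly as sketched below (an illustration, not the repository's code):

import numpy as np
import torch

def predict_in_chunks(nnet, states_nnet, nnet_batch_size, device):
    # Evaluate the cost-to-go network on a large array of states without
    # exceeding GPU memory by splitting the input into fixed-size chunks.
    outputs = []
    nnet.eval()
    with torch.no_grad():
        for start in range(0, states_nnet.shape[0], nnet_batch_size):
            chunk = torch.tensor(states_nnet[start:start + nnet_batch_size],
                                 dtype=torch.float32, device=device)
            outputs.append(nnet(chunk).cpu().numpy())
    return np.concatenate(outputs, axis=0)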

Compiling C++ for A* Search

cd cpp/

make

If you are not able to get the C++ version working on your computer, you can change the --language switch for search_methods/astar.py from --language cpp to --language python. Note that the C++ version is generally faster.
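
For example, the 15-puzzle command above becomes:

python search_methods/astar.py --states data/puzzle15/test/data_0.pkl --model saved_models/puzzle15/current/ --env puzzle15 --weight 0.8 --batch_size 20000 --results_dir results/puzzle15/ --language python --nnet_batch_size 10000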
