License: Apache-2.0

Experimental - using OpenAI Baselines with MarathonEnvs (ML-Agents)


MarathonEnvs + OpenAI.Baselines

Exploratory implementation of

  • MarathonEnvs
  • ml-agents
  • openai.baselines
  • stable.baselines

Source versions

  • MarathonEnvs
  • ml-agents = 0.5.1
  • openai.baselines = 7bfbcf1
  • stable.baselines = v2.2.0

Install

  • Clone this repo (ideally from a release version)
  • Download / unzip the prebuilt MarathonEnvs into the envs folder
  • pip installs:
# ml-agents
cd ml-agents
pip install -e .
# gym-unity
cd gym-unity
pip install -e .
# baselines - does not need to be installed
# stable_baselines 
cd stable_baselines
pip install -e .
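
After the installs, it is worth sanity-checking that the gym-unity wrapper can drive a prebuilt environment. A minimal smoke test, assuming the ml-agents 0.5.x gym wrapper API and the ./envs/hopper build path used in the command lines below:

# smoke test for the gym-unity wrapper (assumes the ml-agents 0.5.x API;
# adjust the env path to wherever the prebuilt MarathonEnvs were unzipped)
from gym_unity.envs import UnityEnv

env = UnityEnv("./envs/hopper", worker_id=10, use_visual=False)
obs = env.reset()
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()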


Status-Hopper

| | Win10 | MacOS | Notes |
|---|---|---|---|
| ml-agents-ppo | | score=435 (23min) | |
| baselines-ppo2 multiagents | score=943 (7min) | score=860 (11min) | 16 agents, nsteps=128 |
| baselines-ppo2 multiagents non-normalized | score=774 (7min) | score=450 (11min) | 16 agents, nsteps=128 |
| baselines-ppo2 MPIx4 | score=594 (42min) | score=583 (82min) | Having problems with MPI + ml-agents on Windows. Save is broken for normalized agents |
| baselines-ppo2 single agent | score=328 (31min) | | Need to check whether 1m steps with MPI equals 1m steps with a single agent; it is not clear why MPI would be faster. Save is broken for normalized agents |
| baselines-ppo2 MPIx4 TfRunningMeanStd | | | TfRunningMeanStd fixes save / load but trains slower |
| baselines-ppo2 single agent TfRunningMeanStd | score=95 (40min) | score=107 (49min) | TfRunningMeanStd fixes save / load but trains slower |
| baselines-ppo2 MPIx4 non-normalized | | score=50 (79min) | Should try training for more steps |
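
The "save is broken for normalized agents" notes come from baselines' VecNormalize wrapper keeping its running observation/return statistics in NumPy objects outside the TensorFlow graph, so a model save does not capture them (TfRunningMeanStd moves the statistics into TF variables, which fixes save / load at the cost of training speed). One possible workaround is to pickle the statistics separately; a sketch, assuming venv is a baselines VecNormalize instance with the usual ob_rms / ret_rms attributes:

# persist VecNormalize running statistics alongside a saved model
# (a sketch; assumes `venv` is baselines' VecNormalize, whose ob_rms /
# ret_rms attributes hold the NumPy running mean/std objects)
import pickle

def save_running_stats(venv, path):
    with open(path, "wb") as f:
        pickle.dump({"ob_rms": venv.ob_rms, "ret_rms": venv.ret_rms}, f)

def load_running_stats(venv, path):
    with open(path, "rb") as f:
        stats = pickle.load(f)
    venv.ob_rms = stats["ob_rms"]
    venv.ret_rms = stats["ret_rms"]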

Status-Walker

| | Win10 | MacOS | Notes |
|---|---|---|---|
| ml-agents-ppo | | | |
| baselines-ppo2 multiagents | score=1371 (8min) | score=1439 (12min) | |
| baselines-ppo2 multiagents non-normalized | | score=1005 (12min) | 16 agents, nsteps=128 |

OpenAI.Baselines

Example command lines

To enable TensorBoard

# MacOS: 
export OPENAI_LOG_FORMAT='stdout,log,csv,tensorboard' 
export OPENAI_LOGDIR=summaries

# Win10:
set OPENAI_LOG_FORMAT=stdout,log,csv,tensorboard
set OPENAI_LOGDIR=summaries
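
Once a run has written event files under the summaries directory, they can be viewed with the standard TensorBoard command (assuming TensorBoard was installed alongside TensorFlow):

tensorboard --logdir summaries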

ppo2 for 1m steps

# MacOS training:
# multiagent
python -m baselines.run_multiagent_unity --alg=ppo2 --env=./envs/hopper-x16 --num_timesteps=1e6 --save_path=./models/hopper_1m_ppo2

# mpi creates 4 agents
mpiexec -n 4 python -m baselines.run_unity --alg=ppo2 --env=./envs/hopper --num_timesteps=1e6 --save_path=./models/hopper_1m_ppo2

# baselines creates 4 agents
python -m baselines.run_unity --alg=ppo2 --env=./envs/hopper --num_timesteps=1e6 --num_env=4 --save_path=./models/hopper_1m_ppo2

# Play: 
python -m baselines.run_unity --alg=ppo2 --env=./envs/hopper-run --num_timesteps=0 --load_path=./models/hopper_1m_ppo2 --play

# Windows training:
# multiagent
python -m baselines.run_multiagent_unity --alg=ppo2 --env="envs\hopper-x16\Unity Environment.exe" --num_timesteps=1e6 --save_path=models\hopper_1m_ppo2

mpiexec -n 4 python -m baselines.run_unity --alg=ppo2 --env="envs\hopper\Unity Environment.exe" --num_timesteps=1e6 --save_path=models\hopper_1m_ppo2

# Windows Play: 
python -m baselines.run_unity --alg=ppo2 --env="envs\hopper-run\Unity Environment.exe" —num_timesteps=0 --load_path=models\hopper_1m_ppo2 --play

python -m baselines.run_unity --alg=ppo2 --env="envs\walker-run\Unity Environment.exe" —num_timesteps=0 --load_path=models\walker_1m_ppo2 --play

acktr

mpiexec -n 4 python -m baselines.run_unity --alg=acktr --env=./envs/walker --num_timesteps=1e6  --save_path=./models/walker_1m_acktr

python -m baselines.run_unity --alg=acktr --env=./envs/walker-run --num_timesteps=0 --load_path=./models/walker_1m_acktr --play

acer

mpiexec -n 4 python -m baselines.run_unity --alg=acer --env=./envs/walker --num_timesteps=1e6  --save_path=./models/walker_1m_acer

a2c

mpiexec -n 4 python -m baselines.run_unity --alg=a2c --env=./envs/walker --num_timesteps=1e6  --save_path=./models/walker_1m_a2c

gail

mpiexec -n 4 python -m baselines.run_unity --alg=gail --env=./envs/walker --num_timesteps=1e6  --save_path=./models/walker_1m_gail

Example command lines - not working yet

her

mpiexec -n 4 python -m baselines.run_unity --alg=her --env=./envs/walker --num_timesteps=1e6  --save_path=./models/walker_1m_her

python -m baselines.run_unity --alg=her --env=./envs/walker-run --num_timesteps=0 --load_path=./models/walker_1m_her --play

ml-agents

Train using marathon_envs_config.yaml

# MacOS:
mlagents-learn config/marathon_envs_config.yaml --train --worker-id=10 --env=./envs/hopper-x16 --run-id=hopper.001

# Win10 (CUDA_VISIBLE_DEVICES=-1 forces CPU):
set CUDA_VISIBLE_DEVICES=-1 & mlagents-learn config/marathon_envs_config.yaml --train --worker-id=10 --env=./envs/hopper-x16 --run-id=hopper.001

Charts

(Cells show the chart image names; the chart images themselves are omitted here.)

| | Win10 | MacOS |
|---|---|---|
| ml-agents-ppo | | macOS-ml-agents-ppo-score=435-1m-simulation-steps |
| baselines-ppo2 multiagents | | |
| baselines-ppo2 multiagents non-normalized | | |
| baselines-ppo2 MPIx4 | Win10-4xmpi-openai-baselines-ppo2-score=594-1m-simulation-steps-ep_len | macOS-4xmpi-openai-baselines-ppo2-score=583-1m-simulation-steps-ep_len |
| baselines-ppo2 single agent | Win10-solo_agent-openai-baselines-ppo2-score=328-1m-simulation-steps-ep_len | |
| baselines-ppo2 MPIx4 non-normalized | | macOS-4xmpi-openai-baselines-ppo2-NoNormalize-score=50-1m-simulation-steps |
| stable_baselines-ppo2 multiagents | | MacOS-stable_baselines-ppo2-multiagent-score=1 |
| stable_baselines-ppo2 mpi multi agent | | MacOS-stable_baselines-ppo2-4xmpi-score=2 |
| stable_baselines-ppo2 single agent | | MacOS-stable_baselines-ppo2-single-agent-score=11 |
| baselines-ppo2 single agent TfRunningMeanStd | | MacOS-solo_agent-openai-baselines-ppo2-score=107-1m-simulation-steps |

Stable.Baselines

Note: Stable Baselines is a fork of OpenAI Baselines that addresses a number of its issues; the main one here is that OpenAI Baselines cannot save environments with normalized observations.

Install stable-baselines

pip install stable-baselines
# trains 16 concurrent agents
python sb_train.py --algo ppo2 --env MarathonWalkerEnv-v0
python sb_train.py --algo ppo2 --env MarathonWalker2DEnv-v0

# loads and runs a trained model
python sb_enjoy.py --algo ppo2 --env MarathonHopperEnv-v0
python sb_enjoy.py --algo ppo2 --env MarathonWalker2DEnv-v0
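
For reference, the save / load round trip that motivates using Stable Baselines can also be driven from its Python API directly. A sketch, assuming the stable-baselines v2.x API and one of the Marathon env IDs registered by this repo:

# train, save and reload a PPO2 model with normalized observations
# (a sketch against the stable-baselines v2.x API; MarathonHopperEnv-v0
# is assumed to be registered by this repo's gym wrappers)
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

venv = DummyVecEnv([lambda: gym.make("MarathonHopperEnv-v0")])
venv = VecNormalize(venv)  # running mean/std over observations and returns

model = PPO2("MlpPolicy", venv, verbose=1)
model.learn(total_timesteps=10000)
model.save("models/hopper_ppo2")

# the policy weights round-trip cleanly via save/load
model = PPO2.load("models/hopper_ppo2")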

Status

| | Win10 | MacOS | Notes |
|---|---|---|---|
| stable_baselines-ppo2 multiagents | | score=870 (9min) | see python train_multiagent.py |

Charts

(Cells show the chart image names; the chart images themselves are omitted here.)

| | Win10 | MacOS |
|---|---|---|
| stable_baselines-ppo2 multiagents | | MacOS-stable_baselines_ppo2_cpu |
| stable_baselines-ppo2 mpi multi agent | | |
| stable_baselines-ppo2 single agent | | |
| baselines-ppo2 single agent TfRunningMeanStd | | MacOS-solo_agent-openai-baselines-ppo2-score=107-1m-simulation-steps |