hiwonjoon / NEC

Licence: other
Implementation of Neural Episodic Control in TensorFlow

Neural Episodic Control

TensorFlow implementation of Neural Episodic Control (Pritzel et al., 2017).

⚠️ This is not an official implementation, and it might have some glitches (or even a major defect).

Basic Setup

  1. Install the basic deep RL packages with your preferred environment management system.

This code was tested in an environment with the following packages:

python3.6+
tensorflow == 1.13.1
gym[atari] == 0.12.0
numpy, opencv, etc.
  2. Clone this repo:
git clone git@github.com:hiwonjoon/NEC.git --recursive
  3. Build and install pyflann, an approximate nearest neighbor library (a short usage sketch follows these steps):
cd libs/flann
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=<path you want to install flann & pyflann library> ..
make -j8
make install
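
For reference, here is a minimal sketch of how pyflann answers the approximate nearest-neighbor queries that the DND lookup relies on. The dimensions, index parameters, and number of neighbors below are illustrative only and are not the values used in this repo.

import numpy as np
from pyflann import FLANN

# 50 random 64-dimensional keys standing in for DND embeddings (illustrative)
keys = np.random.rand(50, 64).astype(np.float32)

flann = FLANN()
flann.build_index(keys, algorithm='kdtree', trees=4)

# query the 10 stored keys closest to a new embedding
query = np.random.rand(1, 64).astype(np.float32)
indices, dists = flann.nn_index(query, num_neighbors=10)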

Training

python q_learning.py --log_dir <path to logdir> --env_id <env id such as 'BreakoutNoFrameskip-v4'>

For more hyperparameters, please check out q_learning.py.

Note that the checkpoint files generated during training can be huge, since the episodic memory has to be preserved as part of the model. By default, a checkpoint is written every 1 million timesteps.

Evaluation

python q_learning.py --log_dir <path to logdir> --env_id <env id such as 'BreakoutNoFrameskip-v4'> --mode eval

You can specify a particular model with the --model_file option; for example, the policy after 5M timesteps can be loaded by providing --model_file model.ckpt-5000000.
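
For example, evaluating the 5M-timestep checkpoint might look like the following (the log directory below is just a placeholder; it should be the one used during training):

python q_learning.py --log_dir ./log/breakout --env_id BreakoutNoFrameskip-v4 --mode eval --model_file model.ckpt-5000000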

Implementation Difference

The paper doesn't reveal a few hyperparameters, such as the epsilon decay strategy or the learning rate alpha. Those parameters were picked based on my hunch and with the help of the other NEC repo. Besides the unrevealed hyperparameters, I also used some different ones: a delta of 1e-5 instead of the reported 1e-3, the network and DND are updated every 32 frames instead of 16, and the replay buffer size is set to 30000 instead of 100000.
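
For quick reference, the deviations described above are (the names below are illustrative, not the actual flags in q_learning.py):

delta = 1e-5                  # paper reports 1e-3
update_period = 32            # frames between network/DND updates; paper uses 16
replay_buffer_size = 30000    # paper uses 100000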

One feature that is not implemented is replacing a memory when an exact key match is found in the dictionary. However, I am unsure whether this would ever happen in practice: both the embedding network and the saved embeddings in the dictionary keep changing, so it is very unlikely to get an exact match even if the exact same state is visited multiple times.
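
For illustration only, here is a rough sketch of what such an exact-match write rule could look like; the function and variable names are hypothetical and do not correspond to this repo's code.

import numpy as np

def dnd_write(keys, values, new_key, new_value):
    # Hypothetical DND write: overwrite the stored value when an exactly
    # matching key already exists, otherwise append a new (key, value) pair.
    matches = np.where(np.all(keys == new_key, axis=1))[0]
    if matches.size > 0:
        values[matches[0]] = new_value  # exact match found: replace the value
    else:
        keys = np.vstack([keys, new_key[None, :]])
        values = np.append(values, new_value)
    return keys, values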

Sample Results

  • Pong

    • Trained policy after 5 million frames of training

      pong_after_5M_frames

    • Average Episode Reward: 21 (vs about 15 @ 5 million frames, 20.4 @ 10 million frames from the original NEC paper)

    • Training Statistics

      pong_training

  • Breakout

    • Trained policy after 5 million frames of training

      breakout_after_5M_frames

    • Average Episode Reward: 27.5 @ 5M, 133.5 @ 10M (vs 13.6 @ 10 million frames from the original NEC paper)

    • Training Statistics

      pong_training

  • Hero

    • Trained policy after 5 million frames of training

      hero_after_5M_frames

    • Average Episode Reward: 13395 @ 5M and saturated (vs about 13000 @ 5M, 16265.3 @ 10M)

  • MsPacman

    • Trained policy after 5 million frames of training

      MsPacman_after_5M_frames

    • Average Episode Reward: 1800 @ 5M, 2118 @ 10M (vs about 2800 @ 5M, 4142.8 @ 10M)

  • Alien

    • Trained policy after 5 million frames of training

      Alien_after_5M_frames

    • Average Episode Reward: 490 @ 5M and 800 @ 10M (vs about 2200 @ 5M, 3460.6 @ 10M)

  • Frostbite

    • Trained policy after 5 million frames of training

      Frostbite_after_5M_frames

    • Average Episode Reward: 260 @ 5M and saturated (vs about 1500 @ 5M, 2747.4 @ 10M)

What is the big deal here?

IMHO, the coolest part of NEC is its straightforwardness; for example, it does not require a reward scaling scheme (most other RL algorithms clip rewards to [-1, 1] in order to stabilize value function learning). It is basically a neat continuous extension of classical tabular Q-learning with deep learning.
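
To make that concrete, the Q-value of an action in NEC is just an inverse-distance-weighted average of the values stored in that action's DND, computed over the nearest neighbors returned by FLANN. A rough NumPy sketch of the kernel from the paper (delta here follows this repo's 1e-5):

import numpy as np

def nec_q_value(query_key, neighbor_keys, neighbor_values, delta=1e-5):
    # k(h, h_i) = 1 / (||h - h_i||^2 + delta); the Q estimate is the
    # kernel-weighted average of the values stored for the neighbors.
    sq_dists = np.sum((neighbor_keys - query_key) ** 2, axis=1)
    kernel = 1.0 / (sq_dists + delta)
    weights = kernel / kernel.sum()
    return float(np.dot(weights, neighbor_values))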

Enjoy! 🍺
