OptMLGroup / DeepBeerInventory-RL

License: BSD-3-Clause
Code for the SRDQN algorithm to train an agent for the beer game problem.

A Deep Q-Network for the Beer Game: Deep Reinforcement Learning for Inventory Optimization

This repository contains the code of the paper A Deep Q-Network for the Beer Game: Deep Reinforcement Learning for Inventory Optimization, which is available online at https://pubsonline.informs.org/doi/abs/10.1287/msom.2020.0939. The code works with Python 2.7 and Python 3.4-3.7; see requirements.txt for the list of requirements (you can install them with pip install -r requirements.txt).

main.py is the entry point that starts training. BGAgent.py implements the beer-game agent, including all of an agent's properties and functionality. clBeergame.py instantiates the agents and runs the beer-game simulation; once the replay buffer contains the minimum required number of observations, it also calls the train step of the SRDQN algorithm. The DNN approximator and the SRDQN algorithm are implemented in SRDQN.py. config.py introduces all arguments and their default values, as well as functions that build the simulation scenarios for the different instances of the game. The following sections describe how to run training and how to set the arguments.
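
For a quick start with all default arguments (an srdqn retailer playing with base-stock co-players, as described under "Train the basic model" below), the following two commands install the requirements and launch training:

pip install -r requirements.txt
python main.py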

Play the beer game and compare your result with AI!

You can play the beer game and compare your result on the same game with the result that our RL algorithm achieves. See https://beergame.opexanalytics.com/

Note that this code does not work with TensorFlow 2+.

Some Notations

Each agent can use any of the srdqn, bs, Strm, or rnd algorithms to decide its action (the order quantity), so there are 256 possible combinations of agent types, of which we consider 23 cases in this study. Each case is selected via config.gameConfig, which picks one of the pre-defined combinations of the four agent types. For example, config.gameConfig=3 sets config.agentTypes = ["srdqn", "bs","bs","bs"], in which the retailer follows the srdqn algorithm and the remaining agents use the base-stock policy to decide their order quantities. The main gameConfig values are listed below:

Base-stock co-players

if config.gameConfig == 3: 
	config.agentTypes = ["srdqn", "bs","bs","bs"]
if config.gameConfig == 4: 
	config.agentTypes = ["bs", "srdqn","bs","bs"]
if config.gameConfig == 5: 
	config.agentTypes = ["bs", "bs","srdqn","bs"]
if config.gameConfig == 6: 
	config.agentTypes = ["bs", "bs","bs","srdqn"]

Sterman co-players

if config.gameConfig == 7: 
	config.agentTypes = ["srdqn", "Strm","Strm","Strm"]
if config.gameConfig == 8: 
	config.agentTypes = ["Strm", "srdqn","Strm","Strm"]
if config.gameConfig == 9: 
	config.agentTypes = ["Strm", "Strm","srdqn","Strm"]
if config.gameConfig == 10: 
	config.agentTypes = ["Strm", "Strm","Strm","srdqn"]

Random co-players

if config.gameConfig == 11: 
	config.agentTypes = ["srdqn", "rnd","rnd","rnd"]
if config.gameConfig == 12: 
	config.agentTypes = ["rnd", "srdqn","rnd","rnd"]
if config.gameConfig == 13: 
	config.agentTypes = ["rnd", "rnd","srdqn","rnd"]
if config.gameConfig == 14: 
	config.agentTypes = ["rnd", "rnd","rnd","srdqn"]

The full list of gameConfig values is defined in the setAgentType() function in config.py.

Since the d+x rule is used to train the SRDQN model, upper and lower limits are imposed on x; config.actionLow and config.actionUp set these values.
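
As an illustration only (this is not the repository's code, and order_quantity is a hypothetical helper), the following sketch shows how a discrete SRDQN action can map to an order quantity under the d+x rule, assuming the basic-case bounds actionLow=-2 and actionUp=2:

actionLow, actionUp = -2, 2
action_space = list(range(actionLow, actionUp + 1))  # {-2, -1, 0, 1, 2}

def order_quantity(observed_demand, action_index):
	# SRDQN picks a discrete index; the chosen offset x is added to the
	# observed demand d, and the resulting order never drops below zero.
	x = action_space[action_index]
	return max(0, observed_demand + x)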

In addition, for each agent one can set the lead time for receiving the order as well as for receiving the shipment via config.leadRecItem1, config.leadRecItem2, config.leadRecItem3, config.leadRecItem4 and config.leadRecOrder1, config.leadRecOrder2, config.leadRecOrder3, config.leadRecOrder4 for the four agents. Similarly, the initial inventory level, the initial arriving order, and the initial arriving shipment of the four agents can be set by config.ILInit1, config.ILInit2, config.ILInit3, config.ILInit4, config.AOInit1, config.AOInit2, config.AOInit3, config.AOInit4, and config.ASInit1, config.ASInit2, config.ASInit3, config.ASInit4, respectively.
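
For example (the values are arbitrary and only illustrate the flags), the following command overrides the lead times and initial conditions of the second agent while keeping the defaults for the others:

python main.py --gameConfig=4 --leadRecItem2=2 --leadRecOrder2=2 --ILInit2=10 --AOInit2=4 --ASInit2=4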

config.maxEpisodesTrain determines the number of episodes to train the srdqn agent.

To run the base-stock policy (bs), you need to set the base-stock level of each agent via config.f1, config.f2, config.f3, and config.f4. We obtained those values by running the Clark-Scarf algorithm for each instance.
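
For illustration of the syntax only (the levels below are those used for the U[0,8] case later in this README, not recommendations for other settings), the base-stock levels are passed on the command line as:

python main.py --gameConfig=3 --f1=19.0 --f2=20.0 --f3=20.0 --f4=14.0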

Unzip the data

data.zip includes all the datasets required to train the model on the basic case, the literature cases, the basket dataset, and the forecasting dataset. Unzipping this file creates the data directory, which contains a Python file (createDemand.py) as well as the datasets mentioned above. createDemand.py can be used to create datasets of any size for the literature cases.
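
On a Unix-like system, for example:

unzip data.zip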

Train the basic model

The basic model uses the uniform demand distribution U[0,2] with the action space {-2, -1, 0, 1, 2}. All default values are set to run this experiment for the case in which srdqn plays the retailer and the other agents follow the base-stock policy. For any other case, training can be started by setting the corresponding arguments. For example, to train an srdqn Warehouse with an initial inventory of 10 units that plays with Sterman co-players, the following command runs training for 50000 episodes:

python main.py --gameConfig=8 --maxEpisodesTrain=50000 --ILInit2=10 --batchSize=128

Train the literature cases

To train each of the literature cases, first set config.demandDistribution, actionUp, and actionLow, as well as the other parameters of the agents, as follows:

For U[0,8]:

python main.py --demandDistribution=0 --demandUp=9  --actionUp=8  --actionLow=-8 --ch1=0.5 --ch2=0.5 --ch3=0.5 --ch4=0.5 --cp1=1.0 --cp2=1.0 --cp3=1.0 --cp4=1.0 --f1=19.0 --f2=20.0 --f3=20.0 --f4=14.0  --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --ILInit1=12 --ILInit2=12 --ILInit3=12 --ILInit4=12 --AOInit1=4 --AOInit2=4 --AOInit3=4 --AOInit4=4 --ASInit1=4 --ASInit2=4 --ASInit3=4 --ASInit4=4 --gameConfig=6 

For N(10,2):

python main.py --demandDistribution=1 --demandMu=10  --demandSigma=2 --actionUp=5  --actionLow=-5 --ch1=1 --ch2=0.75 --ch3=0.5 --ch4=0.25 --cp1=10.0 --cp2=0 --cp3=0 --cp4=0 --f1=48.0 --f2=43.0 --f3=41.0 --f4=30.0 --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --ILInit1=10 --ILInit2=10 --ILInit3=10 --ILInit4=10 --AOInit1=10 --AOInit2=10 --AOInit3=10 --AOInit4=10 --ASInit1=10 --ASInit2=10 --ASInit3=10 --ASInit4=10 --gameConfig=6

For C(4,8):

python main.py --demandDistribution=2 --actionUp=8  --actionLow=-8 --ch1=0.5 --ch2=0.5 --ch3=0.5 --ch4=0.5 --cp1=1.0 --cp2=1.0 --cp3=1.0 --cp4=1.0 --demandUp=9 --f1=32.0 --f2=32.0 --f3=32.0 --f4=24.0 --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --ILInit1=12 --ILInit2=12 --ILInit3=12 --ILInit4=12 --AOInit1=4 --AOInit2=4 --AOInit3=4 --AOInit4=4 --ASInit1=4 --ASInit2=4 --ASInit3=4 --ASInit4=4 --gameConfig=6

Train the basket dataset

For the basket dataset, set config.demandDistribution=3; config.data_id can then be 6, 13, or 22. For training with the scaled dataset, which is what is reported in the paper, config.scaled=True is required as well. See the following commands for the three cases:

python main.py --demandDistribution=3 --data_id=6 --demandMu=3 --demandSigma=2 --demandUp=3 --actionUp=5 --actionLow=-5 --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --scaled=True --ch1=1.0 --ch2=0.75 --ch3=0.5 --ch4=0.25 --cp1=10.0 --cp2=0.0 --cp3=0.0 --cp4=0.0 --f1=19.0 --f2=12.0 --f3=12.0 --f4=8.0 --ILInit1=3 --ILInit2=3 --ILInit3=3 --ILInit4=3 --AOInit1=3 --AOInit2=3 --AOInit3=3 --AOInit4=3 --ASInit1=3 --ASInit2=3 --ASInit3=3 --ASInit4=3

python main.py --demandDistribution=3 --data_id=13 --demandMu=3  --demandSigma=2  --demandUp=3 --actionUp=5 --actionLow=-5 --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --scaled=True --ch1=1.0 --ch2=0.75 --ch3=0.5 --ch4=0.25 --cp1=10.0 --cp2=0.0 --cp3=0.0 --cp4=0.0 --f1=19.0 --f2=13.0 --f3=11.0 --f4=8.0 --ILInit1=3  --ILInit2=3  --ILInit3=3  --ILInit4=3  --AOInit1=3  --AOInit2=3  --AOInit3=3  --AOInit4=3  --ASInit1=3  --ASInit2=3  --ASInit3=3  --ASInit4=3 

python main.py --demandDistribution=3 --data_id=22 --demandMu=2  --demandSigma=2  --demandUp=3 --actionUp=5 --actionLow=-5       --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --scaled=True --ch1=1.0 --ch2=0.75 --ch3=0.5 --ch4=0.25 --cp1=10.0 --cp2=0.0 --cp3=0.0 --cp4=0.0 --f1=14.0 --f2=9.0 --f3=9.0 --f4=5.0 --ILInit1=2  --ILInit2=2  --ILInit3=2  --ILInit4=2  --AOInit1=2  --AOInit2=2  --AOInit3=2  --AOInit4=2  --ASInit1=2  --ASInit2=2  --ASInit3=2  --ASInit4=2 

Train the forecasting dataset

For the forecasting dataset, set config.demandDistribution=4; config.data_id can then be 5, 34, or 46. For training with the scaled dataset, which is what is reported in the paper, config.scaled=True is required as well. See the following commands for the three cases:

python main.py --demandDistribution=4 --data_id=5 --demandMu=4 --demandSigma=2 --demandUp=3 --actionUp=5 --actionLow=-5 --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --scaled=True --ch1=1.0 --ch2=0.75 --ch3=0.5 --ch4=0.25 --cp1=10.0 --cp2=0.0 --cp3=0.0 --cp4=0.0 --f1=21.0 --f2=16.0 --f3=16.0 --f4=11.0 --ILInit1=4  --ILInit2=4  --ILInit3=4  --ILInit4=4  --AOInit1=4  --AOInit2=4  --AOInit3=4  --AOInit4=4  --ASInit1=4  --ASInit2=4  --ASInit3=4  --ASInit4=4 

python main.py --demandDistribution=4 --data_id=34 --demandMu=4 --demandSigma=2 --demandUp=3 --actionUp=5 --actionLow=-5 --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --scaled=True --ch1=1.0 --ch2=0.75 --ch3=0.5 --ch4=0.25 --cp1=10.0 --cp2=0.0 --cp3=0.0 --cp4=0.0 --f1=18.0 --f2=15.0 --f3=14.0 --f4=10.0 --ILInit1=4  --ILInit2=4  --ILInit3=4  --ILInit4=4  --AOInit1=4  --AOInit2=4  --AOInit3=4  --AOInit4=4  --ASInit1=4  --ASInit2=4  --ASInit3=4  --ASInit4=4 

python main.py --demandDistribution=4 --data_id=46 --demandMu=4 --demandSigma=2 --demandUp=3 --actionUp=5 --actionLow=-5 --leadRecItem1=2 --leadRecItem2=2 --leadRecItem3=2 --leadRecItem4=2 --leadRecOrder1=2 --leadRecOrder2=2 --leadRecOrder3=2 --leadRecOrder4=1 --scaled=True --ch1=1.0 --ch2=0.75 --ch3=0.5 --ch4=0.25 --cp1=10.0 --cp2=0.0 --cp3=0.0 --cp4=0.0 --f1=21.0 --f2=16.0 --f3=18.0 --f4=12.0 --ILInit1=4  --ILInit2=4  --ILInit3=4  --ILInit4=4  --AOInit1=4  --AOInit2=4  --AOInit3=4  --AOInit4=4  --ASInit1=4  --ASInit2=4  --ASInit3=4  --ASInit4=4 

Use Transfer Learning

We have provided the trained models of the basic case that are used in the transfer-learning section. The saved models are available in pre_model\uniform\0-3\brainX, where X is in {3, 4, 5, 6}; the value of X follows the same pattern as config.gameConfig. To train a new model starting from one of these trained models, set config.tlBaseBrain, which determines which trained model is used as the base. For example:

python main.py --gameConfig=3  --iftl=True --ifUsePreviousModel=True  --tlBaseBrain=3 --baseDemandDistribution=0

In addition, if you have trained a model with another demand distribution, e.g., N(10,2), you need to move the saved models into pre_model\normal\10-2\brainX and then, for the new training run, set config.baseDemandDistribution=1. config.baseDemandDistribution follows the same pattern as config.demandDistribution.
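
For illustration (the values only show how the transfer-learning flags described above combine; adjust them to your own base model), such a run could look like:

python main.py --gameConfig=3 --iftl=True --ifUsePreviousModel=True --tlBaseBrain=3 --baseDemandDistribution=1 --demandDistribution=1 --demandMu=10 --demandSigma=2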

Other utilities

If you set config.ifSaveFigure=True, the trajectories of the inventory level, reward, action, open order, and order-up-to level of each agent in an episode are saved. config.saveFigIntLow and config.saveFigIntUp determine the range of episodes for which the figures are saved.

Setting config.ifsaveHistInterval=True activates saving of the trajectories of the received order, received shipment, inventory level, reward, action, open order, and order-up-to level of each agent in an episode. With this argument, you also need to set the interval between two saved episodes via config.saveHistInterval.
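
For example (the episode range and interval are illustrative values, not recommendations), these options can be combined on the command line as:

python main.py --ifSaveFigure=True --saveFigIntLow=1000 --saveFigIntUp=1100 --ifsaveHistInterval=True --saveHistInterval=100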

Paper citation

If you use this code in your experiments or find it helpful, please consider citing the following paper:

@article{oroojlooyjadid2017deep,
  title   = {A Deep Q-Network for the Beer Game: Deep Reinforcement Learning for Inventory Optimization},
  author  = {Oroojlooyjadid, Afshin and Nazari, MohammadReza and Snyder, Lawrence and Tak{\'a}{\v{c}}, Martin},
  journal = {Manufacturing \& Service Operations Management},
  year    = {2021},
  doi     = {10.1287/msom.2020.0939},
  url     = {https://doi.org/10.1287/msom.2020.0939}
}