AaronJi / RL

Licence: other
A set of RL experiments, currently including: (1) the MDP rank experiment, based on the policy gradient algorithm.

Programming Languages

python
139,335 projects - #7 most used programming language

Projects that are alternatives to or similar to RL

Easy Rl
A Chinese-language reinforcement learning tutorial; read online at: https://datawhalechina.github.io/easy-rl/
Stars: ✭ 3,004 (+13554.55%)
Mutual labels:  policy-gradient
Tianshou
An elegant PyTorch deep reinforcement learning library.
Stars: ✭ 4,109 (+18577.27%)
Mutual labels:  policy-gradient
rpg
Ranking Policy Gradient
Stars: ✭ 22 (+0%)
Mutual labels:  policy-gradient
Mlds2018spring
Machine Learning and having it Deep and Structured (MLDS) in 2018 spring
Stars: ✭ 124 (+463.64%)
Mutual labels:  policy-gradient
Deep Algotrading
A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading
Stars: ✭ 173 (+686.36%)
Mutual labels:  policy-gradient
SharkStock
Automate swing trading using deep reinforcement learning. The deep deterministic policy gradient-based neural network model trains to choose an action to sell, buy, or hold the stocks to maximize the gain in asset value. The paper also acknowledges the need for a system that predicts the trend in stock value to work along with the reinforcement …
Stars: ✭ 63 (+186.36%)
Mutual labels:  policy-gradient
Reinforcement learning
Implementations of basic reinforcement learning algorithms
Stars: ✭ 100 (+354.55%)
Mutual labels:  policy-gradient
TAA-PG
Usage of policy gradient reinforcement learning to solve portfolio optimization problems (Tactical Asset Allocation).
Stars: ✭ 26 (+18.18%)
Mutual labels:  policy-gradient
Multihopkg
Multi-hop knowledge graph reasoning learned via policy gradient with reward shaping and action dropout
Stars: ✭ 202 (+818.18%)
Mutual labels:  policy-gradient
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
Stars: ✭ 222 (+909.09%)
Mutual labels:  policy-gradient
Policy Gradient
Minimal Monte Carlo Policy Gradient (REINFORCE) Algorithm Implementation in Keras
Stars: ✭ 135 (+513.64%)
Mutual labels:  policy-gradient
A2c
A Clearer and Simpler Synchronous Advantage Actor Critic (A2C) Implementation in TensorFlow
Stars: ✭ 169 (+668.18%)
Mutual labels:  policy-gradient
yarll
Combining deep learning and reinforcement learning.
Stars: ✭ 84 (+281.82%)
Mutual labels:  policy-gradient
Pytorch Rl
Tutorials for reinforcement learning in PyTorch and Gym by implementing a few of the popular algorithms. [IN PROGRESS]
Stars: ✭ 121 (+450%)
Mutual labels:  policy-gradient
deep rl acrobot
TensorFlow A2C to solve Acrobot, with synchronized parallel environments
Stars: ✭ 32 (+45.45%)
Mutual labels:  policy-gradient
Torchrl
Highly Modular and Scalable Reinforcement Learning
Stars: ✭ 102 (+363.64%)
Mutual labels:  policy-gradient
Reinforcement Learning
Minimal and Clean Reinforcement Learning Examples
Stars: ✭ 2,863 (+12913.64%)
Mutual labels:  policy-gradient
LWDRLC
Lightweight deep RL Libraray for continuous control.
Stars: ✭ 14 (-36.36%)
Mutual labels:  policy-gradient
DRL in CV
A course on Deep Reinforcement Learning in Computer Vision. Visit Website:
Stars: ✭ 59 (+168.18%)
Mutual labels:  policy-gradient
siamese dssm
siamese dssm sentence_similarity sentece_similarity_rank tensorflow
Stars: ✭ 59 (+168.18%)
Mutual labels:  ranking-algorithm

RL

LIRD

Replicates the list-wise recommendation algorithm from https://github.com/egipcy/LIRD. Some logic is extended, including:

  • user features are added

Related paper: [Deep Reinforcement Learning for List-wise Recommendations]
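
The user-feature extension is not detailed here. Below is a minimal sketch (not the repository's actual code) of one way to fold user features into the list-wise state: the user embedding is concatenated in front of the flattened item history that the actor conditions on. The names build_state, user_emb, and item_embs are illustrative.

import numpy as np

def build_state(user_emb, item_embs):
    # Hypothetical state builder: concatenate the user feature vector
    # with the embeddings of the user's k most recent positive items,
    # so the actor's ranking action conditions on both.
    #   user_emb:  (d_u,) user feature vector
    #   item_embs: (k, d_i) recent item embeddings
    return np.concatenate([user_emb, item_embs.reshape(-1)])

# Example: 8-dim user features plus a history of 5 items, 16 dims each
state = build_state(np.zeros(8), np.zeros((5, 16)))
assert state.shape == (8 + 5 * 16,)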

Run MovieLens example

python python/LIRD/LIRD_main.py movielens_lird_example

MDP rank:

Replicates the MDP rank algorithm from [Reinforcement Learning to Rank with Markov Decision Process. Wei, Xu, Lan, Guo, Cheng, SIGIR'17, 2017]

Related paper: [Adapting Markov Decision Process for Search Result Diversification. Xia, Xu, Lan, Guo, Zeng, Cheng, SIGIR’17, 2017]
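
For orientation, here is a minimal sketch of the MDPRank idea rather than the repository's actual implementation: a softmax-linear policy picks the next document from the remaining candidates one position at a time, and REINFORCE updates the weights using DCG-style per-position rewards. All names (rank_episode, reinforce_update, X, rel) are illustrative.

import numpy as np

def rank_episode(theta, X, rng):
    # Sample a ranking: at each step, choose one remaining document
    # with probability softmax(theta . x).
    remaining = list(range(len(X)))
    ranking, grads = [], []
    while remaining:
        feats = X[remaining]                      # (m, d)
        scores = feats @ theta
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        i = rng.choice(len(remaining), p=probs)
        grads.append(feats[i] - probs @ feats)    # grad of log pi(a|s)
        ranking.append(remaining.pop(i))
    return ranking, grads

def reinforce_update(theta, X, rel, rng, lr=0.01):
    # One REINFORCE step; the reward at position t is the DCG gain of
    # the document placed there.
    ranking, grads = rank_episode(theta, X, rng)
    rewards = [(2 ** rel[d] - 1) / np.log2(t + 2)
               for t, d in enumerate(ranking)]
    G = 0.0
    for t in reversed(range(len(ranking))):
        G += rewards[t]                           # return from position t
        theta = theta + lr * G * grads[t]
    return theta

# Toy usage: 4 documents with 3 features and graded relevance labels
rng = np.random.default_rng(0)
theta = reinforce_update(np.zeros(3), rng.normal(size=(4, 3)),
                         np.array([2, 0, 1, 0]), rng)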

Run OHSUMED example

python python/MDPrank/MDPrank_main.py letor_ohsumed_example

Run TREC example

python python/MDPrank/MDPrank_main.py letor_trec_example \
    --training_set Letor/TREC/TD2003/Data/Fold1/trainingset.txt \
    --valid_set Letor/TREC/TD2003/Data/Fold1/validationset.txt \
    --test_set Letor/TREC/TD2003/Data/Fold1/testset.txt

ADP (adaptive dynamic programming):

Related paper:

  • [An Adaptive Dynamic Programming Algorithm for Dynamic Fleet Management, I: Single Period Travel Times. Godfrey, Powell, Transportation Science, 2002]
  • [An Adaptive Dynamic Programming Algorithm for Dynamic Fleet Management, II: Multiperiod Travel Times. Godfrey, Powell, Transportation Science, 2002]
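
The CAVE updates referenced in the figures below maintain, for each location and time period, a piecewise-linear concave approximation of the value of holding n resources. The following is a minimal sketch under simplified assumptions (a single smoothing step on the two slopes adjacent to the observed resource level, followed by a concavity projection); the papers update an interval of slopes, and all names here are illustrative.

import numpy as np

def project_concave(slopes):
    # Pool adjacent violators: project the slope vector onto the
    # non-increasing cone, keeping the value function concave.
    vals, cnts = [], []
    for s in slopes:
        vals.append(float(s))
        cnts.append(1)
        while len(vals) > 1 and vals[-2] < vals[-1]:
            cnt = cnts[-2] + cnts[-1]
            val = (vals[-2] * cnts[-2] + vals[-1] * cnts[-1]) / cnt
            vals[-2:], cnts[-2:] = [val], [cnt]
    return np.array([v for v, c in zip(vals, cnts) for _ in range(c)])

def cave_update(slopes, r, left_grad, right_grad, alpha=0.5):
    # CAVE-style update: slopes[i] estimates V(i+1) - V(i). Smooth in
    # the observed marginal values on either side of resource level r,
    # then restore concavity.
    slopes = np.asarray(slopes, dtype=float).copy()
    if r > 0:
        slopes[r - 1] = (1 - alpha) * slopes[r - 1] + alpha * left_grad
    if r < len(slopes):
        slopes[r] = (1 - alpha) * slopes[r] + alpha * right_grad
    return project_concave(slopes)

# e.g. dual prices 5.0 / 3.0 observed around resource level r = 2
print(cave_update([4.0, 4.0, 2.0, 1.0], 2, 5.0, 3.0))  # [4.25 4.25 2.5 1.]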

Run time & space scheduling example:

python python/ADPscheduling/ADP_scheduling_main.py time_space_scheduling_example

The example schedules 5 resources within a 4x4 rectangular grid over 24 time intervals. Only relocations with a transfer period (tau) of less than 2 time intervals are considered; 30 iterations are executed.

  • Figure 0: Shape of the converged value function and the corresponding marginal values / derivatives of the value function / shadow variables of the optimization problem, at resource count = 0, tau = 0, t = 8, 12, 16, 24
  • Figure 1: The scheduling actions (arrows; color indicates the number of relocated resources) and the converged value function evaluated at the actual resource count, tau = 0, t = 8, 12, 16, 24
  • Figure 2: Detailed results for the 13th location: (1) the result of the CAVE update at t = 20 and iter = 29; (2) initial values of the value function at iter = 30 and t = 0, 6, 12, 18; (3) initial values of the value function at t = 20 and iter = 0, 9, 19, 29

For different experiments, the data paths in the arguments need to be changed accordingly.

Temporary issue:

Currently the code may not run directly on Windows, due to some path issues.
