
samshipengs / Coordinated-Multi-Agent-Imitation-Learning

Licence: other
This is an implementation of the paper "Coordinated Multi Agent Imitation Learning" (or the Sloan version, "Data-Driven Ghosting using Deep Imitation Learning") using TensorFlow.

Programming Languages: Jupyter Notebook, Python

Projects that are alternatives of or similar to Coordinated-Multi-Agent-Imitation-Learning

Awesome Carla
👉 CARLA resources such as tutorials, blog posts, code, etc. https://github.com/carla-simulator/carla
Stars: ✭ 246 (+602.86%)
Mutual labels:  imitation-learning
rpg
Ranking Policy Gradient
Stars: ✭ 22 (-37.14%)
Mutual labels:  imitation-learning
vnla
Code accompanying the CVPR 2019 paper: https://arxiv.org/abs/1812.04155
Stars: ✭ 60 (+71.43%)
Mutual labels:  imitation-learning
evoplex
Evoplex is a fast, robust and extensible platform for developing agent-based models and multi-agent systems on networks. It's available for Windows, Linux and macOS.
Stars: ✭ 98 (+180%)
Mutual labels:  multi-agent
Multi-Commander
Multi & Single Agent Reinforcement Learning for Traffic Signal Control Problem
Stars: ✭ 67 (+91.43%)
Mutual labels:  multi-agent
Fruit-API
A Universal Deep Reinforcement Learning Framework
Stars: ✭ 61 (+74.29%)
Mutual labels:  multi-agent
Awesome Real World Rl
Great resources for making Reinforcement Learning work in real-life situations. Papers, projects and more.
Stars: ✭ 234 (+568.57%)
Mutual labels:  imitation-learning
Comyco
The implementation of "Comyco: Quality-Aware Adaptive Video Streaming via Imitation Learning" (ACM MM 2019)
Stars: ✭ 38 (+8.57%)
Mutual labels:  imitation-learning
Pontryagin-Differentiable-Programming
A unified end-to-end learning and control framework that is able to learn a (neural) control objective function, dynamics equation, control policy, or/and optimal trajectory in a control system.
Stars: ✭ 111 (+217.14%)
Mutual labels:  imitation-learning
GoBigger
Come & try the decision-intelligence version of "Agar"! GoBigger can also help you with multi-agent decision intelligence study.
Stars: ✭ 410 (+1071.43%)
Mutual labels:  multi-agent
crowddynamics
Continuous-time multi-agent crowd simulation engine implemented in Python using Numba and NumPy for performance.
Stars: ✭ 24 (-31.43%)
Mutual labels:  multi-agent
robotic-warehouse
Multi-Robot Warehouse (RWARE): A multi-agent reinforcement learning environment
Stars: ✭ 62 (+77.14%)
Mutual labels:  multi-agent
robotarium-rendezvous-RSSDOA
This repository contains the Matlab source codes (to use on the Robotarium platform) of various rendezvous controllers for consensus control in a multi-agent / multi-robot system.
Stars: ✭ 35 (+0%)
Mutual labels:  multi-agent
Mil
Code for "One-Shot Visual Imitation Learning via Meta-Learning"
Stars: ✭ 254 (+625.71%)
Mutual labels:  imitation-learning
imitation learning
PyTorch implementation of some reinforcement learning algorithms: A2C, PPO, Behavioral Cloning from Observation (BCO), GAIL.
Stars: ✭ 93 (+165.71%)
Mutual labels:  imitation-learning
Tianshou
An elegant PyTorch deep reinforcement learning library.
Stars: ✭ 4,109 (+11640%)
Mutual labels:  imitation-learning
end2end-self-driving-car
End-to-end Self-driving Car (Behavioral Cloning)
Stars: ✭ 19 (-45.71%)
Mutual labels:  imitation-learning
magical
The MAGICAL benchmark suite for robust imitation learning (NeurIPS 2020)
Stars: ✭ 60 (+71.43%)
Mutual labels:  imitation-learning
dril
Disagreement-Regularized Imitation Learning
Stars: ✭ 25 (-28.57%)
Mutual labels:  imitation-learning
haxball-chameleon
Solving Haxball (www.haxball.com) using Imitation Learning methods.
Stars: ✭ 20 (-42.86%)
Mutual labels:  imitation-learning

Coordinated-Multi-Agent-Imitation-Learning

The Toronto Raptors created a ghosting system to help their coaching staff analyze defensive plays. Games are recorded by a camera system above the arena, and staff would mark the position where they thought a player should have been; that marker is the player's "ghost". However, this involves a lot of manual annotation. Coordinated multi-agent imitation learning instead proposes a data-driven method. (For more details on the Raptors' ghosting system, see Lights, Cameras, Revolution.)

In this repo we attempt to implement the paper Coordinated Multi Agent Imitation Learning (or the Sloan version) with TensorFlow.

Introduction

We aim to predict the movements, or trajectories, of the defending players of a given team. (In principle we should also be able to build a model that predicts offensive trajectories, but defending players were used in both the original ghosting work and this paper, presumably because defensive trajectories are somewhat easier to predict than offensive ones.)

In order to predict a trajectory, we need to roll out a sequence of predictions of each player's next position. The natural candidate for such a task is a recurrent neural network (an LSTM, more specifically), and the input to the model is a sequence of (x, y) coordinates of all players (both the defending team and the opponent).

The end result we would like to achieve: for a given game situation where team A is on defense, we can show what another team B, presumably the best defending team in the league, would do. This is slightly different from the original Raptors ghosting work. Instead of focusing on what a specific player should do based on a coach's experience, this work models what another team would do in the same situation. (Again, in principle we could also model each specific player, but that would require a much larger data set and is a more complicated task for the model to learn.)

Data

The up-to-date data is proprietary, but tracking and play-by-play data for 42 Toronto Raptors games played in Fall 2015 is available at this link. We use this data for our implementation; see the link for a detailed description of the data.

Below is a short preview of the data for game with id 0021500463:

| | end_time_left | home | moments | orig_events | playbyplay | quarter | start_time_left | visitor |
|---|---|---|---|---|---|---|---|---|
| 0 | 702.31 | {'abbreviation': 'CHI', 'players': [{'playerid... | [[1, 1451351428029, 708.28, 12.78, None, [[-1,... | [0] | GAME_ID EVENTNUM EVENTMSGTYPE EVENTMS... | 1 | 708.28 | {'abbreviation': 'TOR', 'players': [{'playerid... |
| 1 | 686.28 | {'abbreviation': 'CHI', 'players': [{'playerid... | [[1, 1451351428029, 708.28, 12.78, None, [[-1,... | [1] | GAME_ID EVENTNUM EVENTMSGTYPE EVENTMS... | 1 | 708.28 | {'abbreviation': 'TOR', 'players': [{'playerid... |
| 2 | 668.42 | {'abbreviation': 'CHI', 'players': [{'playerid... | [[1, 1451351444029, 692.25, 12.21, None, [[-1,... | [2, 3] | GAME_ID EVENTNUM EVENTMSGTYPE EVENTMS... | 1 | 692.25 | {'abbreviation': 'TOR', 'players': [{'playerid... |

The main columns we use to build the model are moments, quarter, home and visitor. moments contains the most information, such as the basketball location and all player locations along with their team and player IDs. quarter is used both as an input feature and in preprocessing. home and visitor specify the team names and IDs, which are useful when validating the preprocessed data.
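For reference, a minimal sketch of pulling these columns out, assuming each game loads as a pandas DataFrame from a pickle file (the file name and layout here are assumptions, not the dataset's actual format):

```python
import pandas as pd

# Hypothetical path/filename keyed by game id; the real layout may differ.
game = pd.read_pickle("data/0021500463.pkl")

moments = game["moments"]    # ball + player (x, y) tracks, team/player IDs
quarter = game["quarter"]    # used as an input feature and in preprocessing
home, visitor = game["home"], game["visitor"]  # team names/IDs for sanity checks
```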

Pre-processing

Not all moments from the data set are used. Each event is supposed to describe a single game play precisely, but the given moments often contain frames that would not help the model. For example, some frames contain only 8 or 9 players, or the basketball is out of bounds; these cannot be used because the model expects a fixed input dimension. Many moments contain frames that are not critical to decision making, e.g. dribbling before entering the half court, or periods when the clock is stopped. The shot clock sometimes has a null value.

Also, to make it easier for the model to learn, we perform some extra preprocessing, such as modeling only the defending players and normalizing the court to a half court. The reason is that teams swap ends after half-time, which could confuse the model, and full-court play is more dynamic in nature and therefore harder to predict.

We list each pre-processing step below; a sketch of how the steps might chain together follows the list:

  1. Remove frames that do not contain exactly 10 players and 1 basketball, and chunk the remaining frames into a new event (the same applies to any chunking in the subsequent steps).
    The function remove_non_eleven in preprocessing.py does this.
    This also excludes frames where a player or the basketball is out of bounds.
  2. Chunk the moments based on the shot clock.
    chunk_shotclock in preprocessing.py does this.
    If the shot clock jumps back to 24 (a reset, e.g. after a rebound or turnover) or reaches 0 (shot clock expired), or its value is null or frozen, we remove those frames from the moments and chunk around them, since player behavior differs dramatically at these time points.
  3. Chunk the moments to just the half court.
    chunk_halfcourt in preprocessing.py does this.
    Remove all moments not contained within a single half court, and shift the x coordinates to lie between 0 and 47 (an NBA court is 50x94 feet, so a half court is 50x47).
  4. Reorder the team data.
    reorder_teams in preprocessing.py does this.
    Reorder the matrix in moments so that the first five players are always the defending team.
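Putting the steps together, a hedged sketch (the function names come from preprocessing.py as listed above, but the signatures below are assumptions for illustration):

```python
# Sketch of chaining the preprocessing steps; signatures are assumed.
from preprocessing import (remove_non_eleven, chunk_shotclock,
                           chunk_halfcourt, reorder_teams)

def preprocess_events(events, defend_team_id):
    """Each filter may split one event into several chunks."""
    events = remove_non_eleven(events)        # keep frames with 10 players + ball
    events = chunk_shotclock(events)          # split on resets / null / frozen clocks
    events = chunk_halfcourt(events)          # keep half-court play, x in [0, 47]
    events = reorder_teams(events, defend_team_id)  # defending players first
    return events
```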

Originally we wanted to use the play-by-play data for the preprocessing, but it turns out the play-by-play data itself is not accurate. For example, in game 0021500196, event 2: 'time_left': [705, 704, 685, 684], 'event_str': ['miss', 'rebound', 'miss', 'rebound'].

At time_left 685.0 the shot clock reads 21.77, by which point the shot had already been missed for a while, the defending team had secured the rebound, and play was already switching to offense. The miss event should have been marked right after the shot clock reset to 24 s. This is easy for a human to reconcile, but it would certainly affect the model's learning.

Features

  1. Besides the Cartesian coordinates of the basketball and all players from the data, we also add polar coordinates.
  2. The distance of each player to the ball and to the hoop, in polar coordinates.
  3. Velocities of both the players and the basketball (in Cartesian coordinates).

You can check out the details in the create_static_features and create_dynamic_features functions from features.py.
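Below is a hedged sketch of these computations; the hoop coordinates, the 25 Hz sampling rate, and the array shapes are assumptions, not the exact code in features.py:

```python
import numpy as np

def polar(xy, origin):
    """Polar coordinates (r, theta) of (..., 2) points relative to an origin."""
    d = xy - origin
    return np.hypot(d[..., 0], d[..., 1]), np.arctan2(d[..., 1], d[..., 0])

def add_features(xy, ball, hoop=np.array([5.25, 25.0]), fs=25.0):
    """xy, ball: (T, 2) tracks for one player and the ball (assumed shapes)."""
    r_hoop, th_hoop = polar(xy, hoop)   # distance/angle to the hoop
    r_ball, th_ball = polar(xy, ball)   # distance/angle to the ball
    vel = np.diff(xy, axis=0) * fs      # finite-difference velocity (ft/s)
    vel = np.vstack([vel, vel[-1:]])    # repeat last row to keep length T
    return r_hoop, th_hoop, r_ball, th_ball, vel
```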

Below is an example plot of a game event.

Blue is the defending team, red is the opponent, and green is the basketball. Each arrow is a player's velocity vector, and the black circle is the hoop. Smaller dots are earlier points in the sequence.

Hidden Structure Learning

Finally, we get to how we build the model. The approach may seem obvious: feed the input sequence into an LSTM, where the label at each time step is the next time step's input. However, there are two major issues:

  1. Since we are training on input data that contains multiple agents, we need to consider the ordering of the agents in the input.
  2. A standard one-to-one or many-to-one architecture would have little practical use, since in a real game we want predictions for at least the next several time steps rather than a single prediction at a time.

In this section we mainly address the first issue. The input at each time step is a concatenation of every player's features, so we must feed the model data in a consistent order; otherwise the model will have a hard time learning anything. This is the "index-free" multi-agent problem. How do we define the order, then? By height, weight, or assigned position, e.g. power forward or point guard? Using the predefined positions sounds reasonable, but positions can change during actual game play. So instead of using fixed roles, the authors of the paper suggest learning a hidden state/role for each player.

Here we make use of the hmmlearn library (pomegranate looks like a good option too). We train a hidden Markov model that predicts the hidden state at each time step; this is done with the Baum-Welch algorithm, from which we learn the emission probabilities of each hidden role.
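A minimal sketch with hmmlearn, using random data as a stand-in for the real per-player feature sequences (5 roles and 4 features are assumptions here):

```python
import numpy as np
from hmmlearn import hmm

# Stand-in for stacked per-player feature sequences: 5 sequences of
# 200 time steps, 4 features each (real features come from features.py).
X = np.random.randn(1000, 4)
lengths = [200] * 5

model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=100)
model.fit(X, lengths)               # Baum-Welch (EM) parameter estimation
roles = model.predict(X, lengths)   # Viterbi: most likely hidden-role sequence
```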

In principle we do not even need the emission distributions directly: the Viterbi algorithm finds the most likely sequence of hidden roles. However, since we are assigning hidden roles to individual players, different players can be assigned the same hidden role (indeed this happened when I ran Viterbi to get the sequences of assigned roles). More concretely, for each player at each time step we assign a hidden role:

Notice how players 1 and 2 are both assigned hidden role 1 at the initial time step, and players 2 and 5 are assigned the same hidden role 3. We cannot use such an assignment, because we need the hidden roles to order the players. So instead of taking the hard per-player assignment, we employ a linear assignment technique, specifically the Hungarian algorithm, to assign the hidden roles.

We do so by first computing the Euclidean distance (you could also try cosine similarity) from each player's data point at a given time step to the center of each hidden role's distribution, which we assume to be a (mixture of) multivariate Gaussian(s). We then use these distances as the cost matrix for the Hungarian algorithm.
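A sketch with SciPy, where role_means would come from the fitted HMM above (e.g. model.means_):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_roles(player_feats, role_means):
    """player_feats: (5, d) defender features at one time step;
    role_means: (5, d) centers of the learned role distributions.
    Returns a permutation giving each player a distinct role."""
    # Cost matrix: Euclidean distance from each player to each role center.
    cost = np.linalg.norm(player_feats[:, None, :] - role_means[None, :, :],
                          axis=-1)
    _, roles = linear_sum_assignment(cost)  # Hungarian algorithm
    return roles                            # roles[i] is player i's role
```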


Imitation Learning

We hope the model can learn to mimic trajectories by training on player tracking data, and the natural choice for this task is an LSTM. A common architecture takes a sequence of states of length T and outputs the action (here, the next position) at each time step.
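For concreteness, here is a minimal Keras sketch of such a sequence model; the layer sizes and dimensions are assumptions, and the repo's actual TensorFlow graph likely differs:

```python
import tensorflow as tf

T, n_features = 50, 40   # sequence length and per-step feature size (assumed)
n_outputs = 10           # next-step (x, y) for the 5 defenders

model = tf.keras.Sequential([
    tf.keras.Input(shape=(T, n_features)),
    tf.keras.layers.LSTM(128, return_sequences=True),  # one output per time step
    tf.keras.layers.Dense(n_outputs),                  # next-step positions
])
model.compile(optimizer="adam", loss="mse")
# model.fit(states, next_positions)  # (N, T, n_features) -> (N, T, n_outputs)
```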

However, the first obvious issue is that in a real game we do not have the sequence of future player states (unlike in a machine translation problem, where the complete source sentence is available): those states are exactly the values we are trying to predict. If we had them, there would be nothing left to predict. So at run time we simply do not have the full input sequence.

What one could do is train the model on the available data and, at run time, use the predicted output of the current time step as the input for the next time step; that is, instead of feeding in the true value, we feed the model its own previous output.

This is doable and looks fine on paper, but at run time the model suffers from drifting, or compounding, error: as the rollout extends over more time steps, the prediction error grows until the predictions are far from realistic trajectories. This happens even though the loss is small at training time.

We demonstrate this with a simple experiment. Below is a sine signal with additive Gaussian noise of mean 2 and standard deviation 1.
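The toy signal can be generated along these lines (the length and frequency are arbitrary choices):

```python
import numpy as np

t = np.linspace(0, 20 * np.pi, 2000)
# Sine signal plus Gaussian noise with mean 2 and standard deviation 1.
signal = np.sin(t) + np.random.normal(loc=2.0, scale=1.0, size=t.shape)
```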

First we apply a regular LSTM that uses the ground truth as the input at every time step. The prediction result looks pretty _good_,

but this is deceptive, because in a real setting we need to predict multiple steps ahead rather than relying on ground truth. If we take the trained model and simply make each prediction from the previous prediction, the output quickly converges to the mean of the Gaussian noise,

So the paper proposes letting the model look ahead over longer spans and experience this drifting error at training time. We first train the regular LSTM, where the input at every time step is ground truth. We then extend the horizon over which the model runs closed loop, i.e. during training we use the current time step's output as the next step's input, increase the horizon by 1, and repeat. This gives the model experience handling drifting error at training time, which leads to better performance in the real run-time setting. A sketch of this curriculum follows.
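Here is a hedged sketch of the horizon curriculum; model_step and train_on are hypothetical stand-ins, not functions from this repo:

```python
import numpy as np

def model_step(x):             # stand-in for the LSTM's one-step prediction
    return x                   # identity placeholder so the sketch runs

def train_on(preds, targets):  # stand-in for one optimization pass
    return np.mean((np.asarray(preds) - np.asarray(targets)) ** 2)

sequences = [np.sin(np.linspace(0, 6 * np.pi, 60)) + np.random.normal(2, 1, 60)]

for horizon in range(7):       # grow the closed-loop horizon from 0 to 6
    for seq in sequences:
        x, preds = seq[0], []
        for t in range(len(seq) - 1):
            y = model_step(x)
            preds.append(y)
            # Feed the model's own output back in for `horizon` consecutive
            # steps, then re-sync with the ground truth.
            x = y if (t % (horizon + 1)) < horizon else seq[t + 1]
        train_on(preds, seq[1:])
```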

For the sine-wave example, the test result becomes much better when we gradually increase the horizon from 0 to 6.

To illustrate this with the network connections unrolled (figures: Step 1, Step 2, Step 3).