
SkillCorner / Opendata

Licence: MIT
SkillCorner Open Data with 9 matches of broadcast tracking data.

Projects that are alternatives to or similar to Opendata

Ditras
DITRAS (DIary-based TRAjectory Simulator), a mathematical model to simulate human mobility
Stars: ✭ 19 (-77.91%)
Mutual labels:  datascience
Football Cli
⚽ Command line interface for Hackers who love football
Stars: ✭ 984 (+1044.19%)
Mutual labels:  soccer
Knyfe
knyfe is a python utility for rapid exploration of datasets.
Stars: ✭ 54 (-37.21%)
Mutual labels:  datascience
Kubeflow Data Science On Steroids
The blog post about Kubeflow, including all materials
Stars: ✭ 25 (-70.93%)
Mutual labels:  datascience
Skater
Python Library for Model Interpretation/Explanations
Stars: ✭ 973 (+1031.4%)
Mutual labels:  datascience
Ludwig
Data-centric declarative deep learning framework
Stars: ✭ 8,018 (+9223.26%)
Mutual labels:  datascience
Talks
Repository of publicly available talks by Leon Eyrich Jessen, PhD. Talks cover Data Science and R in the context of research
Stars: ✭ 16 (-81.4%)
Mutual labels:  datascience
Wrighteaglebase
WrightEagle Base Code for RoboCup Soccer Simulation 2D
Stars: ✭ 73 (-15.12%)
Mutual labels:  soccer
Octopod
Train multi-task image, text, or ensemble (image + text) models
Stars: ✭ 36 (-58.14%)
Mutual labels:  datascience
Soccerapi
soccerapi ⚽️ , an unambitious soccer odds scraper
Stars: ✭ 52 (-39.53%)
Mutual labels:  soccer
Clevercsv
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
Stars: ✭ 887 (+931.4%)
Mutual labels:  datascience
Commons
⛲️ Commons Marketplace client & server to explore, download, and publish open data sets in the Ocean Protocol Network.
Stars: ✭ 34 (-60.47%)
Mutual labels:  datascience
R Community Explorer
Data-Driven Exploration of the R Community
Stars: ✭ 43 (-50%)
Mutual labels:  datascience
Datofutbol
Dato Fútbol repository
Stars: ✭ 23 (-73.26%)
Mutual labels:  soccer
Fifa Fut Data
Web-scraping script that writes the data of all players from FutHead and FutBin to a CSV file or a DB
Stars: ✭ 55 (-36.05%)
Mutual labels:  soccer
Open Data
Free football data from StatsBomb
Stars: ✭ 891 (+936.05%)
Mutual labels:  soccer
Soccergraphr
Soccer Analytics in R using OPTA data
Stars: ✭ 42 (-51.16%)
Mutual labels:  soccer
Ds Cheatsheets
List of Data Science Cheatsheets to rule the world
Stars: ✭ 9,452 (+10890.7%)
Mutual labels:  datascience
Ggstatsplot
Enhancing `ggplot2` plots with statistical analysis 📊🎨📣
Stars: ✭ 1,121 (+1203.49%)
Mutual labels:  datascience
Epl Fantasy Geek
English Premier League 2017-18 Fantasy Stats for Geeks
Stars: ✭ 50 (-41.86%)
Mutual labels:  soccer

SkillCorner Open Data

About this repo

Description

This repo contains 9 matches of broadcast tracking data collected by SkillCorner.

The matches included are the 2019/2020 league matches between the champions and the runners-up in the English Premier League, French Ligue 1, Spanish LaLiga, Italian Serie A and German Bundesliga.

Broadcast tracking data is tracking data extracted from the broadcast video using computer vision and machine learning.

To find out more about broadcast tracking data and its use cases, read this Medium article.

Motivation

This data has been open sourced in a joint initiative between SkillCorner and Friends Of Tracking. It has several goals:

  • Provide access to tracking data to researchers and the sports analytics community.
  • Increase awareness of the existence of broadcast tracking data, and of how it can benefit clubs, media and the betting industry.
  • Allow SkillCorner prospects to easily access the data, parse our data format and start building on top of it.

If you use the data, we kindly ask that you credit SkillCorner, and we hope you'll notify us on Twitter so we can follow the great work being done with it.

Documentation

Data Structure

The data directory contains:

  • matches.json: a file with basic information about each match. Use it to pick the id of the match you are interested in.
  • matches: a folder containing one subfolder per match (named with the match id).

For each match, there are two files:

  • match_data.json contains lineup information, referee, pitch size...
  • structured_data.json contains the tracking data (the players, the main referee and the ball).
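
The layout above can be walked with a short Python sketch. It assumes the repository is checked out locally, and it assumes (this README does not confirm it) that matches.json is a JSON list of objects that each carry an id field. The helper names list_match_ids and load_match are hypothetical, not part of any SkillCorner tooling.

```python
import json
from pathlib import Path


def list_match_ids(data_dir):
    """Return the match ids listed in <data_dir>/matches.json.

    Assumption: matches.json is a JSON list of objects with an "id" key.
    """
    with open(Path(data_dir) / "matches.json", encoding="utf-8") as f:
        matches = json.load(f)
    return [m["id"] for m in matches]


def load_match(data_dir, match_id):
    """Load (match_data, tracking) from <data_dir>/matches/<match_id>/."""
    match_dir = Path(data_dir) / "matches" / str(match_id)
    # match_data.json: lineups, referee, pitch size, ...
    with open(match_dir / "match_data.json", encoding="utf-8") as f:
        match_data = json.load(f)
    # structured_data.json: list of per-frame tracking results
    with open(match_dir / "structured_data.json", encoding="utf-8") as f:
        tracking = json.load(f)
    return match_data, tracking
```

For example, ids = list_match_ids("data") followed by match_data, tracking = load_match("data", ids[0]) would return the match metadata and the list of tracked frames for the first listed match.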

Tracking Data Description

The tracking data is a list. Each element of the list is the tracking result for one frame; it is a dictionary with the keys:

  • period: 1 or 2.
  • frame: the index of the video frame the data comes from; the data is sampled at 10 fps.
  • timestamp: the timestamp in the match time with a precision of 1/10s.
  • data: the tracking data found at this frame. It's a list.
  • possession: a dict with keys trackable_object and group, indicating which player/team possesses the ball.
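
As an illustration of these frame-level keys, the Python sketch below counts frames per possession group and converts frame counts to seconds at the stated 10 fps. The function names are hypothetical, and it assumes that frames with no known possession have a missing possession entry or a group value of None; the README does not specify how that case is encoded.

```python
from collections import Counter

FPS = 10  # tracking frame rate stated above


def possession_counts(frames, period=None):
    """Count frames per possession group, optionally for one period only.

    Frames whose possession entry is missing, or whose "group" is None,
    are skipped (an assumption about how unknown possession is encoded).
    """
    counts = Counter()
    for frame in frames:
        if period is not None and frame.get("period") != period:
            continue
        possession = frame.get("possession") or {}
        group = possession.get("group")
        if group is not None:
            counts[group] += 1
    return counts


def frames_to_seconds(n_frames, fps=FPS):
    """Convert a number of tracked frames to seconds."""
    return n_frames / fps
```

Because replays and close-ups are not tracked, these counts measure possession over the visible, tracked frames only, not true match possession time.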

Each element of the data list is an "object" (referee, ball or player) found at this frame. It's a dictionary with keys:

  • group_name: one of "home team", "away team", "home goalkeeper", "away goalkeeper", "referee", "ball"
  • trackable_object: unique identifier of an object (to be matched with a player or a referee or the ball in match_data.json file)
  • track_id: when we track an "object" over consecutive frames, it is assigned a track_id. This can be used for further smoothing and for speed or acceleration calculations.
  • x: x coordinate of the object
  • y: y coordinate of the object
  • z: z coordinate of the object (only for the ball)

Note that trackable_object is included only when the player has been identified with a high degree of certainty; in that case group_name is not included. Otherwise, only group_name is included.
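
The rule above means code reading an object must check for trackable_object first and fall back to group_name. A minimal Python sketch follows; id_to_label is a hypothetical dict you would build yourself from match_data.json (its exact schema is not described here), and both function names are illustrative only.

```python
def object_label(obj, id_to_label):
    """Return a readable label for one object from a frame's data list.

    Identified objects carry trackable_object (and no group_name);
    unidentified ones carry only group_name. id_to_label is a hypothetical
    caller-built dict mapping trackable_object ids to names.
    """
    object_id = obj.get("trackable_object")
    if object_id is not None:
        return id_to_label.get(object_id, f"unknown id {object_id}")
    return obj.get("group_name", "unknown")


def ball_position(frame_data, ball_id=None):
    """Return (x, y, z) for the ball in one frame's data list, or None.

    The ball is matched either by group_name == "ball" (unidentified case)
    or by its trackable_object id, ball_id (assumed to be looked up in
    match_data.json). z is returned as None if the key is absent.
    """
    for obj in frame_data:
        identified = ball_id is not None and obj.get("trackable_object") == ball_id
        if identified or obj.get("group_name") == "ball":
            return obj["x"], obj["y"], obj.get("z")
    return None
```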

For the spatial coordinates, the field model uses meters, and the origin of the coordinate system is at the center of the pitch.

The x axis runs along the long side of the pitch and the y axis along the short side.

Here is an illustration for a field of size 105 m × 68 m:

[Figure: Field modelization for a pitch of size 105x68]
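
Many plotting tools expect the origin at a corner rather than at the center spot. The Python sketch below shifts the center-origin coordinates described above and checks whether a point lies on the pitch. The 105 × 68 defaults come from the illustration; the real pitch size for each match is in match_data.json, whose key names are not documented here, so pass the dimensions explicitly. The function names are hypothetical.

```python
def to_corner_origin(x, y, pitch_length=105.0, pitch_width=68.0):
    """Shift center-origin coordinates (meters) to a corner origin.

    In the source frame (0, 0) is the center of the pitch, x runs along the
    long side and y along the short side. After the shift, on-pitch points
    satisfy 0 <= x <= pitch_length and 0 <= y <= pitch_width. Axis
    directions are unchanged.
    """
    return x + pitch_length / 2, y + pitch_width / 2


def is_on_pitch(x, y, pitch_length=105.0, pitch_width=68.0):
    """True if a center-origin point lies within the pitch boundaries."""
    return abs(x) <= pitch_length / 2 and abs(y) <= pitch_width / 2
```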

Limitations

The data has been processed in the same way as the matches SkillCorner produces for over 20 leagues (more than 8000 matches this season). It was collected automatically from the broadcast and has not received any manual correction. What this means for users:

  • The data is limited to what is visible on the broadcast video. Not all players are visible (and thus included in the data) all the time: the broadcast shows an average of 14 of the 22 players in each frame. During replays or close-up views, no data is included.
  • Some data points are erroneous. Around 95% of the player identities we provide are accurate. For the remaining 5%, the identity of the player may be missing (we only provide the group_name), or it may be provided but wrong.
  • Some smoothing and control of speed and acceleration should be applied to the raw data.
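
As one possible starting point for the smoothing mentioned above, the Python sketch below groups positions by period and track_id, computes speeds between successive detections using the 10 fps frame numbers, drops steps that span a large frame gap (for example across a replay), and applies a centered moving average. The finite difference, the moving average and the max_gap threshold are simple illustrative choices, not SkillCorner's recommended processing, and the function names are hypothetical.

```python
import math
from collections import defaultdict

FPS = 10  # tracking frame rate stated in the data description


def moving_average(values, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def track_speeds(frames, window=3, max_gap=5):
    """Estimate smoothed speeds (m/s) for each (period, track_id) pair.

    Steps between detections more than max_gap frames apart are dropped
    rather than interpolated.
    """
    positions = defaultdict(list)  # (period, track_id) -> [(frame, x, y), ...]
    for frame in frames:
        for obj in frame["data"]:
            track_id = obj.get("track_id")
            if track_id is not None:
                key = (frame.get("period"), track_id)
                positions[key].append((frame["frame"], obj["x"], obj["y"]))

    speeds = {}
    for key, points in positions.items():
        points.sort()  # order detections by frame number
        raw = []
        for (f0, x0, y0), (f1, x1, y1) in zip(points, points[1:]):
            gap = f1 - f0
            if 0 < gap <= max_gap:
                dt = gap / FPS  # elapsed time in seconds
                raw.append(math.hypot(x1 - x0, y1 - y0) / dt)
        speeds[key] = moving_average(raw, window)
    return speeds
```

Acceleration can be estimated the same way by differencing the smoothed speeds; given the identity errors described above, checking for implausibly large values is still advisable.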

Future work

  • We intend to open source some tooling to help people get started with our data.
  • We are not an event data provider ourselves, but we intend to provide some tools to synchronize tracking and event data, which you will be able to use if you have access to event data.

Contact us

  • If you have feedback, or a research project you want to conduct with our data, reach us on our website or on Twitter.
  • If you're interested in our product and want more commercial information, contact us on our website.