rodmoioliveira / football-graphs

License: BSD-3-Clause
Graphs and passing networks in football.

Programming Languages

HTML, Clojure, Shell, SCSS, JavaScript

Projects that are alternatives of or similar to football-graphs

football analytics
⚽📊 A collection of football analytics projects, data, and analysis by Edd Webster (@eddwebster), including a curated list of publicly available resources published by the football analytics community.
Stars: ✭ 405 (+400%)
Mutual labels:  soccer, football-data, sports-analytics
jgrapht
Master repository for the JGraphT project
Stars: ✭ 2,259 (+2688.89%)
Mutual labels:  graphs, jgrapht
Awesome Network Analysis
A curated list of awesome network analysis resources.
Stars: ✭ 2,525 (+3017.28%)
Mutual labels:  network-science, complex-networks
Algorithms
Free hands-on course with the implementation (in Python) and description of several computational, mathematical and statistical algorithms.
Stars: ✭ 117 (+44.44%)
Mutual labels:  graphs, networkx
angular-footballdata-api-factory
AngularJS Factory for the football-data.org JSON REST API
Stars: ✭ 48 (-40.74%)
Mutual labels:  soccer, football-data
regista
An R package for soccer modelling
Stars: ✭ 71 (-12.35%)
Mutual labels:  soccer, sports-analytics
ntds 2018
Material for the EPFL master course "A Network Tour of Data Science", edition 2018.
Stars: ✭ 59 (-27.16%)
Mutual labels:  graphs, network-science
nflfastR
A Set of Functions to Efficiently Scrape NFL Play by Play Data
Stars: ✭ 268 (+230.86%)
Mutual labels:  football-data, sports-analytics
gqlalchemy
GQLAlchemy is a library developed with the purpose of assisting in writing and running queries on Memgraph. GQLAlchemy supports high-level connection to Memgraph as well as modular query builder.
Stars: ✭ 39 (-51.85%)
Mutual labels:  graphs, networkx
ai-distillery
Automatically modelling and distilling knowledge within AI. In other words, summarising the AI research firehose.
Stars: ✭ 20 (-75.31%)
Mutual labels:  graphs, network-science
Osmnx
OSMnx: Python for street networks. Retrieve, model, analyze, and visualize street networks and other spatial data from OpenStreetMap.
Stars: ✭ 3,357 (+4044.44%)
Mutual labels:  graphs, networkx
epl mysql db
Free/open English Premier League results database from 1993-2017. Dump format is MySQL and sqlite.
Stars: ✭ 26 (-67.9%)
Mutual labels:  soccer, football-data
transfermarkt-datasets
⚽️ Extract, prepare and publish Transfermarkt datasets.
Stars: ✭ 60 (-25.93%)
Mutual labels:  soccer, football-data
Awesome Community Detection
A curated list of community detection research papers with implementations.
Stars: ✭ 1,874 (+2213.58%)
Mutual labels:  network-science, networkx
cfbscrapR
A scraping and aggregating package using the CollegeFootballData API
Stars: ✭ 25 (-69.14%)
Mutual labels:  football-data, sports-analytics
nxontology
NetworkX-based Python library for representing ontologies
Stars: ✭ 45 (-44.44%)
Mutual labels:  graphs, networkx
mcnp
📊复杂网络建模课程设计. The project of modeling of complex networks course.
Stars: ✭ 69 (-14.81%)
Mutual labels:  complex-networks, clustering-coefficient
disparity filter
Implements a disparity filter in Python, based on graphs in NetworkX, to extract the multiscale backbone of a complex weighted network (Serrano, et al., 2009)
Stars: ✭ 17 (-79.01%)
Mutual labels:  graphs, networkx
Stellargraph
StellarGraph - Machine Learning on Graphs
Stars: ✭ 2,235 (+2659.26%)
Mutual labels:  graphs, networkx
ntds 2019
Material for the EPFL master course "A Network Tour of Data Science", edition 2019.
Stars: ✭ 62 (-23.46%)
Mutual labels:  graphs, network-science

🎯 About this project

Football Passing Networks is an interactive web application to explore data visualizations on soccer passing networks.

🌐 Website

About passing networks

📚 Papers

📰 Articles

📌 Blogs

📂 Resources & Tools

📷 Visualizations

👀 Accounts to follow

💾 Dataset

💻 Development Stack

🔧 Install Dependencies

# install python dependencies
chmod +x install_python.sh
./install_python.sh

# install node dependencies
npm install

# install clojure dependencies
npm run compile-once

Run Project

npm start

🏃 Run Tests

npm test

💽 IO Usage

First, download the dataset into the directory src/main/data/soccer_match_event_dataset so that it has the following layout:

src/main/data/soccer_match_event_dataset
├── competitions.json
├── events_England.json
├── events_European_Championship.json
├── events_France.json
├── events_Germany.json
├── events_Italy.json
├── events_Spain.json
├── events_World_Cup.json
├── matches_England.json
├── matches_European_Championship.json
├── matches_France.json
├── matches_Germany.json
├── matches_Italy.json
├── matches_Spain.json
├── matches_World_Cup.json
├── players.json
└── teams.json

Then, you can generate some data like this:

# just once
chmod +x sh/streamline.sh

# Params
# - championship [England | European_Championship | France | Germany | Italy | Spain | World_Cup]
# - match-id (check out src/main/data/match_ids)

# Use
# ./sh/streamline.sh championship match-id

# get all the data for Italian championship
./sh/streamline.sh Italy

# get data for match 2576338
./sh/streamline.sh Italy 2576338

The streamline.sh script generates six files for each match:

# raw data from the match
src/main/data/matches/italy_genoa_torino,_1_2_2576338.edn
src/main/data/matches/italy_genoa_torino,_1_2_2576338.json

# processed data to create a passing network
src/main/data/graphs/italy_genoa_torino,_1_2_2576338.edn
src/main/data/graphs/italy_genoa_torino,_1_2_2576338.json

# passing network with metrics calculations
src/main/data/analysis/italy_genoa_torino,_1_2_2576338.edn
src/main/data/analysis/italy_genoa_torino,_1_2_2576338.json
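
As a quick check after running the script, the six expected paths can be derived from the match file name. This is a minimal sketch that only mirrors the directory names and extensions listed above; the helper name `expected_outputs` is hypothetical and is not part of the project.

```python
from pathlib import PurePosixPath

# Directory names and extensions as listed in this README.
DATA_ROOT = PurePosixPath("src/main/data")
STAGES = ("matches", "graphs", "analysis")
EXTENSIONS = ("edn", "json")


def expected_outputs(match_stem):
    """Return the six output paths for one match, in the order shown above."""
    return [
        str(DATA_ROOT / stage / f"{match_stem}.{ext}")
        for stage in STAGES
        for ext in EXTENSIONS
    ]


if __name__ == "__main__":
    for path in expected_outputs("italy_genoa_torino,_1_2_2576338"):
        print(path)
```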

Missing matches analysis can be found within missing.edn.

📐 Understanding the Metrics

Using Network Science to Analyse Football Passing Networks: Dynamics, Space, Time, and the Multilayer Nature of the Game

At the topological microscale, the importance of each player has been related to:

  1. its degree, which is the number of passes made by a player (Cotta et al., 2013);
  2. eigenvector centrality, a measure of importance obtained from the eigenvectors of the adjacency matrix (Cotta et al., 2013);
  3. closeness, measuring the minimum number of steps that the ball has to undergo from one player to reach any other in the team (López-Peña and Touchette, 2012);
  4. betweenness centrality, which accounts for how many times a given player is necessary for completing the routes (made by the ball) connecting any other two players of its team (Duch et al., 2010; López-Peña and Touchette, 2012);
  5. other metrics, such as the clustering coefficient, which measures the number of “neighbors” of a player that have also passed the ball between them (i.e., the number of triangles around a player), have also been quantified to evaluate the contribution of a given player to the local robustness of the passing network (López-Peña and Touchette, 2012).
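
To make the degree and clustering coefficient concrete, here is a minimal sketch on a toy passing network. The player labels and pass counts are invented for illustration, and the functions are not taken from this project's code. Degree is computed as the number of passes made, and clustering treats two players as linked if either passed to the other.

```python
from itertools import combinations

# Hypothetical pass counts: (passer, receiver) -> number of passes.
PASSES = {
    ("GK", "CB"): 8, ("CB", "GK"): 5,
    ("CB", "CM"): 12, ("CM", "CB"): 7,
    ("CM", "ST"): 9, ("ST", "CM"): 4,
    ("CB", "ST"): 2,
}


def out_degree(passes):
    """Weighted out-degree: total number of passes made by each player."""
    totals = {}
    for (passer, receiver), count in passes.items():
        totals[passer] = totals.get(passer, 0) + count
        totals.setdefault(receiver, 0)
    return totals


def neighbours(passes):
    """Undirected neighbour sets: linked if either player passed to the other."""
    adj = {}
    for passer, receiver in passes:
        adj.setdefault(passer, set()).add(receiver)
        adj.setdefault(receiver, set()).add(passer)
    return adj


def clustering(passes):
    """Fraction of a player's neighbour pairs that are themselves linked."""
    adj = neighbours(passes)
    result = {}
    for player, nbrs in adj.items():
        pairs = list(combinations(sorted(nbrs), 2))
        if not pairs:
            result[player] = 0.0
            continue
        closed = sum(1 for a, b in pairs if b in adj[a])
        result[player] = closed / len(pairs)
    return result


if __name__ == "__main__":
    print(out_degree(PASSES))
    print(clustering(PASSES))
```

In this toy network the centre-back has three neighbours but only one of the three neighbour pairs is linked, so its clustering coefficient is 1/3.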

Defining a historic football team: Using Network Science to analyze Guardiola’s F.C. Barcelona

But, how is the structure of the average passing networks? And, more importantly, are there differences between FCB and the rest of the teams? Figure 3 shows the comparison of 6 parameters directly related with the topological organization of the average passing networks (see Methods for a detailed description of all these network parameters). In Fig. 3A, we plot the:

  1. clustering coefficient C, which is related to the number of triangles created between any triplet of players. The clustering coefficient is an indicator of the local robustness of networks [31], since when a triangle connecting three nodes (i.e., players) exists and a link (i.e., pass) between two nodes is lost (i.e., it is not possible to make the pass), there is an alternative way of reaching the other node by passing through the other two edges of the triangle. In football, the clustering coefficient measures the triangulation between three players. As we can observe in Fig. 3A, the value of C is much higher in FCB, which reveals that connections between three players are more abundant than in their rivals.
  2. The average shortest path d is an indicator about how well connected are players inside a team. It measures the “topological distance” that the ball must go through to connect any two players of the team. Since the links of the passing networks are weighted with the number of passes, the topological distance of a given link is defined as the inverse of the number of passes. The higher the number of passes between two players, the closer (i.e., lower) the topological distance between them is. Furthermore, since it is the ball that travels from one player to any other, it is possible to find the shortest path between any pair of players by computing the shortest topological distance between them, no matter if it is a direct connection or if it involves passing through other players of the team. Finally, the average shortest path d of a team is just the average of the shortest path between all pairs of players. As we can observe in Fig. 3B, the shortest path of FCB is much lower than their rivals, which reveals that players are better connected between them. As we will discuss later, note that this fact could be produced by the network organization or just being a consequence of having a higher number of passes, which reduces the overall topological distance of the links and, consequently, the value of d.
  3. Figure 3C shows the comparison between the largest eigenvalue λ1 of the connectivity matrix A (also known as the weighted adjacency matrix), whose elements a_ij contain the number of passes between players i and j [31]. The largest eigenvalue has been used as a quantifier of the network strength, since it increases with the number of nodes and links (see Methods). As expected (due to the high number of passes), the largest eigenvalue λ1 of FCB is much higher than the corresponding values of its rivals. This metric reveals the higher robustness of the passing network of Guardiola’s team, which indicates that an eventual loss of passes would have fewer consequences in F.C. Barcelona than in the rest of the teams.
  4. It is also worth analyzing the behavior of the second smallest eigenvalue λ2 of the Laplacian matrix L, also known as the algebraic connectivity (see Methods). The value of λ2 is related to several network properties. In synchronization, networks with higher λ2 require less time to synchronize [54], and in diffusion processes, the time to reach equilibrium also goes with the inverse of λ2. In the context of football passing networks, λ2 can be interpreted as a metric for quantifying the division of a team. The reason is that low values of λ2 indicate that a network is close to being split into two groups, eventually breaking for λ2=0. In this way, the higher the value of λ2, the more interconnected the team is, being a measure of structural cohesion. In Fig. 3D, we have plotted the comparison of λ2, which reveals that FCB attacking and defensive lines are more intermingled, leading to a λ2 higher than its rivals.
  5. Finally, Fig. 3E-F show how centrality (i.e., the importance of the players inside the passing network) is distributed along the team, a metric calculated by means of the eigenvector related to the largest eigenvalue of the connectivity matrix (see Methods). Figure 3E contains the average dispersion of centrality and Fig. 3F shows the highest value of a single player. In both cases, differences are not statistically significant enough to support evidence of a different centrality distribution between FCB and the rest of the teams.
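
The average shortest path d and the largest eigenvalue λ1 described above can be sketched in plain Python. The matrix `A` below is an invented symmetric pass-count matrix, not data from the paper or from this project. Link distance is taken as the inverse of the number of passes, as in the quoted text, and λ1 is estimated by power iteration, which is one of several ways to obtain it. The algebraic connectivity λ2 is left out because it needs a full eigenvalue solver.

```python
import math

# Hypothetical symmetric pass counts: A[i][j] = passes between players i and j.
A = [
    [0, 10, 2, 0],
    [10, 0, 5, 1],
    [2, 5, 0, 4],
    [0, 1, 4, 0],
]


def average_shortest_path(a):
    """Average shortest topological distance, with link distance = 1 / passes."""
    n = len(a)
    dist = [
        [0.0 if i == j else (1.0 / a[i][j] if a[i][j] else math.inf)
         for j in range(n)]
        for i in range(n)
    ]
    # Floyd-Warshall: let the ball travel through intermediate player k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    pairs = [dist[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(pairs) / len(pairs)


def largest_eigenvalue(a, iterations=200):
    """Power-iteration estimate of the largest eigenvalue of a non-negative matrix."""
    n = len(a)
    v = [1.0] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = max(abs(x) for x in w)
        if eigenvalue == 0:
            return 0.0
        # Rescale so the largest component is 1 before the next step.
        v = [x / eigenvalue for x in w]
    return eigenvalue


if __name__ == "__main__":
    print(average_shortest_path(A))
    print(largest_eigenvalue(A))
```

Doubling every entry of `A` halves d, which illustrates the caveat in the quoted text: a lower d can come purely from a higher number of passes rather than from a different network organization.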

Exploring Team Passing Networks and Player Movement Dynamics in Youth Association Football

  1. Closeness centrality - a closeness score indicates how easy it is for a player to be connected with teammates (by passing relation) and, therefore, how often that player is requested by the team as a target to pass the ball. Thus, it quantifies how close such a player is to his peers [40]. Closeness centrality is defined as the inverse of the farness, where higher values assume a positive meaning in the node proximity [8,15,39]. It is calculated by computing the shortest path between the node v and all other nodes, and then calculating the sum.
  2. Betweenness centrality - a player with higher betweenness scores is crucial to maintain team passing connections by acting as a connecting bridge. Also, low scores that are spread across players may be related to a well-balanced passing strategy and less dependence on specific players. Betweenness centrality quantifies the occurrences that a node acts as a bridge along the geodesic path between other nodes.
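
The two definitions above can be sketched on a small unweighted network. The links below are invented, and the code is an illustration rather than this project's implementation. Closeness is computed as the inverse of farness scaled by (n - 1), which is one common normalization, and betweenness uses Brandes' algorithm on hop counts.

```python
from collections import deque

# Hypothetical undirected passing links (players who exchanged at least one pass).
LINKS = {
    "GK": ["CB"],
    "CB": ["GK", "CM"],
    "CM": ["CB", "LW", "RW"],
    "LW": ["CM"],
    "RW": ["CM"],
}


def bfs_distances(graph, source):
    """Number of passes (hops) needed to reach every player from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def closeness(graph):
    """Inverse of farness, scaled by (n - 1); assumes a connected graph."""
    result = {}
    for node in graph:
        farness = sum(bfs_distances(graph, node).values())
        result[node] = (len(graph) - 1) / farness if farness else 0.0
    return result


def betweenness(graph):
    """Brandes' algorithm on an undirected, unweighted graph (unnormalised)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        order = []
        preds = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}  # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
                if dist[nxt] == dist[node] + 1:
                    sigma[nxt] += sigma[node]
                    preds[nxt].append(node)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in preds[node]:
                delta[pred] += sigma[pred] / sigma[node] * (1 + delta[node])
            if node != source:
                score[node] += delta[node]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}


if __name__ == "__main__":
    print(closeness(LINKS))
    print(betweenness(LINKS))
```

In this toy network the central midfielder lies on the shortest path of five player pairs, so it gets the highest betweenness, while the goalkeeper and wingers bridge nobody and score zero.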

Play-by-Play Network Analysis in Football

  1. Flow Centrality - For each player, flow centrality measures the fraction of plays (or attack units) that it is involved in at least once relative to all plays by its team. Thus, an indication on the overall involvement of all playing positions across a match is provided. By construction, flow centrality values are bounded between 0 and 1. The extreme value of 0 signals that a player was not part of any play in terms of passing or receiving the ball. A value of 1 means that a player was at least involved once in every play of its team during the match. Any flow centrality value in between can be interpreted as the proportion of plays that a player was involved in relative to all plays by its team.
  2. Flow Betweenness - For each player, flow betweenness measures the fraction of plays in which it functions as an intermediary player relative to all plays by its team. We define a player as intermediate in a play if it actually functions as a bridging player in terms of passing between any other two players. In contrast to flow centrality (CFC), which only tracks involvement, flow betweenness (CFB) considers the actual passing sequence of a play to track whether a player is positioned in between a sequence to function as a bridging unit. Flow betweenness values are also bounded between 0 and 1. Values of 0 signal that a player did not once receive the ball from a teammate and successfully pass it on to another teammate in any play during a match. A value of 1 means that a player received and passed on the ball at least once in every play of its team. Values in between the extreme values are again the proportion of plays that a player functioned in as a bridging unit relative to all plays by its team. While being in-between always implies being involved in a play, the reverse is not true. Initiating or being at the end of a play implies that a player is involved but not in-between a ball possession. Therefore, the flow centrality value of a player in a match is always at least as high as its corresponding flow betweenness value.
  3. Weighted betweenness - assesses how often a player is in-between any other two players of its team measured by their strongest passing connections across a match. Thus, its betweenness character is built on aggregated match data and does not necessarily imply that the player functioned as a bridging unit within plays. It is often used as a playmaker indicator (Pena and Touchette, 2012; Clemente and Martins, 2017). The values of weighted betweenness are bounded between 0 and 1 reflecting the proportion of strongest passing connections between any two players in the network that lead via a particular player.
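
Flow centrality and flow betweenness, as defined above, reduce to counting plays. The sketch below assumes each play is available as the ordered list of players who touched the ball; that representation and the sample plays are assumptions made for illustration, not the format used by the paper or by this project. Weighted betweenness is not covered here.

```python
# Hypothetical plays: each play is the ordered list of players who touched the ball.
PLAYS = [
    ["GK", "CB", "CM", "ST"],
    ["CB", "CM", "LW"],
    ["CM", "ST"],
    ["GK", "CB"],
]


def flow_centrality(plays):
    """Fraction of plays in which each player is involved at least once."""
    players = {p for play in plays for p in play}
    return {
        p: sum(1 for play in plays if p in play) / len(plays)
        for p in players
    }


def flow_betweenness(plays):
    """Fraction of plays in which a player receives the ball and passes it on."""
    players = {p for play in plays for p in play}
    # Intermediaries are everyone except the first and last touch of a play.
    return {
        p: sum(1 for play in plays if p in play[1:-1]) / len(plays)
        for p in players
    }


if __name__ == "__main__":
    fc = flow_centrality(PLAYS)
    fb = flow_betweenness(PLAYS)
    for player in sorted(fc):
        print(player, fc[player], fb[player])
```

Because an intermediary is by definition involved in the play, every player's flow centrality in this sketch is at least as high as its flow betweenness, matching the property stated in the text.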

🎨 Visual Reference

📎 License

BSD 3-Clause License

👤 Author

Rodolfo Mói [LinkedIn] [Twitter]
