calestini / retrosheet

Licence: MIT License
Project to parse retrosheet baseball data in python

Projects that are alternatives of or similar to retrosheet

boxball
Prebuilt Docker images with Retrosheet's complete baseball history data for many analytical frameworks. Includes Postgres, cstore_fdw, MySQL, SQLite, Clickhouse, Drill, Parquet, and CSV.
Stars: ✭ 79 (+315.79%)
Mutual labels:  sports, baseball, retrosheet
sports.py
A simple Python package to gather live sports scores
Stars: ✭ 51 (+168.42%)
Mutual labels:  sports, baseball
scrapeOP
A python package for scraping oddsportal.com
Stars: ✭ 99 (+421.05%)
Mutual labels:  sports, baseball
openrowingmonitor
A free and open source performance monitor for rowing machines
Stars: ✭ 29 (+52.63%)
Mutual labels:  sports, sports-analytics
Deep-Neural-Networks-for-Baseball
A repository to follow along with Andrew Trask's "Grokking Deep Learning" by modelling baseball statistics using various architectures of neural networks built from scratch.
Stars: ✭ 15 (-21.05%)
Mutual labels:  sports, baseball
cfbscrapR
A scraping and aggregating package using the CollegeFootballData API
Stars: ✭ 25 (+31.58%)
Mutual labels:  sports, sports-analytics
pybbda
Python Baseball Data and Analysis
Stars: ✭ 21 (+10.53%)
Mutual labels:  baseball, baseball-statistics
NBA-Machine-Learning-Sports-Betting
NBA sports betting using machine learning
Stars: ✭ 150 (+689.47%)
Mutual labels:  sports, sports-analytics
3D-Tracking-MVS
3D position tracking for soccer players with multi-camera videos
Stars: ✭ 68 (+257.89%)
Mutual labels:  sports-analytics
nflfastR
A Set of Functions to Efficiently Scrape NFL Play by Play Data
Stars: ✭ 268 (+1310.53%)
Mutual labels:  sports-analytics
hms-health-demo-java
HMS Health demo code provides demo programs for your reference or usage. Developers can access the Huawei Health Platform and obtain sports & health data by integrating HUAWEI Health.
Stars: ✭ 37 (+94.74%)
Mutual labels:  sports
hms-health-demo-kotlin
HMS Health demo code provides demo programs for your reference or usage. Developers can access the Huawei Health Platform and obtain sports & health data by integrating HUAWEI Health.
Stars: ✭ 21 (+10.53%)
Mutual labels:  sports
nhl-twitter-bot
🚨 Hockey Game Bot is a Python application that sends important NHL events to social media platforms in (near) real time.
Stars: ✭ 18 (-5.26%)
Mutual labels:  sports
baseballstats
Baseball win expectancy and expected runs per inning calculators
Stars: ✭ 23 (+21.05%)
Mutual labels:  baseball
replay-table
A javascript library for visualizing sport season results with interactive standings
Stars: ✭ 67 (+252.63%)
Mutual labels:  sports
sportyR
R package for drawing regulation playing surfaces for several sports
Stars: ✭ 84 (+342.11%)
Mutual labels:  sports-analytics
football-graphs
Graphs and passing networks in football.
Stars: ✭ 81 (+326.32%)
Mutual labels:  sports-analytics
ballpark-tracker
A simple application used for tracking which MLB and AAA stadiums a "Ballpark Chaser" has been to.
Stars: ✭ 15 (-21.05%)
Mutual labels:  baseball
kenpompy
A simple yet comprehensive web scraper for kenpom.com.
Stars: ✭ 41 (+115.79%)
Mutual labels:  sports-analytics
opta sd
OPTA Sports Data Soccer API Client (OPTA SDAPI)
Stars: ✭ 28 (+47.37%)
Mutual labels:  sports

retrosheet

Python 3.6 | License: MIT | Version: 0.1.0

A project to parse Retrosheet baseball data in Python. All data on the Retrosheet site is copyright © 1996-2003 by Retrosheet. All Rights Reserved.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org"

Motivation

This project aims to support Python-based baseball analytics, from data collection to advanced predictive modeling.


Before you start

If you are looking for a complete solution out of the box, check Chadwick Bureau.

If you are looking for a quick way to check stats, see Baseball-Reference.

If you want a web-scraping solution, check pybaseball.

Getting Started

Downloading Package

Run the following command to clone the repository and create the folder structure:

git clone https://github.com/calestini/retrosheet.git

Downloading historical data to csv

Note: This package is a work in progress. The files are not yet fully parsed, and the statistics are not yet fully validated.

The code below saves data from 1921 to 2017 on your machine. Be careful: downloading everything takes some time (about 10 minutes with a decent machine and a decent internet connection). The final datasets add up to roughly 3 GB.

from retrosheet import Retrosheet
rs = Retrosheet()
rs.batch_parse(yearFrom=1921, yearTo=2017, batchsize=10)  # 10 files at a time
[========================================] 100.0% ... Completed 1921-1930
[========================================] 100.0% ... Completed 1931-1940
[========================================] 100.0% ... Completed 1941-1950
[========================================] 100.0% ... Completed 1951-1960
[========================================] 100.0% ... Completed 1961-1970
[========================================] 100.0% ... Completed 1971-1980
[========================================] 100.0% ... Completed 1981-1990
[========================================] 100.0% ... Completed 1991-2000
[========================================] 100.0% ... Completed 2001-2010
[========================================] 100.0% ... Completed 2011-2017
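
The progress output above suggests that the years are processed in consecutive ranges of `batchsize` years, with a shorter final range. A minimal sketch of that grouping follows. The helper `year_batches` is hypothetical and is not part of this package; the actual batching logic inside `batch_parse` may differ.

```python
def year_batches(year_from, year_to, batchsize):
    """Split an inclusive year range into consecutive (start, end) batches.

    Hypothetical helper for illustration only; not the package's own code.
    """
    batches = []
    start = year_from
    while start <= year_to:
        # The last batch is cut short so it never runs past year_to.
        end = min(start + batchsize - 1, year_to)
        batches.append((start, end))
        start = end + 1
    return batches


batches = year_batches(1921, 2017, 10)
print(batches[0])   # (1921, 1930)
print(batches[-1])  # (2011, 2017)
```

With the arguments used above, this yields ten ranges that match the "Completed" lines in the progress output.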

Files it will download / create:

  • plays.csv
  • teams.csv
  • rosters.csv
  • lineup.csv
  • pitching.csv
  • fielding.csv
  • batting.csv
  • running.csv
  • info.csv

Useful Links / References

  • Our own summary of Retrosheet terminology can be found here
  • For the events file, the pitches field sometimes repeats on the following row whenever there was a play during the at-bat (CS, SB, etc.). In these cases, the code needs to remove the duplication.
  • Main baseball statistics can be found here
  • Hit location diagrams are here
  • Link to downloads here
  • Glossary of Baseball
  • Information about the event files can be found here
  • Documentation on the datasets can be found here
  • Putouts and Assists rules
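
The pitch-duplication note above can be sketched as follows. This is only an illustration under stated assumptions: that the repeated pitches appear as a prefix of the next row's pitches field, and that the repeat only happens for consecutive rows with the same batter. The function names and the `(batter_id, pitches)` row shape are hypothetical and are not taken from this package.

```python
def strip_repeated_pitches(previous, current):
    """Return the part of `current` that was not already recorded in `previous`.

    Assumes the duplicated pitches show up as a prefix of `current`.
    """
    if previous and current.startswith(previous):
        return current[len(previous):]
    return current


def dedupe_pitches(rows):
    """Remove repeated pitch prefixes from consecutive rows of the same batter.

    `rows` is a list of (batter_id, pitches) tuples in file order
    (a hypothetical, simplified stand-in for parsed play records).
    """
    cleaned = []
    prev_batter, prev_pitches = None, ""
    for batter, pitches in rows:
        if batter == prev_batter:
            new_pitches = strip_repeated_pitches(prev_pitches, pitches)
        else:
            new_pitches = pitches
        cleaned.append((batter, new_pitches))
        # Compare against the original (unstripped) pitches of the previous row.
        prev_batter, prev_pitches = batter, pitches
    return cleaned


rows = [("a", "CB"), ("a", "CBFX"), ("b", "CB")]
print(dedupe_pitches(rows))  # [('a', 'CB'), ('a', 'FX'), ('b', 'CB')]
```

Comparing against the previous row's original pitches, rather than the stripped value, keeps the prefix check valid when several plays occur within one at-bat.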

Play Field in Event File:

  • What does 'BF' in '1/BF' stand for? Bunt fly?
  • Why do some modifiers use specific codes such as 2R / 2RF / 8RM / 8RS / 8RXD / L9Ls / RNT?

TODO

  • Finish parsing pitches
  • Clean-up code and logic
  • Test primary stats with game logs
  • Test innings ending in 3 outs
  • Playoff files
  • Parks files
  • Player files
  • Create SQL export option
  • Aggregate more advanced metrics
  • Map out location
  • Add additional data if possible
  • Load game-log data
  • Load player / manager / umpire data

Validating Career Stats - Spot Checks

Batting + Fielding

  • Josh Donaldson (player_id = donaj001)

    Source         R      H     HR    SB
    Official      526    860    174    32
    ThisPackage   524    853    173    32

  • Nelson Cruz (player_id = cruzn002)

    Source         R      H     HR    SB
    Official      768   1447    317    75
    ThisPackage   767   1427    317    75
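
To make the size of each discrepancy explicit, the spot checks above can be compared stat by stat. The sketch below hard-codes the figures from the two tables; the dictionary layout and the `stat_gaps` helper are illustrative assumptions, not part of this package.

```python
# Career totals copied from the spot-check tables above.
OFFICIAL = {
    "donaj001": {"R": 526, "H": 860, "HR": 174, "SB": 32},
    "cruzn002": {"R": 768, "H": 1447, "HR": 317, "SB": 75},
}
THIS_PACKAGE = {
    "donaj001": {"R": 524, "H": 853, "HR": 173, "SB": 32},
    "cruzn002": {"R": 767, "H": 1427, "HR": 317, "SB": 75},
}


def stat_gaps(official, parsed):
    """Return parsed minus official for every stat (negative means undercount)."""
    return {stat: parsed[stat] - official[stat] for stat in official}


for player_id in OFFICIAL:
    print(player_id, stat_gaps(OFFICIAL[player_id], THIS_PACKAGE[player_id]))
# donaj001 {'R': -2, 'H': -7, 'HR': -1, 'SB': 0}
# cruzn002 {'R': -1, 'H': -20, 'HR': 0, 'SB': 0}
```

In both spot checks the parsed totals are at or slightly below the official ones, with hits showing the largest gap.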