camminady / LeTourDataSet

Licence: MIT license
Every cyclist and stage of the Tour de France in two CSV files.


LeTourDataSet

[Figure: Distance and winner average pace]

TL;DR

If you use pandas, just get the data via:

import pandas as pd 
df = pd.read_csv("https://raw.githubusercontent.com/camminady/LeTourDataSet/master/data/TDF_Riders_History.csv")

If you use R instead of Python, you can run:

library(readr)
df <- read_csv("https://raw.githubusercontent.com/camminady/LeTourDataSet/master/data/TDF_Riders_History.csv")

Disclaimer

For known issues with this data set, see the Issues tab. Some entries are incorrect; so far, the errors appear to stem from wrong data on the letour.fr website itself. Looking back, I should probably have scraped another website.

Data

Every cyclist of the Tour de France is stored in a single CSV file, data/TDF_Riders_History.csv. Data on every stage is in data/TDF_Stages_History.csv.
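
Both files can be loaded the same way as in the TL;DR. The following is a minimal sketch, assuming the stage file is served from the same raw GitHub path as the rider file (only the rider URL is given above). The helper names dataset_url and load_dataset are hypothetical and not part of this repository.

```python
# Base of the raw-file URL used in the TL;DR for the rider file.
BASE_URL = "https://raw.githubusercontent.com/camminady/LeTourDataSet/master/data/"


def dataset_url(filename: str) -> str:
    """Build the raw GitHub URL for a file in the data/ directory."""
    return BASE_URL + filename


def load_dataset(filename: str):
    """Download one CSV file and return it as a pandas DataFrame.

    Needs network access. pandas is imported here so that building
    URLs with dataset_url() works even without pandas installed.
    """
    import pandas as pd

    return pd.read_csv(dataset_url(filename))


# Usage (not executed here because it downloads the files):
#   riders = load_dataset("TDF_Riders_History.csv")
#   stages = load_dataset("TDF_Stages_History.csv")  # assumed URL, see above
#   print(riders.shape, stages.shape)
```

If the stage URL turns out to be different, only BASE_URL or the file name passed to load_dataset needs to change.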

How to run

To regenerate the data/TDF_Riders_History.csv file, execute all cells of src/main.py. This might take a couple of minutes.

Analysis

src/analysis.py contains some basic analysis and visualizations of the data. For example, it produces the plot of distance and winner average pace shown above.
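
To illustrate the kind of quantity behind the pace plot, here is a small sketch that computes an average speed from a total distance and an elapsed time. It assumes that "pace" means average speed in km/h. It is not the code from src/analysis.py; the function name average_speed_kmh is hypothetical, and the numbers are made up for illustration rather than taken from the data set.

```python
from datetime import timedelta


def average_speed_kmh(distance_km: float, duration: timedelta) -> float:
    """Average speed in km/h over a total distance and elapsed time."""
    hours = duration.total_seconds() / 3600
    if hours <= 0:
        raise ValueError("duration must be positive")
    return distance_km / hours


# Made-up round numbers for illustration only, not taken from the data set.
speed = average_speed_kmh(3500.0, timedelta(hours=87, minutes=30))
print(f"{speed:.1f} km/h")  # 3500 km / 87.5 h = 40.0 km/h
```

Applied to the real data, the distance and the winner's time would first have to be read from the CSV files and converted to numeric values; the column names and time format are not described here, so they are left out of the sketch.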

Legacy code

This code has been completely rewritten. The previous code, including its output, is in the legacy repository. In particular, read legacy/README.txt.
