remix / partridge

Licence: MIT License
A fast, forgiving GTFS reader built on pandas DataFrames

Partridge

Partridge is a Python 3.6+ library for working with GTFS feeds using pandas DataFrames.

Partridge is heavily influenced by our experience at Remix analyzing and debugging every GTFS feed we could find.

At the core of Partridge is a dependency graph rooted at trips.txt. Disconnected data is pruned away according to this graph when reading the contents of a feed.
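
As a rough illustration of the pruning idea, here is a minimal sketch that uses plain lists of dicts in place of DataFrames. It is not Partridge's implementation: the `DEPENDENCIES` table and the `prune` helper are hypothetical names, and only a small slice of the graph is modelled.

```python
# Hypothetical slice of the dependency graph: (parent file, child file, shared key).
# A row in the child file survives only if its key value appears in the parent file.
DEPENDENCIES = [
    ("trips.txt", "routes.txt", "route_id"),
    ("trips.txt", "stop_times.txt", "trip_id"),
    ("stop_times.txt", "stops.txt", "stop_id"),
]


def prune(tables):
    """Return a copy of the tables without rows unreachable from trips.txt."""
    pruned = {name: list(rows) for name, rows in tables.items()}
    # Edges are listed parent-first, so a single pass in order is enough here.
    for parent, child, key in DEPENDENCIES:
        keep = {row[key] for row in pruned[parent]}
        pruned[child] = [row for row in pruned[child] if row[key] in keep]
    return pruned


tables = {
    "trips.txt": [{"trip_id": "t1", "route_id": "r1"}],
    "routes.txt": [{"route_id": "r1"}, {"route_id": "r2"}],
    "stop_times.txt": [
        {"trip_id": "t1", "stop_id": "s1"},
        {"trip_id": "t9", "stop_id": "s2"},
    ],
    "stops.txt": [{"stop_id": "s1"}, {"stop_id": "s2"}],
}

pruned = prune(tables)
print([row["route_id"] for row in pruned["routes.txt"]])  # ['r1']
print([row["stop_id"] for row in pruned["stops.txt"]])    # ['s1']
```

Route `r2` and stop `s2` are dropped because no trip in `trips.txt` leads to them.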

Feeds can also be filtered to create a view specific to your needs. It's most common to filter a feed down to specific dates (service_id) or routes (route_id), but any field can be filtered.
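
The sketch below shows one way such a filter could select rows, assuming a view maps a file name to `{field: value}` pairs where the value is either a single value or a collection of accepted values. The `matches` and `apply_view` helpers are hypothetical and stand in for whatever Partridge does internally.

```python
def matches(row, criteria):
    """True if the row satisfies every field filter in criteria."""
    for field, wanted in criteria.items():
        # A collection means "any of these values"; anything else is one exact value.
        if isinstance(wanted, (set, frozenset, list, tuple)):
            allowed = set(wanted)
        else:
            allowed = {wanted}
        if row[field] not in allowed:
            return False
    return True


def apply_view(tables, view):
    """Filter each table named in the view; tables not in the view pass through."""
    return {
        name: [row for row in rows if matches(row, view.get(name, {}))]
        for name, rows in tables.items()
    }


tables = {
    "trips.txt": [
        {"trip_id": "t1", "route_id": "r1", "service_id": "weekday"},
        {"trip_id": "t2", "route_id": "r2", "service_id": "weekday"},
        {"trip_id": "t3", "route_id": "r1", "service_id": "saturday"},
    ],
}

view = {"trips.txt": {"route_id": "r1", "service_id": {"weekday", "sunday"}}}
filtered = apply_view(tables, view)
print([row["trip_id"] for row in filtered["trips.txt"]])  # ['t1']
```

In a real feed, the rows removed from `trips.txt` would then cause dependent rows in other files to be pruned as well.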

[Figure: dependency graph]

Philosophy

The design of Partridge is guided by the following principles:

As much as possible

  • Favor speed
  • Allow for extension
  • Succeed lazily on expensive paths
  • Fail eagerly on inexpensive paths

As little as possible

  • Do anything other than efficiently read GTFS files into DataFrames
  • Take an opinion on the GTFS spec

Installation

pip install partridge

GeoPandas support

pip install 'partridge[full]'

Usage

Setup

import partridge as ptg

inpath = 'path/to/caltrain-2017-07-24/'

Examples

The following is a collection of gists containing Jupyter notebooks with transformations to GTFS feeds that may be useful for intake into software applications.

Inspecting the calendar

The date with the most trips

date, service_ids = ptg.read_busiest_date(inpath)
#  datetime.date(2017, 7, 17), frozenset({'CT-17JUL-Combo-Weekday-01'})
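
To make "most trips" concrete, here is a minimal sketch of one plausible definition: count trips per service_id, then pick the date whose active service ids cover the most trips. The input data is made up, and `read_busiest_date` may count or break ties differently.

```python
import datetime
from collections import Counter

# Hypothetical inputs: active service ids per date, and the service_id of each trip.
service_ids_by_date = {
    datetime.date(2017, 7, 17): frozenset({"weekday"}),
    datetime.date(2017, 7, 22): frozenset({"saturday"}),
    datetime.date(2017, 7, 23): frozenset({"sunday", "special"}),
}
trip_service_ids = ["weekday"] * 5 + ["saturday"] * 2 + ["sunday", "special"]

trips_per_service = Counter(trip_service_ids)


def trip_count(service_ids):
    """Total number of trips running under the given service ids."""
    return sum(trips_per_service[sid] for sid in service_ids)


busiest_date, busiest_service_ids = max(
    service_ids_by_date.items(),
    key=lambda item: trip_count(item[1]),
)
print(busiest_date, sorted(busiest_service_ids))  # 2017-07-17 ['weekday']
```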

The week with the most trips

service_ids_by_date = ptg.read_busiest_week(inpath)
#  {datetime.date(2017, 7, 17): frozenset({'CT-17JUL-Combo-Weekday-01'}),
#   datetime.date(2017, 7, 18): frozenset({'CT-17JUL-Combo-Weekday-01'}),
#   datetime.date(2017, 7, 19): frozenset({'CT-17JUL-Combo-Weekday-01'}),
#   datetime.date(2017, 7, 20): frozenset({'CT-17JUL-Combo-Weekday-01'}),
#   datetime.date(2017, 7, 21): frozenset({'CT-17JUL-Combo-Weekday-01'}),
#   datetime.date(2017, 7, 22): frozenset({'CT-17JUL-Caltrain-Saturday-03'}),
#   datetime.date(2017, 7, 23): frozenset({'CT-17JUL-Caltrain-Sunday-01'})}

Dates with active service

service_ids_by_date = ptg.read_service_ids_by_date(inpath)

date, service_ids = min(service_ids_by_date.items())
#  datetime.date(2017, 7, 15), frozenset({'CT-17JUL-Caltrain-Saturday-03'})

date, service_ids = max(service_ids_by_date.items())
#  datetime.date(2019, 7, 20), frozenset({'CT-17JUL-Caltrain-Saturday-03'})
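
For readers unfamiliar with how GTFS expresses service dates, the sketch below resolves active service ids per date from simplified calendar.txt and calendar_dates.txt rows. It is an illustration under assumptions, not Partridge's code: the `weekdays` set stands in for the spec's separate `monday` to `sunday` columns, dates are already parsed, and `resolve_service_ids_by_date` is a hypothetical helper.

```python
import datetime
from collections import defaultdict

# Hypothetical rows, shaped like calendar.txt and calendar_dates.txt after parsing.
calendar = [
    {
        "service_id": "weekday",
        "start_date": datetime.date(2017, 7, 17),
        "end_date": datetime.date(2017, 7, 19),
        # Monday=0 ... Sunday=6, matching datetime.date.weekday()
        "weekdays": {0, 1, 2, 3, 4},
    },
]
calendar_dates = [
    # In GTFS, exception_type 1 adds service on a date and 2 removes it.
    {"service_id": "weekday", "date": datetime.date(2017, 7, 18), "exception_type": 2},
    {"service_id": "holiday", "date": datetime.date(2017, 7, 18), "exception_type": 1},
]


def resolve_service_ids_by_date(calendar, calendar_dates):
    active = defaultdict(set)
    # Expand each regular schedule into its individual dates.
    for row in calendar:
        day = row["start_date"]
        while day <= row["end_date"]:
            if day.weekday() in row["weekdays"]:
                active[day].add(row["service_id"])
            day += datetime.timedelta(days=1)
    # Apply the per-date exceptions on top.
    for row in calendar_dates:
        if row["exception_type"] == 1:
            active[row["date"]].add(row["service_id"])
        elif row["exception_type"] == 2:
            active[row["date"]].discard(row["service_id"])
    # Drop dates left with no service and freeze the sets.
    return {day: frozenset(ids) for day, ids in active.items() if ids}


service_ids_by_date = resolve_service_ids_by_date(calendar, calendar_dates)
for day in sorted(service_ids_by_date):
    print(day, sorted(service_ids_by_date[day]))
```

Freezing each set makes it hashable, which is what allows a group of service ids to be used as a dictionary key when grouping dates by identical service.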

Dates with identical service

dates_by_service_ids = ptg.read_dates_by_service_ids(inpath)

busiest_date, busiest_service = ptg.read_busiest_date(inpath)
dates = dates_by_service_ids[busiest_service]

min(dates), max(dates)
#  datetime.date(2017, 7, 17), datetime.date(2019, 7, 19)
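
Conceptually, grouping dates by identical service is the inverse of the date-to-service-ids mapping. The following sketch shows that inversion on made-up data. It assumes both mappings use frozensets of service ids as shown in the output comments above, and is not taken from Partridge's source.

```python
import datetime
from collections import defaultdict

# Hypothetical date -> service ids mapping.
service_ids_by_date = {
    datetime.date(2017, 7, 17): frozenset({"weekday"}),
    datetime.date(2017, 7, 18): frozenset({"weekday"}),
    datetime.date(2017, 7, 22): frozenset({"saturday"}),
}

# Group dates under the exact set of service ids active on them.
grouped = defaultdict(set)
for day, service_ids in service_ids_by_date.items():
    grouped[service_ids].add(day)

dates_by_service_ids = {ids: frozenset(days) for ids, days in grouped.items()}

weekday_dates = dates_by_service_ids[frozenset({"weekday"})]
print(min(weekday_dates), max(weekday_dates))  # 2017-07-17 2017-07-18
```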

Reading a feed

_date, service_ids = ptg.read_busiest_date(inpath)

view = {
    'trips.txt': {'service_id': service_ids},
    'stops.txt': {'stop_name': 'Gilroy Caltrain'},
}

feed = ptg.load_feed(inpath, view)

Read shapes and stops as GeoDataFrames

service_ids = ptg.read_busiest_date(inpath)[1]
view = {'trips.txt': {'service_id': service_ids}}

feed = ptg.load_geo_feed(inpath, view)

feed.shapes.head()
#       shape_id                                           geometry
#  0  cal_gil_sf  LINESTRING (-121.5661454200744 37.003512297983...
#  1  cal_sf_gil  LINESTRING (-122.3944115638733 37.776439059278...
#  2   cal_sf_sj  LINESTRING (-122.3944115638733 37.776439059278...
#  3  cal_sf_tam  LINESTRING (-122.3944115638733 37.776439059278...
#  4   cal_sj_sf  LINESTRING (-121.9031703472137 37.330157067882...

minlon, minlat, maxlon, maxlat = feed.stops.total_bounds
#  -122.412076, 37.003485, -121.566088, 37.77639

Extracting a new feed

outpath = 'gtfs-slim.zip'

service_ids = ptg.read_busiest_date(inpath)[1]
view = {'trips.txt': {'service_id': service_ids}}

ptg.extract_feed(inpath, outpath, view)
feed = ptg.load_feed(outpath)

assert service_ids == set(feed.trips.service_id)

Features

  • Surprisingly fast :)
  • Load only what you need into memory
  • Built-in support for resolving service dates
  • Easily extended to support fields and files outside the official spec (TODO: document this)
  • Handle nested folders and bad data in zips
  • Predictable type conversions

Thank You

I hope you find this library useful. If you have suggestions for improving Partridge, please open an issue on GitHub.
