public-transport / gtfs-utils

Licence: ISC license
Utilities to process GTFS data sets.

gtfs-utils

Utilities to process GTFS data sets.

  • supports frequencies.txt
  • works in the browser
  • fully asynchronous/streaming

Design goals

streaming/iterative on sorted data

As public transportation systems will hopefully become more integrated over time, GTFS datasets will often be several gigabytes in size. GTFS processing should also work in memory-constrained environments such as Raspberry Pis or FaaS platforms.

Whenever possible, the gtfs-utils tools read as little data into memory as they can. For this to work, the individual files in a GTFS dataset need to be sorted in a way that allows iterative processing.

Read more in the performance section.
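
To illustrate why sorted input allows iterative processing, here is a minimal sketch. groupSortedBy and the sample rows are hypothetical and not part of gtfs-utils; the sketch only shows the general idea of holding one group of rows in memory at a time.

```javascript
// Hypothetical helper, not part of gtfs-utils: yields arrays of consecutive
// rows that share the same value for `key`. Because the input is sorted by
// that key, only one group has to be held in memory at a time.
const groupSortedBy = async function* (rows, key) {
	let group = []
	let currentKey
	for await (const row of rows) {
		if (group.length > 0 && row[key] !== currentKey) {
			yield group
			group = []
		}
		currentKey = row[key]
		group.push(row)
	}
	if (group.length > 0) yield group
}

// Made-up stop_times.txt rows, already sorted by trip_id and stop_sequence.
const sampleStopTimes = [
	{trip_id: 'a', stop_sequence: '1', stop_id: 'airport'},
	{trip_id: 'a', stop_sequence: '2', stop_id: 'museum'},
	{trip_id: 'b', stop_sequence: '1', stop_id: 'center'},
]

// Collects all groups into an array, which is only sensible for small inputs.
const collectGroups = async (rows, key) => {
	const groups = []
	for await (const group of groupSortedBy(rows, key)) groups.push(group)
	return groups
}
```

With unsorted input, rows of the same trip could show up in several separate groups, which is why the sort order matters.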

data-source-agnostic

gtfs-utils does not make assumptions about where you read the GTFS data from. Although it has a built-in tool to read CSV from files on disk, anything is possible: .zip archives, HTTP requests, in-memory buffers, dat/IPFS, etc.
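
As an illustration of a non-disk source, the following sketch serves rows from in-memory strings. readFileFromMemory, parseSimpleCsv and the sample data are hypothetical; the sketch assumes that the gtfs-utils tools accept any async iterable of row objects keyed by column name, and its naive parser does not handle quoted fields.

```javascript
// Made-up data: file name (without .txt) -> CSV content.
const inMemoryFiles = new Map([
	['stops', 'stop_id,stop_name\nairport,Airport\nmuseum,Museum\n'],
])

// Naive CSV parsing for illustration only: no quoting, no escaping.
const parseSimpleCsv = function* (text) {
	const [headerLine, ...lines] = text.split('\n').filter(line => line.length > 0)
	const header = headerLine.split(',')
	for (const line of lines) {
		const cells = line.split(',')
		yield Object.fromEntries(header.map((name, i) => [name, cells[i]]))
	}
}

// Hypothetical replacement for a disk-based readFile: an async generator
// that yields one row object per CSV line.
const readFileFromMemory = async function* (name) {
	if (!inMemoryFiles.has(name)) throw new Error(`unknown file: ${name}`)
	yield* parseSimpleCsv(inMemoryFiles.get(name))
}
```

For real data, a proper CSV parser should take the place of parseSimpleCsv.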

There are too many half-done, slightly opinionated GTFS processing tools out there, so gtfs-utils tries to be as universal as possible.

correct

gtfs-utils tries to follow the spec closely. The exceptions are bugs, of course, and new features of the ever-expanding GTFS spec that change the expected behavior of old ones.

For example, when computing the absolute timestamp/instant of an arrival at a stop, it always takes into account stop_timezone or the user-supplied timezone, because stop_times.txt uses "wall clock time".
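
The following sketch shows what such a conversion can look like. It is not the gtfs-utils implementation: zoneOffsetMs and gtfsTimeToEpoch are hypothetical names, and the code only relies on the built-in Intl API. It follows the GTFS convention that times are measured from "noon minus 12h" of the service day, and it assumes that no time zone transition happens between local noon and 12:00 UTC of that day.

```javascript
// Offset in milliseconds between UTC and the wall clock of `zone`
// at the instant `utcMs` (which must be a whole second).
const zoneOffsetMs = (zone, utcMs) => {
	const parts = new Intl.DateTimeFormat('en-US', {
		timeZone: zone, hourCycle: 'h23',
		year: 'numeric', month: 'numeric', day: 'numeric',
		hour: 'numeric', minute: 'numeric', second: 'numeric',
	}).formatToParts(new Date(utcMs))
	const p = {}
	for (const {type, value} of parts) p[type] = parseInt(value, 10)
	const asUtc = Date.UTC(p.year, p.month - 1, p.day, p.hour, p.minute, p.second)
	return asUtc - utcMs
}

// Converts a service date (YYYY-MM-DD) and a GTFS time (HH:MM:SS, may exceed
// 24:00:00) in the given time zone into a UNIX timestamp in seconds.
const gtfsTimeToEpoch = (date, time, zone) => {
	const [y, m, d] = date.split('-').map(Number)
	const [hh, mm, ss] = time.split(':').map(Number)
	// Treat 12:00 as if it were UTC, then shift it by the zone's offset.
	const noonGuess = Date.UTC(y, m - 1, d, 12)
	const localNoon = noonGuess - zoneOffsetMs(zone, noonGuess)
	return localNoon / 1000 - 12 * 3600 + hh * 3600 + mm * 60 + ss
}

console.log(gtfsTimeToEpoch('2019-05-15', '15:23:00', 'Europe/Berlin')) // → 1557926580
```

Measuring from noon rather than from midnight keeps the result correct on days with a daylight saving time switch, when local midnight is not exactly 12 hours before local noon.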

Installing

npm install gtfs-utils

Usage

API documentation

sorted GTFS files

gtfs-utils assumes that the files in your GTFS dataset are sorted in a particular way. This allows it to compute some data aggregations more memory-efficiently, which means that you can use it to process very large datasets. For example, if trips.txt and stop_times.txt are both sorted by trip_id, computeStopovers() can read each file incrementally, holding only the rows for one trip_id at a time.

Miller and sponge work very well for this:

mlr --csv sort -f agency_id agency.txt | sponge agency.txt
mlr --csv sort -f parent_station -nr location_type stops.txt | sponge stops.txt
mlr --csv sort -f route_id routes.txt | sponge routes.txt
mlr --csv sort -f trip_id trips.txt | sponge trips.txt
mlr --csv sort -f trip_id -n stop_sequence stop_times.txt | sponge stop_times.txt
mlr --csv sort -f service_id calendar.txt | sponge calendar.txt
mlr --csv sort -f service_id,date calendar_dates.txt | sponge calendar_dates.txt
mlr --csv sort -f trip_id,start_time frequencies.txt | sponge frequencies.txt

There's also a sort.sh script included in the npm package, which executes the commands above.

Note: For read-only sources (like HTTP requests), sorting the files is not an option. You can solve this by spawning mlr and piping data through it.
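
A minimal sketch of this approach, using Node's child_process module. pipeThroughCommand and sortTripsWithMiller are hypothetical helpers, not part of gtfs-utils; the sketch assumes that mlr is installed and on the PATH, and it leaves out parsing the sorted CSV into rows.

```javascript
const {spawn} = require('child_process')

// Hypothetical helper: pipes a readable stream through an external command
// and returns the command's stdout as a readable stream.
const pipeThroughCommand = (input, cmd, args) => {
	const child = spawn(cmd, args, {stdio: ['pipe', 'pipe', 'inherit']})
	// Surface spawn failures (e.g. command not found) on the returned stream.
	child.on('error', (err) => child.stdout.destroy(err))
	// Ignore write errors on stdin, e.g. if the command exits early.
	child.stdin.on('error', () => {})
	input.pipe(child.stdin)
	return child.stdout
}

// Sorts a trips.txt stream by trip_id, mirroring the mlr command shown above.
const sortTripsWithMiller = (tripsCsvStream) => {
	return pipeThroughCommand(tripsCsvStream, 'mlr', ['--csv', 'sort', '-f', 'trip_id'])
}
```

The returned stream carries sorted CSV text, which still needs to be parsed into rows before being handed to the gtfs-utils tools.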

Note: With a bit of extra code, you can also use gtfs-utils with a .zip archive or with a remote feed.

basic example

Given our sample GTFS dataset, we'll answer the following question: On a specific day, which vehicles of which lines stop at a specific station?

We define a function readFile that reads our GTFS data into a readable stream/async iterable. In this case we'll read CSV files from disk using the built-in readCsv helper:

const readCsv = require('gtfs-utils/read-csv')

const readFile = (file) => {
	return readCsv(require.resolve('sample-gtfs-feed/gtfs/' + file + '.txt'))
}

computeStopovers() will read calendar.txt, calendar_dates.txt, trips.txt, stop_times.txt & frequencies.txt and return all stopovers of all trips across the full time frame of the dataset.

It returns an async generator (which is thus async-iterable), so we can use for await.

In the following example, we're going to print all stopovers at the airport stop on the 15th of May 2019:

const {DateTime} = require('luxon')
const computeStopovers = require('gtfs-utils/compute-stopovers')

const day = '2019-05-15'
const isOnDay = (t) => {
	const iso = DateTime.fromMillis(t * 1000, {zone: 'Europe/Berlin'}).toISO()
	// compare the date part of the local ISO 8601 string, not the raw timestamp
	return iso.slice(0, day.length) === day
}

const stopovers = await computeStopovers(readFile, 'Europe/Berlin')
for await (const stopover of stopovers) {
	if (stopover.stop_id !== 'airport') continue
	if (!isOnDay(stopover.arrival)) continue
	console.log(stopover)
}

{
	stop_id: 'airport',
	trip_id: 'a-downtown-all-day',
	service_id: 'all-day',
	route_id: 'A',
	start_of_trip: 1557871200,
	arrival: 1557926580,
	departure: 1557926640,
}
{
	stop_id: 'airport',
	trip_id: 'a-outbound-all-day',
	service_id: 'all-day',
	route_id: 'A',
	start_of_trip: 1557871200,
	arrival: 1557933900,
	departure: 1557933960,
}
// …
{
	stop_id: 'airport',
	trip_id: 'c-downtown-all-day',
	service_id: 'all-day',
	route_id: 'C',
	start_of_trip: 1557871200,
	arrival: 1557926820,
	departure: 1557926880,
}

For more examples, check the API documentation.
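
To answer the original question (which vehicles of which lines stop at a specific station?), the stopovers can be grouped by route. The following sketch is not part of gtfs-utils: tripsByRouteAt is a hypothetical helper, and sampleStopovers is a stand-in for the result of computeStopovers() that yields the three stopovers printed above plus one made-up stopover at another stop.

```javascript
// Stand-in for the async iterable returned by computeStopovers().
const sampleStopovers = async function* () {
	yield {stop_id: 'airport', trip_id: 'a-downtown-all-day', route_id: 'A', arrival: 1557926580}
	yield {stop_id: 'airport', trip_id: 'a-outbound-all-day', route_id: 'A', arrival: 1557933900}
	// made-up stopover, only here to show that other stops are filtered out
	yield {stop_id: 'museum', trip_id: 'a-outbound-all-day', route_id: 'A', arrival: 1557934200}
	yield {stop_id: 'airport', trip_id: 'c-downtown-all-day', route_id: 'C', arrival: 1557926820}
}

// Hypothetical helper: collects, per route_id, the trip_ids stopping at `stopId`.
const tripsByRouteAt = async (stopovers, stopId) => {
	const byRoute = new Map()
	for await (const stopover of stopovers) {
		if (stopover.stop_id !== stopId) continue
		if (!byRoute.has(stopover.route_id)) byRoute.set(stopover.route_id, new Set())
		byRoute.get(stopover.route_id).add(stopover.trip_id)
	}
	return byRoute
}
```

Only the resulting Map is kept in memory, so this also works when the stopovers come from a large dataset.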

Performance

By default, gtfs-utils verifies that the input files are sorted correctly. You can disable this to improve performance slightly by running with the CHECK_GTFS_SORTING=false environment variable.

gtfs-utils should be fast enough for small to medium-sized GTFS datasets. It won't be as fast as other GTFS tools because it

On my M1 MacBook Air, with the 180 MB 2022-02-03 HVV GTFS dataset (17k stops.txt rows, 91k trips.txt rows, 2m stop_times.txt rows, ~500m stopovers), computeStopovers computes 18k stopovers per second and finishes in several hours (roughly 7 to 8 hours at that rate).

Note: If you want a faster way to query and transform GTFS datasets, I suggest using gtfs-via-postgres to leverage PostgreSQL's query optimizer. Once you have imported the data, it is usually orders of magnitude faster.

Related

Contributing

If you have a question or have difficulties using gtfs-utils, please double-check your code and setup first. If you think you have found a bug or want to propose a feature, refer to the issues page.
