Since this blog will dedicate many posts to the General Transit Feed Specification (GTFS), let me provide an introduction.
Firstly, from the official documentation:
The GTFS transit feed specification defines a common format for public transportation schedules and associated geographic information.
The “G” in GTFS is commonly thought of as standing for Google, and originally it did: Google created this specification so it could display transit info on Google Maps.
Transit agencies distribute their feed (typically via a link on their web site) as a ZIP file that contains a number of CSV (comma-separated values) files, each of which is described in the official documentation.
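To make the "ZIP of CSV files" idea concrete, here is a minimal Python sketch that reads one table straight out of a feed archive. The function name `read_gtfs_table` and the UTF-8 encoding are my own assumptions for illustration; a real feed may need different handling.

```python
import csv
import io
import zipfile


def read_gtfs_table(feed_path, table_name):
    """Return the rows of one GTFS table (e.g. "routes.txt") as a list of dicts.

    Hypothetical helper: assumes the file sits at the top level of the ZIP
    and is UTF-8 encoded.
    """
    with zipfile.ZipFile(feed_path) as feed:
        with feed.open(table_name) as raw:
            # utf-8-sig drops a byte order mark if the export includes one
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            # Each row becomes a dict keyed by the header line of the file
            return list(csv.DictReader(text))
```

Calling `read_gtfs_table("feed.zip", "routes.txt")` would give one dict per route, with every value left as a string.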
Agencies distribute updates to their feeds at their own intervals. For instance, Auckland provides one month of data at a time, while others may provide 6 or 12 months’ worth of data.
The GTFS format isn’t overly strict, meaning:
- There’s no fixed format for identifiers (such as route_id or trip_id)
- Agencies tend to omit fields as they please
- Agencies sometimes add fields that aren’t in the spec (this doesn’t really matter - you can choose to use the extra info provided, or simply have your CSV parser ignore it)
- Many agencies tend to misuse the calendar.txt and calendar_dates.txt files
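In practice this looseness means a parser has to be forgiving. The sketch below shows one way to do that for trips.txt: identifiers stay as plain strings, columns the code doesn't know about are ignored, and columns the agency left out come back as None. The `TRIP_FIELDS` tuple and the `parse_trips` function are assumptions of mine, not part of the spec.

```python
import csv
import io

# Hypothetical subset of trips.txt columns this sketch cares about
TRIP_FIELDS = ("route_id", "service_id", "trip_id", "direction_id")


def parse_trips(text):
    """Parse trips.txt content, tolerating missing and extra columns."""
    trips = []
    for row in csv.DictReader(io.StringIO(text)):
        # Keep only the known fields; absent or blank ones become None,
        # and any extra agency-specific columns are silently dropped
        trips.append({name: (row.get(name) or None) for name in TRIP_FIELDS})
    return trips
```

Leaving identifiers as strings avoids guessing at a format that the spec doesn't fix.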
There’s no versioning included in GTFS. Each agency adopted GTFS at a different stage of the spec’s lifetime, so each builds its files according to the spec as it stood at the time. For instance, when Perth adopted GTFS, the direction_id field in trips.txt wasn’t part of the spec, so the “inbound” and “outbound” directions of the same route are treated as separate entries in the routes.txt file.
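One consequence is that code working across several feeds can't assume direction_id exists. As a rough sketch, the hypothetical function below counts trips per route and direction, and lumps trips with no direction_id under an "unspecified" label (my own placeholder, not a GTFS value). It assumes each trip is a dict of strings, as a CSV parser would produce.

```python
from collections import Counter


def trips_per_direction(trips):
    """Count trips per (route_id, direction), tolerating a missing direction_id."""
    counts = Counter()
    for trip in trips:
        # A missing or blank direction_id falls back to a placeholder label;
        # values are expected to be strings such as "0" or "1"
        direction = trip.get("direction_id") or "unspecified"
        counts[(trip["route_id"], direction)] += 1
    return counts
```

For a feed like the Perth one described above, every trip would land under "unspecified", and the direction would have to be inferred from which routes.txt entry the trip belongs to.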
In later posts I’m going to discuss some of the differences and challenges associated with the feeds of various cities.
Hopefully this brief introduction gives you some idea of GTFS. It’s a somewhat simple specification, although some aspects are hard to comprehend until you’ve dealt with the data for a while.