Making GTFS Shape Data Manageable

One of the files that may be included by transit agencies in GTFS feeds is the shapes.txt file. This file is used to describe the various paths that bus or trains (or any route type for that mater) take on their trips.

Every record in shapes.txt corresponds to a single point for a single shape. Some shapes may consist of thousands of points. While it’s awesome that some agencies provide this level of detail, some devices may struggle to represent this data efficiently.

One of the challenges I faced with TransitTimes was the memory constraints of iOS devices. Rendering a polyline with 2000 points is just not feasible using iOS MKMapView.

The following image shows a portion of a shape from Transperth’s GTFS feed. This shape is made up of 400 points (the entire shape is actually 1700 points - I cut this down for simplicity).

As you might imagine, an iPhone 3G struggled immensely trying to draw this map, and once drawn it was almost impossible to actually scroll around the map. (In actual fact, the iPhone 4 also struggled).

This level of detail has no benefit to the user - especially if they can’t interact with their device.

The following image shows the exact same shape, but this time it’s made up of 70 points. This is far more manageable for mobile devices.

To automate this reduction of points, I used the Douglas-Peucker algorithm. This is a recursive algorithm that finds the point furthest away orthogonally from two other points.

The following image demonstrates this.

If this point is within a given tolerance, the point is discarded. If it’s outside the tolerance, the point is kept and the algorithm continues. When it continues, the algorithm is called twice: once for the line connecting the original start point and the new point, and again, this time for the line connecting the new point and the original end point.

I’ve written an article on PhpRiot about implementing this algorithm in PHP: Reducing a Map Path Using Douglas-Peucker Algorithm.

Using this algorithm, I’ve managed to remove upwards of 95% of shape points from GTFS files I’ve processed, without any significant loss of quality.

The other point I haven’t touched on: removing more than 95% of shape data also results in significant savings of space. For instance, the shapes.txt file for Transperth is over 40 MB in size. Reducing this file by 95% would make the file about 2 MB.

Comments are closed.