Quantifying MBTA Delays

This co-op cycle I’m working at MIT Lincoln Lab. While I really enjoy it there and have learned a lot over the past few months, the commute sucks. Since I live near Northeastern’s campus and take the MBTA to work, it takes me about an hour and a half each way - assuming the T is running on time. Unfortunately it rarely does, and this post is about a project I came up with to quantify all that lost time using a Raspberry Pi, Google’s Distance Matrix API, my phone, and D3.js.

mbta_logo

The Inspiration

It seems like every year that I participate in the CCDC I end up with another Raspberry Pi that I don’t know what to do with. That, coupled with the fact that I’ve had nearly 3 hours every day on my commute to come up with ideas for it, led to this project. Developed entirely while riding the T, I present my personal commute delay tracker.

raspberry_pi

Technical Details

The Back End

At the heart of this project is a REST-like application written in Python with Flask. This app maintains two endpoints:

  1. /data - Exposes JSON commute data via a GET request
  2. /update - Allows data updates via an authenticated POST request with JSON data in the following format:
{
  "key": "",      # A secret key used for authentication
  "action": "",   # either "depart" or "arrive"
  "location": "", # latitude and longitude
  "time": ""      # current time
}
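As a rough illustration of that format, here is a minimal sketch of a check the server could run on an incoming payload before acting on it. The helper name validate_update, the "lat,lng" location string, and the timestamp format in the example are assumptions for illustration; they are not part of the original app.

```python
# Hypothetical helper: sanity-check an /update payload before using it.
VALID_ACTIONS = ("depart", "arrive")
REQUIRED_FIELDS = ("key", "action", "location", "time")

def validate_update(payload):
    """Return a list of problems with the payload (empty if it looks fine)."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]

    problems = []
    for field in REQUIRED_FIELDS:
        # Treat absent and empty values the same way.
        if not payload.get(field):
            problems.append("missing field: " + field)

    action = payload.get("action")
    if action and action not in VALID_ACTIONS:
        problems.append("unknown action: " + str(action))
    return problems

example = {
    "key": "XXX",
    "action": "depart",
    "location": "42.3398,-71.0892",  # assumed "lat,lng" string
    "time": "2016-03-01T08:15:00",   # assumed timestamp format
}
```

An empty list means every field is present and the action is one of the two the endpoint understands.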

The first endpoint is relatively simple:

@app.route('/data', methods=['GET'])
@cross_origin()
def get_data():
    # The with-block closes the file even if json.load() raises.
    with open('/path/to/data.json', 'r') as f:
        data = json.load(f)

    return jsonify({'results': data})

Note: We have to allow CORS requests here since the data is fetched and rendered client side in the front end.
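One thing to watch with the first endpoint: open() raises an error if data.json does not exist yet, so on a fresh install the file needs to be seeded with an empty list. A minimal sketch of a more forgiving loader, where the name load_records is hypothetical:

```python
import json
import os

def load_records(path):
    """Return the saved commute records, or an empty list if the file is missing."""
    if not os.path.exists(path):
        return []
    with open(path, "r") as f:
        return json.load(f)
```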

The second endpoint required a bit more plumbing. First, for security reasons, I wanted to validate some sort of shared key so that the filesystem of my Raspberry Pi isn’t just exposed to the world.

if data['key'] != SECRET: raise Exception()
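A plain != comparison works, but it can in principle leak timing information about how much of the key matched. As a hedged alternative (not what the original app does), the standard library's hmac.compare_digest compares in constant time. The sketch assumes Python 3, and the name key_is_valid is made up for illustration.

```python
import hmac

def key_is_valid(supplied, secret):
    """Compare the supplied key to the shared secret in constant time."""
    # A missing or non-string key can never match.
    if not isinstance(supplied, str):
        return False
    return hmac.compare_digest(supplied.encode("utf-8"), secret.encode("utf-8"))
```

In the handler this would read something like: if not key_is_valid(data.get('key'), SECRET): raise Exception()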

Next, we need to normalize the time format.

now = parse(time_string)
epoch = datetime.datetime.utcfromtimestamp(0)
time = int((now-epoch).total_seconds())
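One caveat with this subtraction: the epoch here is a naive datetime, so if the parsed timestamp carries a UTC offset, Python raises a TypeError when subtracting the two. A minimal sketch of a conversion that handles both cases follows. It assumes Python 3 and that naive timestamps are already in UTC; the helper name to_epoch_seconds is hypothetical.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)  # naive, interpreted as UTC

def to_epoch_seconds(dt):
    """Convert a datetime to whole seconds since the Unix epoch.

    Naive datetimes are assumed to be UTC already (an assumption of this
    sketch); aware datetimes are converted to UTC first.
    """
    if dt.tzinfo is not None:
        dt = dt.astimezone(datetime.timezone.utc).replace(tzinfo=None)
    return int((dt - EPOCH).total_seconds())
```

If the phone sends local time without an offset, the resulting epoch will be shifted by the local UTC offset, which matters for the time-of-day that later gets sent to Google.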

Finally, if the action is “depart”, we need to save the current location and time for later use. If it’s “arrive”, we calculate the time the trip should have taken according to Google Maps for various transportation modes and save the result into data.json.

action = data['action']
location = data['location']
time_string = data['time']

now = parse(time_string)
epoch = datetime.datetime.utcfromtimestamp(0)
time = int((now-epoch).total_seconds())

if action == 'depart':
    save_depart(location, time)
elif action == 'arrive':
    depart = get_depart()
    arrive = { 'location': location, 'time': time }
    results = get_expected_time(depart, arrive)
    write_data(results)

The key function here is get_expected_time():

def get_expected_time(depart, arrive):
    results = {}

    for mode in modes:
        endpoint = 'https://maps.googleapis.com/maps/api/distancematrix/json'
        options = {
            'key': GDISTANCEMATRIX_KEY,
            'origins': depart['location'],
            'destinations': arrive['location'],
            'mode': mode,
            'departure_time': depart['time'] + 7*24*60*60,
            'units': 'imperial',
        }

        # Let requests build the query string and decode the JSON response.
        # (urllib.urlencode and urllib.urlopen only exist in Python 2.)
        result = requests.get(endpoint, params=options).json()
        travel_time = result['rows'][0]['elements'][0]['duration']['value']

        results[mode] = (int(travel_time)+int(depart['time']))*1000

    results['depart'] = int(depart['time']*1000)
    results['arrive'] = int(arrive['time']*1000)
    return results

If you were looking closely, you may have noticed that I fetch transportation times one week in the future for all predictions. This is because I only query Google after the trip is over, and the Distance Matrix API won’t let you fetch timing data for travel that occurs in the past. Querying after the fact was a deliberate design decision, made for two reasons. First, flexibility: it doesn’t matter where I start and end my trip, and I don’t need to tell the application my destination when I depart. Second, to keep the T honest. Google Maps is actually very good, and I’ve found that it will sometimes account for significant delays that I would want to hold the T accountable for. Fetching data for the future ensures we get a time that is representative of how long the T should take. The full source (with a few changes) is at the bottom of this post.
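To make the arithmetic concrete, here is a small sketch of the two calculations involved: shifting the departure forward by exactly one week (which keeps the same weekday and the same time of day in UTC), and turning a saved record into a delay. The function names are hypothetical; the record layout (millisecond timestamps under "depart", "arrive", and each mode) follows what get_expected_time() returns.

```python
WEEK_SECONDS = 7 * 24 * 60 * 60  # 604800 seconds

def shifted_departure(depart_epoch):
    """Departure time to send to the API: same weekday and UTC time, one week later."""
    return depart_epoch + WEEK_SECONDS

def delay_minutes(record, mode="transit"):
    """Minutes between the actual arrival and the arrival predicted for `mode`.

    Positive means the real trip took longer than predicted.
    """
    return (record["arrive"] - record[mode]) / 60000.0
```

For example, a trip predicted to take 60 minutes that actually took 75 yields a delay of 15 minutes. Note that if a daylight saving change falls inside that week, the local clock time of the shifted departure moves by an hour even though the UTC time does not.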

I should also mention that this required some minor configuration of my home router.

Logging Data

Because I’m lazy and don’t feel like writing an Android app, I use an IFTTT Maker Channel to send REST requests to my Raspberry Pi server from my phone. The IFTTT DO app even has nice little home screen widgets that you can tie to an individual DO button. It’s as simple as pressing the “depart” button when I get on the T and the “arrive” button when I get off.
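For testing without a phone, the same kind of request can be built by hand. This is a minimal sketch using only the Python 3 standard library; the host name, coordinates, and timestamp are placeholders, and build_update_request is a hypothetical helper. The Content-Type header matters because Flask only populates request.json for JSON requests.

```python
import json
import urllib.request

def build_update_request(base_url, key, action, location, time_string):
    """Build (but do not send) a POST to /update that mimics a button press."""
    body = json.dumps({
        "key": key,
        "action": action,
        "location": location,
        "time": time_string,
    }).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/update",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder values; substitute your own host, key, and coordinates.
req = build_update_request(
    "http://raspberrypi.local:8888", "XXX", "depart",
    "42.3398,-71.0892", "2016-03-01T08:15:00",
)
# urllib.request.urlopen(req) would actually send it.
```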

home_screen

The Front End

The real reason I collected all this data was so that I could write a cool D3 visualization on top of it. The website is hosted on GitHub Pages and you can view its source here. I won’t bore you with the details, but I’m a fan of D3 for quick and easy visualizations.

Source Code

#!/usr/bin/python

from flask import Flask, request, jsonify
from flask_cors import cross_origin  # flask.ext.* imports were removed in Flask 1.0
import requests, urllib, json, datetime
from dateutil.parser import parse

app = Flask(__name__, static_url_path='')

SECRET = 'XXX'              # Fill this in with your own shared secret key
GDISTANCEMATRIX_KEY = 'XXX' # Fill this in with a valid distancematrix key

modes = [
    'transit',
    'driving',
    'bicycling',
    'walking',
]

def get_expected_time(depart, arrive):
    results = {}

    for mode in modes:
        endpoint = 'https://maps.googleapis.com/maps/api/distancematrix/json'
        options = {
            'key': GDISTANCEMATRIX_KEY,
            'origins': depart['location'],
            'destinations': arrive['location'],
            'mode': mode,
            # We have to add 1 week to the departure time because the
            # distancematrix API won't let you specify times in the past for
            # driving, walking, or bicycling directions.
            'departure_time': depart['time'] + 7*24*60*60,
            'units': 'imperial',
        }

        # Let requests build the query string and decode the JSON response.
        # (urllib.urlencode and urllib.urlopen only exist in Python 2.)
        result = requests.get(endpoint, params=options).json()
        travel_time = result['rows'][0]['elements'][0]['duration']['value']

        results[mode] = (int(travel_time)+int(depart['time']))*1000

    results['depart'] = int(depart['time']*1000)
    results['arrive'] = int(arrive['time']*1000)
    return results

def save_depart(location, time):
    # Overwrite any previous departure. Nothing needs to be read first, and
    # opening for reading would fail if depart.json did not exist yet.
    data = { "location": location, "time": time }

    with open('/home/pi/server/depart.json', 'w') as f:
        f.write(json.dumps(data))

def get_depart():
    with open('/home/pi/server/depart.json', 'r') as f:
        return json.load(f)

def write_data(results):
    # Read the existing list of trips, append the new one, and write it back.
    with open('/home/pi/server/data.json', 'r') as f:
        data = json.load(f)

    data.append(results)

    data_string = json.dumps(data, sort_keys=True, indent=4, separators=(',', ': '))
    with open('/home/pi/server/data.json', 'w') as f:
        f.write(data_string)

@app.route('/update', methods=['POST'])
def update():
    try:
        data = request.json

        if data['key'] != SECRET: raise Exception()
        action = data['action']
        location = data['location']
        time_string = data['time']

        now = parse(time_string)
        epoch = datetime.datetime.utcfromtimestamp(0)
        time = int((now-epoch).total_seconds())

        if action == 'depart':
            save_depart(location, time)
        elif action == 'arrive':
            depart = get_depart()
            arrive = { 'location': location, 'time': time }
            results = get_expected_time(depart, arrive)
            write_data(results)
        else:
            raise Exception()

        return jsonify({'response':'logged successfully'})
    except Exception as e:
        print(e)
        return jsonify({'response':'invalid request'})

@app.route('/data', methods=['GET'])
@cross_origin()
def get_data():
    with open('/home/pi/server/data.json', 'r') as f:
        data = json.load(f)

    return jsonify({'results': data})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8888)

Disclaimer: I am not affiliated with the MBTA in any way. This data is a reflection of my personal experience - gathered using precise location data and Google Maps predictions for various transportation modes.