Central to the Beats Per Mile application is the run data, the live information reporting Gemma’s current location, elapsed time, covered distance, average pace and speed.
Working out how to capture this information was our first challenge, and whether it was possible at all would decide whether we could build the site.
Before deciding on the eventual solution of using RunKeeper, we looked for a few different routes we could take, of varying levels of complexity.
Firstly we considered building our own hardware, the basic idea being a GSM/GPS module for about £50 along with a SIM card with unlimited text messages. The module would then be programmed to send its location to Twitter via SMS every minute or so.
There are a lot of posts around with extensive details on how to build this kind of thing yourself, here’s one doing exactly that. He’s also considering selling pre-built units, which would save us needing to get our soldering irons out.
With this option, whilst the geo-positioning data would be accurate, you can run pretty far in a minute, so the route would be a little raw: it wouldn’t accurately reflect the true path and wouldn’t look great on a map.
One step further would be to use something like Open GPS Tracker, which is similar but rather than using a combined GSM/GPS module just uses a GPS module plugged directly into a mobile phone.
This particular unit comes pre-built and with firmware, so perhaps might be limited in what we could capture, but otherwise ready to roll straight out of the box and only requires an old (pre-paid) handset. It’s also small in size, easy enough to strap onto a runner or put in a pocket. This comes in around £50ish for parts again, plus $35.91 for the software.
There are commercially manufactured units available too, trackers for mountain climbers, skiers, pilots, sailors and such that aren’t products of home brew electronics.
The SPOT Personal Tracker is one option, a satellite GPS tracker with built-in location services including sending your current location to Google Maps in near real-time, specifically for sharing with others at home. Very reliable and robust, tiny and light, the location services lend themselves to exactly what we want to do but it’s very over-spec. It’s able to work beyond cellular coverage and above 15,000ft (not particularly necessary for us) and comes in at €149 for the unit and a service charge of €99 for a year’s use.
It does however come with a particularly impressive distress button, notifying the GEOS International Emergency Response Center in case of emergencies should one press it. Presumably this calls in an airlift or team of S.A.R. St. Bernards, which could come in handy when she hits the wall.
Satellite GPS is more immediately available though, in the form of an iPhone.
Initially we considered developing an app from scratch, but that’s not really anything any of us have done before. We could make an online application running on Safari with the Geolocation API, but this would mean having a page open for the entire run and hoping that it doesn’t time out or anything gets pressed on-screen as arms start swinging.
We looked on the App Store and found InstaMapper, a lo-fi and free app which sends location updates to the InstaMapper site, where they are available via a public API. It’s compatible with iPhone, Android and any Java-capable phone, and it’s well supported and tested, a far more viable solution than anything we could knock up in the limited time we have. It also works with the phone locked, conserving battery life. The only cost is the data plan, which Gemma already has.
Then came the revelation of RunKeeper Elite, our eventual winner. At the time, Gemma had only just started using RunKeeper Pro — the Pro and Elite labels refer to your subscription status, Elite is a premium service for $4.99 — and we hadn’t really looked into what was available with the upgrade.
RunKeeper does all the things you’d expect from a standard GPS-enabled running app (or Garmin device) — track time, distance, pace, calories burned etc, over a geolocated route visualised on a map, stored with all your previous runs.
The Elite service offers a few training programs, alerts and reports and as of March last year added a ‘Live’ service that pushes your running data to their website as you run, rather than only uploading a report once it’s complete (as the Pro version does). It’s exactly the same data, but made publicly available in real time.
Googling around, there are a few sites that have worked out ways to grab the data. For example, Firehole has developed an interface to capture and save activity; there’s a ton of info in the right-hand column of his page. There are plenty of ways to hack and scrape. It’s a bit dirty and not particularly relevant to us, but it shows that data retrieval is possible nonetheless.
These seem to concern the syndication of complete activities, publishing a list of recent runs as one would their latest tweets, or calculating the total number of miles run per month, say. But we’re only interested in the data for one run, a single activity.
Rather than regularly monitoring a profile for updates, we only have to load the data that populates a single RunKeeper page.
This is handy, because rather than scraping and traversing a profile to find what we’re looking for (from rendered HTML) we can load the single data file directly, a JSON file hosted on RunKeeper, then work with the data in exactly the same way as they do.
Now, this is all a bit naughty. According to the RunKeeper Terms of Service, we are strictly not allowed to load the data from their site in this way, even with the permission of the account holder. Any form of scraping at all is prohibited.
This is a big problem. So what do we do?
Well, Len Hardy of Firehole openly links to his API interface in the forum thread mentioned before, pointing out that you can scrape the page data, that he does so and that he’ll freely share the code. RunKeeper haven’t responded to his comment specifically or reminded him of their TOS, but have commented later on the same thread.
I think it’s highly unlikely that RunKeeper could be unaware of these goings on. These scrapers are easily found if you search for them and have been around for a while without being shut down.
I’m rather hoping that they actually don’t mind too much, right now. They must be aware of the demand, they have an API in development. Perhaps when that’s published they’ll chase up these sites and direct them that way. I’m sure (read, hope) that the developers would happily migrate to doing things the correct way.
So what’s our solution? Well, I asked anyway, but there has been no response so far. We will go ahead and load the data, but in my opinion do so more conscientiously than some of the other attempts. For one, we’re not loading and filtering complete HTML pages as they are. That’s what screen scraping is: attempting to extract data from a rendered output. We’ll be loading the data directly, which is publicly accessible and, being a straight-up JSON file, hopefully carries less overhead.
We’re also not loading anything regularly or for a long duration of time. We’re only concerned with one activity, the site will only load the data once the race has begun and when it’s finished we’ll take a copy of the file and serve it from our own servers as a static file.
We also intend on intelligently caching the data on our servers whilst the marathon is in progress. This way we can limit the amount of requests to RunKeeper and take on some of the load ourselves. It’s easily done and will actually improve the performance of the application anyway.
Really, I would like RunKeeper to see what we’re doing and think it’s cool. They have an awesome product and platform, people want to play with it more.
Once the race starts, all the data we need will be continually pushed from Gemma’s iPhone to the RunKeeper site.
Beats Per Mile will handle page requests and retrieve this data from RunKeeper in the form of a JSON file, which the server will then cache. We can cache the data knowing that the route already run will not change, we’re only interested in the new stuff.
RunKeeper updates their page every 25 seconds, so our cache will be about this length too. When a request is made, if the cache is still fresh we’ll serve that data, if not we’ll retrieve more from RunKeeper.
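To make the freshness check concrete, here is a minimal sketch of a time-limited cache in Python. It is only an illustration of the idea, not the code running on the site: the `TimedCache` class, the `fetch` callable standing in for the request to RunKeeper, and the injectable `clock` are all names invented for this example. The 25 second lifetime is the only figure taken from above.

```python
import time
from typing import Any, Callable, Optional


class TimedCache:
    """Hold one fetched value and reuse it until it is older than `ttl` seconds."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        ttl: float = 25.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # hypothetical: whatever loads the JSON from RunKeeper
        self._ttl = ttl      # roughly matches RunKeeper's 25 second refresh
        self._clock = clock  # injectable so the timing can be exercised without waiting
        self._value: Any = None
        self._fetched_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if stale:
            # Only one upstream request per ttl window, however many visitors ask.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

Every page request would call `get()`, so RunKeeper sees at most one request per cache window regardless of how many visitors are watching.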
From then on, the page polls for the latest dataset and we serve the new information, appending data to the client-side stored JSON. The application works out what data each individual client has and serves the difference.
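Serving the difference can be sketched as below. This assumes the client tells us how many route points it already holds (the `client_count` parameter is hypothetical) and relies on the route only ever growing at the end, since the part already run does not change. The point format is also an assumption.

```python
from typing import Dict, List

Point = Dict[str, float]  # assumed shape, e.g. {"lat": 51.5, "lng": -0.1}


def points_since(all_points: List[Point], client_count: int) -> List[Point]:
    """Return only the points the client has not seen yet."""
    # Clamp the count: a negative value resends everything,
    # a value past the end returns nothing.
    start = max(0, min(client_count, len(all_points)))
    return all_points[start:]
```

A client holding 120 points would poll with that count and receive only point 121 onwards, which it appends to its stored JSON.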
Once the run is complete, a flag will be raised by RunKeeper in the JSON file and we’ll update our page to reflect that. From this point on the application no longer needs to contact RunKeeper, so will serve static JSON from our own side.
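As a rough sketch of that hand-over, the function below keeps asking the live source until the data reports the run as complete, then writes a copy to disk and serves that copy from then on. The field name `finished`, the `fetch_live` callable and the archive path are assumptions made for the example; the real name of RunKeeper's flag isn't given here.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def load_activity(
    fetch_live: Callable[[], Dict[str, Any]], archive: Path
) -> Dict[str, Any]:
    """Serve the archived copy if one exists, otherwise fetch live data."""
    if archive.exists():
        # The run is over: no further contact with RunKeeper is needed.
        return json.loads(archive.read_text(encoding="utf-8"))

    data = fetch_live()
    if data.get("finished"):  # hypothetical name for the completion flag
        # Keep a static copy so later requests are served from our side.
        archive.write_text(json.dumps(data), encoding="utf-8")
    return data
```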
It shapes up a little like this, where points 2 and 8 are conditional to the cache being in date:
As well as reducing the amount of calls to RunKeeper, caching the data means that visitors get the latest JSON nice and quickly.
I built a prototype in time for the Silverstone Half Marathon, which Gemma ran last month, and it all went surprisingly smoothly. Here’s an image of the application as it was then:
The data is visualised on a Google map. The GPS coordinates are used to draw the red Polyline, the activity statistics are calculated by the iPhone app and simply refreshed here with every update.
Eventually our map will also have some more informative overlays, mile markers for example, and when and where the application sent tweets or found pictures taken nearby.
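One way the mile markers could be worked out from the GPS coordinates is to add up the distance between consecutive points and note where each whole mile is passed. The sketch below does that with the haversine formula; it is an illustration rather than the method we have settled on, the function names are made up for the example, and it places each marker on the first recorded point at or after the mile rather than interpolating the exact spot.

```python
import math
from typing import List, Tuple

LatLng = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_MILES = 3958.8   # mean Earth radius


def haversine_miles(a: LatLng, b: LatLng) -> float:
    """Great-circle distance between two coordinates, in miles."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def mile_markers(route: List[LatLng]) -> List[LatLng]:
    """Return the first recorded point at or beyond each whole mile."""
    markers: List[LatLng] = []
    total = 0.0
    next_mile = 1
    for prev, cur in zip(route, route[1:]):
        total += haversine_miles(prev, cur)
        # A long gap between points can pass more than one mile at once.
        while total >= next_mile:
            markers.append(cur)
            next_mile += 1
    return markers
```

With points arriving every few seconds the gap between a marker and the true mile line should be small, though interpolating along the segment would be more precise.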
Initially we were going to show pace, speed and elevation graphs (you can see a very early attempt if you click through to the image link) but we’ve run out of time for those. Maybe version two.