Central to the Beats Per Mile application is the run data: the live information reporting Gemma’s current location, elapsed time, distance covered, average pace and speed.

Working out how to capture this information was our first challenge. Whether it was even possible would determine whether we could build the site at all.

Before deciding on the eventual solution of using RunKeeper, we looked at a few different routes we could take, of varying levels of complexity.

Fellow iheartplay developer Tim took up the challenge.

Firstly we considered building our own hardware, the basic idea being a GSM/GPS module for about £50 along with a SIM card with unlimited text messages. The module is then programmed to send its location to Twitter via SMS every minute or so.

There are a lot of posts around with extensive details on how to build this kind of thing yourself; here’s one doing exactly that. He’s also considering selling pre-built units, which would save us needing to get our soldering irons out.

With this option, whilst the geo-positioning data would be accurate, you can run pretty far in a minute, so the route would be a little raw; it wouldn’t accurately reflect the true path and wouldn’t look great on a map.

One step further would be to use something like Open GPS Tracker, which is similar but, rather than using a combined GSM/GPS module, uses just a GPS module plugged directly into a mobile phone.

This particular unit comes pre-built and with firmware, so might be limited in what we could capture, but is otherwise ready to roll straight out of the box and only requires an old (pre-paid) handset. It’s also small, easy enough to strap onto a runner or put in a pocket. This comes in at around £50 for parts again, plus $35.91 for the software.

There are commercially manufactured units available too: trackers for mountain climbers, skiers, pilots, sailors and such that aren’t products of home-brew electronics.

The SPOT Personal Tracker is one option, a satellite GPS tracker with built-in location services, including sending your current location to Google Maps in near real-time, specifically for sharing with others at home. Very reliable and robust, tiny and light, the location services lend themselves to exactly what we want to do, but it’s very over-spec. It’s able to work beyond cellular coverage and above 15,000ft — not particularly necessary for us — and comes in at €149 for the unit plus a service charge of €99 for a year’s use.

It does however come with a particularly impressive distress button, notifying the GEOS International Emergency Response Center in case of emergencies should one press it. Presumably this calls in an airlift or team of S.A.R. St. Bernards, which could come in handy when she hits the wall.

Satellite GPS is more immediately available though, in the form of an iPhone.

Initially we considered developing an app from scratch, but that’s not really anything any of us have done before. We could make an online application running on Safari with the Geolocation API, but this would mean having a page open for the entire run and hoping that it doesn’t time out and that nothing gets pressed on-screen as arms start swinging.
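
For the record, that browser-based route would have looked something like the sketch below: the standard Geolocation API watching the position and posting each fix to a collector of our own. The /api/location endpoint here is made up for illustration, not anything we actually built.

```typescript
// Minimal sketch of the in-browser approach we considered (and rejected).
// navigator.geolocation is the standard Geolocation API; the /api/location
// endpoint is a hypothetical collector of our own, not a real service.
let watchId: number | undefined;

function startTracking(): void {
  if (!("geolocation" in navigator)) {
    console.warn("Geolocation not supported on this device");
    return;
  }
  watchId = navigator.geolocation.watchPosition(
    (position) => {
      const { latitude, longitude } = position.coords;
      // Push each fix to our own server as it arrives.
      fetch("/api/location", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          lat: latitude,
          lng: longitude,
          timestamp: position.timestamp,
        }),
      });
    },
    (error) => console.error("Geolocation error:", error.message),
    { enableHighAccuracy: true, maximumAge: 10000, timeout: 30000 }
  );
}

function stopTracking(): void {
  if (watchId !== undefined) {
    navigator.geolocation.clearWatch(watchId);
  }
}
```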

We looked on the App Store and found InstaMapper, a lo-fi and free app which sends location updates to the InstaMapper site, where they’re available via a public API. It’s compatible with iPhone, Android and any Java-capable phone, and it’s well supported and tested — a far more viable solution than anything we could knock up in the limited time we have. It also works with the phone locked, conserving battery life. The only cost is the data plan, which Gemma already has.

Then came the revelation of RunKeeper Elite, our eventual winner. At the time, Gemma had only just started using RunKeeper Pro — the Pro and Elite labels refer to your subscription status; Elite is a premium service for $4.99 — and we hadn’t really looked into what was available with the upgrade.

RunKeeper does all the things you’d expect from a standard GPS-enabled running app (or Garmin device) — track time, distance, pace, calories burned etc, over a geolocated route visualised on a map, stored with all your previous runs.

RunKeeper Elite

The Elite service offers a few training programs, alerts and reports, and as of March last year it added a ‘Live’ service that pushes your running data to their website as you run, rather than only uploading a report once it’s complete (as the Pro version does). It’s exactly the same data, but made publicly available in real time.

Unfortunately, RunKeeper doesn’t actually have an API yet. On the forums they promise one soon, but not before race day.

Googling around, there are a few sites that have worked out ways to grab the data. For example, Firehole has developed an interface to capture and save activity; there’s a ton of info in the right-hand column of his page. There are plenty of ways to hack and scrape, and while it’s all a bit dirty and not particularly relevant to us, it shows the data retrieval is possible nonetheless.

These seem to concern the syndication of complete activities, publishing a list of recent runs as one would their latest tweets, or calculating the total number of miles run per month, say. But we’re only interested in the data for one run, a single activity.

Rather than regularly monitoring a profile for updates, we only have to load the data that populates a single RunKeeper page.

This is handy because, rather than scraping and traversing a profile to find what we’re looking for (from rendered HTML), we can load the single data file directly, a JSON file hosted on RunKeeper, and work with the data in exactly the same way as they do.
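
To give an idea of what loading the data directly might look like, here’s a rough sketch. The URL and the field names are placeholders only; RunKeeper don’t document this file, so the real structure is simply whatever their own page happens to load.

```typescript
// Rough sketch of loading the live activity JSON directly, server-side.
// The URL and the LiveActivity fields are placeholders; RunKeeper don't
// document this file, so the real structure has to be inspected by hand.
interface LiveActivity {
  points: { lat: number; lng: number; timestamp: number }[];
  distance: number;    // metres
  duration: number;    // seconds
  isComplete: boolean; // assumed flag for a finished run
}

const ACTIVITY_URL =
  "https://runkeeper.com/some/activity/data.json"; // placeholder, not a real endpoint

async function fetchActivity(): Promise<LiveActivity> {
  const response = await fetch(ACTIVITY_URL);
  if (!response.ok) {
    throw new Error(`RunKeeper responded with ${response.status}`);
  }
  return (await response.json()) as LiveActivity;
}
```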

Now, this is all a bit naughty. According to the RunKeeper Terms of Service, we are strictly not allowed to load the data from their site in this way, even with the permission of the account holder. Any form of scraping at all is prohibited.

This is a big problem. So what do we do?

Well, Len Hardy of Firehole openly links to his API interface in the forum thread mentioned before, pointing out that you can scrape the page data, that he does so, and that he’ll freely share the code. RunKeeper haven’t responded to his comment specifically or reminded him of their TOS, but have commented later on the same thread.

I think it’s highly unlikely that RunKeeper could be unaware of these goings-on; these scrapers are easily found if you search for them and have been around for a while without being shut down.

I’m rather hoping that they actually don’t mind too much, right now. They must be aware of the demand; they have an API in development. Perhaps when that’s published they’ll chase up these sites and direct them that way. I’m sure (read: hope) that the developers would happily migrate to doing things the correct way.

So what’s our solution? Well, I asked anyway, but no response so far. We will go ahead and load the data, but in my opinion do so more conscientiously than some of the other attempts. For one, we’re not loading and filtering complete HTML pages as they are; that’s what screen scraping is, attempting to extract data from a rendered output. We’ll be loading the data directly, which is publicly accessible and, being a straight-up JSON file, hopefully carries less overhead.

We’re also not loading anything regularly or for a long duration of time. We’re only concerned with one activity; the site will only load the data once the race has begun, and when it’s finished we’ll take a copy of the file and serve it from our own servers as a static file.

We also intend to cache the data intelligently on our servers whilst the marathon is in progress. This way we can limit the number of requests to RunKeeper and take on some of the load ourselves. It’s easily done and will actually improve the performance of the application anyway.

Really, I would like RunKeeper to see what we’re doing and think it’s cool. They have an awesome product and platform; people want to play with it more.

How it works

Once the race starts, all the data we need will be continually pushed from Gemma’s iPhone to the RunKeeper site.

Beats Per Mile will handle page requests and retrieve this data from RunKeeper in the form of a JSON file, which the server will then cache. We can cache the data knowing that the route already run will not change; we’re only interested in the new stuff.

RunKeeper updates their page every 25 seconds, so our cache will last about this long too. When a request is made, if the cache is still fresh we’ll serve that data; if not, we’ll retrieve more from RunKeeper.
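
A minimal sketch of that cache, assuming a 25-second freshness window and the hypothetical fetchActivity() loader from the earlier sketch:

```typescript
// Sketch of the server-side cache, with a ~25 second freshness window to
// match how often RunKeeper update their own page. fetchActivity() and
// LiveActivity are the hypothetical loader and type sketched earlier.
const CACHE_TTL_MS = 25 * 1000;

let cachedActivity: LiveActivity | null = null;
let cachedAt = 0;

async function getActivity(): Promise<LiveActivity> {
  const now = Date.now();
  // Serve from cache while it's still fresh; otherwise go back to RunKeeper.
  if (cachedActivity && now - cachedAt < CACHE_TTL_MS) {
    return cachedActivity;
  }
  cachedActivity = await fetchActivity();
  cachedAt = now;
  return cachedActivity;
}
```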

From then on, the page polls for the latest dataset and we serve the new information, appending data to the client-side stored JSON. The application works out what data each individual client has and serves the difference.
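
Roughly, the client-side polling might shape up like this. The /data endpoint and its since parameter are our own invention for illustration; the idea is simply that each client reports how much it already has and receives only the difference.

```typescript
// Client-side polling sketch. The /data endpoint and its `since` parameter
// are illustrative; the client reports how many points it already has and
// the server responds with only the new ones.
declare function updateMap(newPoints: { lat: number; lng: number }[]): void; // UI helper, assumed
declare function updateStats(stats: unknown): void;                          // UI helper, assumed

const points: { lat: number; lng: number; timestamp: number }[] = [];

async function poll(): Promise<void> {
  const response = await fetch(`/data?since=${points.length}`);
  const update = await response.json();

  // Append only the new points to the client-side copy of the route.
  points.push(...update.newPoints);
  updateMap(update.newPoints);   // extend the drawn route
  updateStats(update.stats);     // time, distance, pace etc.

  if (!update.isComplete) {
    setTimeout(poll, 25 * 1000); // keep polling until the run is finished
  }
}
```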

Once the run is complete, a flag will be raised by RunKeeper in the JSON file and we’ll update our page to reflect that. From this point on the application no longer needs to contact RunKeeper, so we’ll serve static JSON from our own side.
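
On the server, once that flag appears (the isComplete field name is assumed here, as in the earlier sketches), we can snapshot the JSON to disk and stop contacting RunKeeper entirely:

```typescript
import { promises as fs } from "fs";

// Once RunKeeper mark the activity as finished, take a snapshot of the JSON
// and serve that from our own server from then on.
const SNAPSHOT_PATH = "./final-activity.json";

async function getActivityOrSnapshot(): Promise<LiveActivity> {
  try {
    // If the final snapshot already exists, RunKeeper is never contacted again.
    const saved = await fs.readFile(SNAPSHOT_PATH, "utf8");
    return JSON.parse(saved) as LiveActivity;
  } catch {
    const activity = await getActivity(); // the cached loader from earlier
    if (activity.isComplete) {
      await fs.writeFile(SNAPSHOT_PATH, JSON.stringify(activity));
    }
    return activity;
  }
}
```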

It shapes up a little like this, where points 2 and 8 are conditional on the cache being in date:

As well as reducing the number of calls to RunKeeper, caching the data means that visitors get the latest JSON nice and quickly.

I built a prototype in time for the Silverstone Half Marathon, which Gemma ran last month, and it all went surprisingly smoothly. Here’s an image of the application as it was then:

The data is visualised on a Google map. The GPS coordinates are used to draw the red Polyline; the activity statistics are calculated by the iPhone app and simply refreshed here with every update.
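
For the curious, drawing the route with the Google Maps JavaScript API is straightforward; something along these lines, where map is the google.maps.Map instance already on the page and each update pushes new points onto the Polyline’s path:

```typescript
// Sketch of drawing the route with the Google Maps JavaScript API.
// `map` is an existing google.maps.Map instance; new GPS points are pushed
// onto the polyline's path as each update arrives.
declare const google: any; // Maps API loaded via its usual <script> tag
declare const map: any;    // the google.maps.Map already on the page

const routeLine = new google.maps.Polyline({
  map,
  strokeColor: "#FF0000",
  strokeWeight: 3,
});

function extendRoute(newPoints: { lat: number; lng: number }[]): void {
  const path = routeLine.getPath();
  for (const point of newPoints) {
    path.push(new google.maps.LatLng(point.lat, point.lng));
  }
}
```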

Eventually our map will also have some more informative overlays: mile markers, for example, and markers for when and where the application sent tweets or found pictures taken nearby.

Initially we were going to show pace, speed and elevation graphs (you can see a very early attempt if you click through to the image link) but we’ve run out of time for those. Maybe version two.

3 Comments

  1. Nice article. To be clear, my API uses screen scraping, as you indicated, only to get the activity list, but then retrieves the JSON file which contains all of the run data, GPS data, elevation data, and speed data. None of this happens in real time when my website is loaded. It is configured to run every hour; it pulls in new run information and updates a MySQL database. All of the presentation in my sidebar, graphs, and maps are then driven off of the database. So, it is pretty lightweight: one hit per hour to the Runkeeper screen which lists the activities, and then JSON files for all new runs.

    And, yes, as soon as an “official” API is available, I’d be happy to use it.

    Len

  2. Hi Len, thanks for that.

    I can’t see how either of our methods could cost much more than what the “official” API will serve. It’ll make our jobs easier when it does turn up.

    Have you had any correspondence with RunKeeper about your API? I’ve sent them a couple more messages including a link to this post but nothing yet.

    I’m hoping they secretly like that developers are interfacing with their platform already.

  3. Marc – The only contact that I’ve had with Runkeeper is through the forum that you mentioned. I do have a friend that lives in Rome, I’m collaborating with him to build a website, runrome.com. His idea is to run every street in Rome and track it via runrome.com. He asked Runkeeper to work with him on the Runkeeper integration, several weeks went by with no response, and then they indicated that they were focused on their product and didn’t have the time. So, we’re integrating my API with Google Maps etc. to build the site. It should launch soon. When it does I’ll let you know, it’s an interesting concept.

    Len


One Trackback/Pingback

  1. By » As Long as You Follow by Marc Hibbins on 22 May 2011 at 1:09 pm

    [...] application monitored the elapsed distance and updated accordingly, grabbing the latest time and statistics from the RunKeeper data, as well as geolocating the tweet with the latest set of GPS [...]
