The final part of Beats Per Mile worth mentioning is the Twitter integration.

The application was designed to send automated tweets on Gemma’s behalf from her personal Twitter account giving updates of her progress to those following.

Mainly, the tweets would report her current location for the benefit of spectators waiting ahead; others would be sent at intervals of a mile or so to report pace and ongoing form.

Similar to the mechanism for fetching Instagram images at predetermined locations on the course, the application contained a list of agreed points at which to tweet.

These were at landmarks and certain spectator spots too, but also at course milestones — the first mile complete, halfway complete, one mile to go etc, as well as the start and finish.

The application monitored the elapsed distance and updated accordingly, grabbing the latest time and statistics from the RunKeeper data, as well as geolocating the tweet with the latest set of GPS coordinates.

Since Twitter deprecated support for Basic Auth, applications must authenticate users with OAuth to gain write access.

This means that rather than handing over your username and password to applications and trusting (hoping) that they’re friendly — as used to be the only way — OAuth allows applications to request access on your behalf without users ever parting with precious login credentials. You simply give, or deny, permission.

Using TwitterOAuth

Twitter offer a number of links to OAuth libraries for various languages to make this job a lot easier. There are many Twitter-specific OAuth libraries in particular, purposely tailored for the API.

I picked up PHP-based TwitterOAuth by Abraham Williams, which gets you up and running very rapidly.

The OAuth flow is documented at length, but essentially performs three major tasks:

  • Obtains access from Twitter to make requests on behalf of a user
  • Directs a user to Twitter in order to authenticate their account
  • Gains authorisation from the user to make requests on their behalf


This is achieved with a cycle of authentication tokens exchanged between the application and Twitter to verify permission. In particular, TwitterOAuth creates a session object in your application and rebuilds itself with each token exchange, so the whole process remains contained in a single class instance.

In a very abridged manner, something like the following:

// Build TwitterOAuth object with client credentials
$connection = new TwitterOAuth(CONSUMER_KEY, CONSUMER_SECRET);
// Get temporary credentials to allow application to make requests and set callback
$request_token = $connection->getRequestToken(OAUTH_CALLBACK);

This initial call verifies that your application has access to the Twitter API, basically that it’s registered and good to go. If successful, Twitter returns two tokens that we store:

$_SESSION['oauth_token'] = $request_token['oauth_token'];
$_SESSION['oauth_token_secret'] = $request_token['oauth_token_secret'];

Then the application sends the user to Twitter to authorise its access, using a URL generated with the token received above:

$authoriseURL = $connection->getAuthorizeURL($request_token['oauth_token']);
header('Location: ' . $authoriseURL);

On successful authorisation Twitter will return the user to your callback URL (set above) with a verification token. TwitterOAuth now rebuilds for the first time with the OAuth tokens in our session and uses the new verification token to get a new access token, which will grant us user account access:

// Create TwitterOAuth object with application tokens
$connection = new TwitterOAuth(CONSUMER_KEY, CONSUMER_SECRET, $_SESSION['oauth_token'], $_SESSION['oauth_token_secret']);

// Request access tokens from Twitter on behalf of current user with verifier
$access_token = $connection->getAccessToken($_REQUEST['oauth_verifier']);

This final request gets us two OAuth tokens specific to the current user, allowing us to make requests on their behalf, with which TwitterOAuth rebuilds again:

// Rebuild TwitterOAuth object with user granted tokens
$connection = new TwitterOAuth(CONSUMER_KEY, CONSUMER_SECRET, $access_token['oauth_token'], $access_token['oauth_token_secret']);

// Test that everything is working
$connection->get('account/verify_credentials');

The dance is made a hell of a lot easier with a library such as this.

Usually applications store these final two tokens, the user-generated oauth_token and oauth_token_secret, which saves the need to authorise the user again.

Storing these details (in a session or database) means that a username and password need never be saved. The tokens are good until the user revokes access; no sensitive information is ever released to the application, and all the user ever gives is their permission.

With access tokens stored, the connection to Twitter is a lot simpler — just create the TwitterOAuth object with those user-generated codes as in the very last step, without any of the redirecting to and from Twitter.com. Of course, those tokens could only have ever been obtained by carrying out the full process to begin with.

Beats Per Mile is a single-user application, so Gemma only had to authenticate it once and then we hard-coded her tokens into the scripts.

Tweet Away

With access granted, the application was free to send out updates based on the run data we were collecting, posting directly to Gemma’s account.

As mentioned, locations were translated into distances, and that’s when we tweeted.
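With the stored tokens, each update boiled down to a single authenticated POST to Twitter’s statuses/update method, geotagged with the lat and long parameters. A minimal sketch (the status text, coordinates and token constants here are placeholders):

// Build TwitterOAuth with the application tokens and the stored user tokens
$connection = new TwitterOAuth(CONSUMER_KEY, CONSUMER_SECRET, USER_TOKEN, USER_TOKEN_SECRET);

// Post a geolocated status update using the latest GPS fix from the run data
$connection->post('statuses/update', array(
    'status' => 'Halfway! 13.1 miles down, 13.1 to go.', // placeholder text
    'lat'    => 51.5007,                                 // placeholder coordinates
    'long'   => -0.1246,
    'display_coordinates' => 'true'
));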

At mile twenty, it looked back at the mile splits so far and reported which were the fastest:

Relying on the total reported distance alone was flawed. There was a slight hiccup when RunKeeper lost GPS coverage under Blackfriars tunnel and in an attempt to compensate found the nearest location to be the other side of the Thames.

This caused a problem by adding extra distance to her total, so some of the latter tweets (“one mile to go”, for example) posted a little prematurely.

It was more of a problem for Gemma when running, with the app announcing in her ear that she’d run further than she had; it’s disheartening to see mile markers on the course that you thought you’d already passed.

This is also why the total distance on the site clocks up to 27.65 miles; the race was long enough as it was!

The final touch was to drop the tweets on the map, alongside the Instagram images.

Beats Per Mile uses the Instagram API to find pictures taken around the marathon course.

We decided we’d need to find a way to put pictures on the map quite early on, knowing Gemma couldn’t be the one to stop and take them. Rather than try to pin a camera to her vest or strap one to a hat, the simplest solution was to find photos taken by spectators.

The Instagram API is fairly new and the app itself is getting extremely popular. Being mobile-based, it was something we hoped spectators would be using on the day, taking quick snaps and uploading a good number of photos for us to dig around in.

The pictures are geo-tagged as well as captioned, so we could perform location queries and text-based searches (ultimately, a combination).

Firstly we agreed on places around which we’d search for pictures — busy spectator spots and London’s landmarks.

The idea was to look for pictures at these places as Gemma passed them, so we translated them into distances, i.e. determined the elapsed distance that would have been run on reaching each of these places.

Tower Bridge, for example, is at 12.5 miles. Big Ben is at 25 miles and so on — for about 10-15 hotspots.

The application monitored the total distance covered and at each of these key numbers hit the Instagram API for the most recent pictures around the location.
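The trigger logic amounted to little more than a checklist, roughly like the following sketch (the coordinates are approximate and the helper functions hypothetical):

// Hotspots keyed by elapsed miles (string keys, as PHP truncates float keys)
$hotspots = array(
    '12.5' => array(51.5055, -0.0754), // Tower Bridge
    '25'   => array(51.5007, -0.1246)  // Big Ben
);

foreach ($hotspots as $mile => $latlng) {
    // Fire each hotspot once, as soon as the reported distance passes it
    if ($totalDistance >= (float) $mile && !alreadyFetched($mile)) {
        fetchInstagramPhotos($latlng[0], $latlng[1]); // hypothetical helpers
        markFetched($mile);
    }
}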

Setting up an Instagram application is instantaneous, though I initially waited a long time for my API key — I did apply when the announcement was first made, however, so the turnaround may be a lot faster now.

There’s no moderation or application approval process; once you’re up and running you can start performing queries immediately.

The API is RESTful over HTTPS with a number of endpoints to query images, comments, users, locations, tags and so on. The developer docs are fairly comprehensive.

We’re interested in the media endpoint. Note that the following URLs require an access token or client ID, which you will be given; they’re omitted here for brevity.

Get the current most popular photos:

https://api.instagram.com/v1/media/popular

Or to get information about a single image with a media id:

https://api.instagram.com/v1/media/72612696

The search method was our main tool. It takes up to five parameters: lat, lng, max_timestamp, min_timestamp and distance.

https://api.instagram.com/v1/media/search?lat=51.54242&lng=-0.059702&distance=1000

Note that only the latitude and longitude parameters are required; distance is in metres, defaulting to 1km with a maximum of 5km.

So at each of our key distances, the app took the latest latitude and longitude positions and grabbed the latest photos.

In an attempt to select only images of the marathon, this was backed up by inspecting the result set for images with a caption containing any one of a set of predefined keywords, such as ‘marathon’ or ‘runners’. That way we could be almost sure we wouldn’t pick up images unconnected with the race, though we’d still find the odd false positive.
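Put together, the fetch-and-filter step looked something like this sketch (the keyword list is illustrative, the ACCESS_TOKEN constant is assumed and error handling is skipped):

// Search for recent photos around the latest GPS fix
$url = 'https://api.instagram.com/v1/media/search'
     . '?lat=' . $lat . '&lng=' . $lng
     . '&distance=1000&access_token=' . ACCESS_TOKEN;

$response = json_decode(file_get_contents($url));

// Keep only images whose caption suggests they're of the race
$keywords = array('marathon', 'runners', 'running');
$matches = array();

foreach ($response->data as $media) {
    $caption = isset($media->caption->text) ? strtolower($media->caption->text) : '';

    foreach ($keywords as $keyword) {
        if (strpos($caption, $keyword) !== false) {
            $matches[] = $media; // holds the image URLs, caption, location and so on
            break;
        }
    }
}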

Once we had the JSON data it was simply visualised on the map; Instagram host the images for us.

One thing perhaps lacking in the data is that the location information only offers latitude and longitude coordinates, with no place or address names unless nominated by the user. For each image, then, I ran the coordinates through Google’s Geocoding service to get a street or area name, just for display purposes.
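That’s one extra request per image, roughly like this (response handling abridged):

// Reverse-geocode an image's coordinates into a readable street or area name
$url = 'http://maps.googleapis.com/maps/api/geocode/json'
     . '?latlng=' . $lat . ',' . $lng . '&sensor=false';

$geo = json_decode(file_get_contents($url));

// Take the first (most specific) result Google returns, if there is one
$placeName = empty($geo->results) ? '' : $geo->results[0]->formatted_address;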

On the whole, it worked well. The API is as straightforward as any, and realistically our biggest worry was that most people seem to use Instagram to take pictures of food and little else; we thought we’d have pictures of everyone’s breakfast around various parts of London all day.

Doubtingly, I wrote a simple ‘refresh’ button to rerun any query in case anything untoward or particularly boring popped up when I logged in to check, but between the huge crowds and the caption matching I only used it twice, both times very early in the day.

Here’s a handful of the pictures:

One of the key features of Beats Per Mile was the ability to listen to a ‘stream’ of Gemma’s iPod playlist, enabling you to hear exactly what she was listening to whenever you logged on.

We didn’t actually have a stream broadcasting from her iPod of course, rather a stream playing from the site that was synchronised with her start time.

We planned on using the SoundCloud API to do this and it was one of the last things left to build before race day.

Part of the playlist was curated by friends, donated tracks with sentimental value or just old favourites for a personal touch and to provide an extra kick of motivation.

Asking a lot of people for contributions meant that the playlist wasn’t finalised and mixed until very late on — Saturday evening.

I created the player using the SoundCloud Player Widget in preparation for the tracks, hoping the player would be ready before they were uploaded.

It’s a JavaScript-enhanced Flash widget which uses ActionScript’s ExternalInterface to expose method handlers and control playback via an in-built API.

Unfortunately this meant it wouldn’t play on the iPhone. SoundCloud do offer a HTML5-based Custom Player (which falls back to Flash), but we didn’t have time to fully investigate wrangling together a player from scratch.

SoundCloud Content Identification

This would shortly be rendered irrelevant when we discovered that it is now nearly impossible to upload any copyrighted songs, or tracks containing samples of copyrighted songs, to SoundCloud. We also discovered that they’re very, very clever in how they go about detecting them.

Here’s what they wrote in January:

Starting in the last few weeks we’ve turned on an automatic content identification system, similar to those used on other major media sharing sites. The system is used primarily for identifying audio that rightsholders have requested to be taken off SoundCloud. This is good news because it makes it easier for artists, labels and other content owners to control how the content they’ve created is available. And when you upload your own audio to SoundCloud, we can find out more quickly if somebody is uploading a copy to their own page without your permission.

SoundCloud have always had the right to remove audio deemed in violation of rights, as stipulated in their terms of use. They also host plenty of mixes and DJ sets, as many similar sites do.

When we tried to upload ours, though, nothing would work. None of our mixes were authorised, and the refusals would only come after we’d spent considerable (precious) time uploading them.

SoundCloud are essentially performing some kind of waveform analysis, comparing uploads to audio already in their databases to detect duplicates.

Alternatives

There are a few ropey ways to (possibly) slip the net, such as adding a layer of low-level noise to distort the waveform or applying an amount of time-stretching (which was happening anyway, as songs were mixed together).

Too much of either would ruin the music. We were running out of time and didn’t want to risk any hacked attempt being found later and removed, perhaps mid-marathon in a worst case scenario.

So I began an attempt to recreate the widget, from scratch after all.

I looked at the Yahoo! Media Player, which is actually suspiciously similar to SoundCloud’s Widget API. The methods are almost exactly the same, but I had trouble handling multiple files; it really wasn’t anywhere near as easy to implement.

After browsing for alternatives I eventually found jPlayer, a very simple and easily customisable jQuery plug-in. This would also mean we’d be iPhone-compatible.

It also meant that we’d need to host the files ourselves, on a normal server, rather than letting SoundCloud handle the load — another reason for our initial choice. We actually ended up serving 7.68GB of streamed audio; fortunately my host is very stable and it didn’t end up being a problem.

When visitors landed on the page, the player would calculate how long the run had been in progress and therefore where you should be in the playlist. The idea was to give no controls other than mute; the playback would always be synchronised.

Rather than upload all the songs individually, the playlist was divided into five 30-60ish minute tracks, which made it easier to navigate.

Once the player determined what track you should be on and where within it the playhead should be, it began to buffer. Annoyingly, you could have been waiting some time. If the tracks were on SoundCloud’s giant servers the audio would be properly streamed, but not from mine.
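The determination itself is simple arithmetic: walk the track list, subtracting each part’s length from the elapsed time until you find where the playhead falls. Sketched in PHP for consistency with the other examples here (the durations and start constant are placeholders), though ours naturally ran client-side:

// Durations of the five mix parts, in seconds (placeholder values)
$trackDurations = array(2400, 2700, 3300, 2880, 3000);

// Seconds elapsed since the race began
$elapsed = time() - RACE_START_TIMESTAMP;

$trackIndex = count($trackDurations) - 1; // default to the final part
foreach ($trackDurations as $i => $duration) {
    if ($elapsed < $duration) {
        $trackIndex = $i; // the part currently playing
        break;
    }
    $elapsed -= $duration; // skip past a completed part
}

// $trackIndex is the part to load, $elapsed the offset to seek to within it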

The wait entirely depended on when you happened to arrive; for some it wasn’t a problem. Say you luckily arrived at the page needing only to jump five minutes into the current track: you’d have a small waiting time. If the jump was twenty-five minutes, then start twiddling your thumbs.

There wasn’t a wait when the player switched between tracks. If a track was paused, the player would record how long for and resume at a later point — where the playhead would otherwise be if you hadn’t paused, not where you left off.

This also took into account track changes if you paused toward the end of a track or paused for more than the duration of an entire part.

Back to the Cloud

SoundCloud have an obligation to artists and labels and choose to be very strict in authorising uploads that aren’t your own. I’ve wondered how copyright works for these sites; Mixcloud, for example, host countless mixed songs and sets.

Mixcloud, in fact, is where you’ll now find the mixes, saved indefinitely and without complaint, on Gemma’s page.

Have a listen!

Marathon Mix – Part One
Marathon Mix – Part Two
Marathon Mix – Part Three
Marathon Mix – Part Four
Marathon Mix – Part Five

The fundraising amount shown on Beats Per Mile uses the JustGiving API to show the up-to-date figure on Gemma’s donation page.

By way of one of their various SDKs, the JustGiving API is a doddle to use and is pretty well featured. Unfortunately the documentation isn’t particularly comprehensive.

You can expect to be able to fetch any information found on a typical page — event and charity details, lists of donations and comments, even the colour scheme — all of which can be fetched without any need for authentication.

We’re using the PHP SDK to show a very simple totaliser, the current percentage raised of the target amount. This can be achieved in just a few lines of code.

Firstly you need to register your application, which will get you an API key and access to the staging sandbox. This access is only for development; you can query live pages but the information you’ll obtain is out of date.

Applications go through a two-day approval process but require nothing more than a decent application description; from that point on you’re free to access real pages and live data.

The PHP SDK is available from GitHub and comes with plenty of examples, covering a lot of use cases for data queries.

As I say, the API offers plenty more, such as automatic page creation, user account creation and integrating donations from third-party sites.

Our needs are far simpler, however. The code is exactly this:

include './JustGivingClient.php';

// Connect to the live API with our application key
$client = new JustGivingClient("https://api.justgiving.com/", "API_KEY", 1);

// Fetch the fundraising page by its short name
$page = $client->Page->Retrieve("Gemma-Bardsley");

// Current total and target, then the percentage raised
$funds = $page->grandTotalRaisedExcludingGiftAid;
$target = round($page->fundraisingTarget);
$percent_raised = round($funds / $target * 100);

It’s as simple as querying the page, grabbing the current total and the target, then determining the percentage.

We retrieve the page via its short name, which for us is “Gemma-Bardsley”; this is set in the JustGiving page admin area.

Note that the target amount has to be explicitly set there too (you’ll get a similar totaliser on your fundraising page anyway if that’s set correctly) and that this queries the live API over HTTPS, not the staging sandbox.

The third parameter sent to the client constructor specifies version one of the API.

Central to the Beats Per Mile application is the run data, the live information reporting Gemma’s current location, elapsed time, covered distance, average pace and speed.

Working out how to capture this information was our first challenge. Determining whether it was even possible was pivotal as to whether we could build the site.

Before deciding on the eventual solution of using RunKeeper, we looked at a few different routes we could take, of varying levels of complexity.

Fellow iheartplay developer Tim took up the challenge.

Firstly we considered building our own hardware, the basic idea being a GSM/GPS module for about £50 along with a SIM card with unlimited text messages. The module is then programmed to send its location to Twitter via SMS every minute or so.

There are a lot of posts around with extensive details on how to build this kind of thing yourself; here’s one doing exactly that. He’s also considering selling pre-built units, which would save us needing to get our soldering irons out.

With this option, whilst the geo-positioning data would be accurate, you can run pretty far in a minute, so the route would be a little raw; it wouldn’t accurately reflect the true path and wouldn’t look great on a map.

One step further would be to use something like Open GPS Tracker, which is similar but rather than using a combined GSM/GPS module just uses a GPS module plugged directly into a mobile phone.

This particular unit comes pre-built with firmware, so it might be limited in what we could capture, but it’s otherwise ready to roll straight out of the box and only requires an old (pre-paid) handset. It’s also small, easy enough to strap onto a runner or put in a pocket. This comes in at around £50 for parts again, plus $35.91 for the software.

There are commercially manufactured units available too, trackers for mountain climbers, skiers, pilots, sailors and such that aren’t products of home brew electronics.

The SPOT Personal Tracker is one option: a satellite GPS tracker with built-in location services, including sending your current location to Google Maps in near real-time, specifically for sharing with others at home. Very reliable and robust, tiny and light, its location services lend themselves to exactly what we want to do, but it’s very over-spec. It’s able to work beyond cellular coverage and above 15,000ft — not particularly necessary for us — and comes in at €149 for the unit plus a service charge of €99 for a year’s use.

It does, however, come with a particularly impressive distress button, notifying the GEOS International Emergency Response Center should one press it in an emergency. Presumably this calls in an airlift or a team of S.A.R. St. Bernards, which could come in handy when she hits the wall.

Satellite GPS is more immediately available though, in the form of an iPhone.

Initially we considered developing an app from scratch, but that’s not really anything any of us have done before. We could have made an online application running in Safari with the Geolocation API, but this would mean having a page open for the entire run and hoping that it doesn’t time out and that nothing gets pressed on-screen as arms start swinging.

We looked on the App Store and found InstaMapper, a lo-fi, free app which sends location updates to the InstaMapper site, where they’re available via a public API. It’s compatible with iPhone, Android and any Java-capable phone, and it’s well supported and tested — a far more viable solution than anything we could knock up in the limited time we have. It also works with the phone locked, conserving battery life. The only cost is the data plan, which Gemma already has.

Then came the revelation of RunKeeper Elite, our eventual winner. At the time, Gemma had only just started using RunKeeper Pro — the Pro and Elite labels refer to your subscription status, Elite is a premium service for $4.99 — and we hadn’t really looked into what was available with the upgrade.

RunKeeper does all the things you’d expect from a standard GPS-enabled running app (or Garmin device) — track time, distance, pace, calories burned etc, over a geolocated route visualised on a map, stored with all your previous runs.

RunKeeper Elite

The Elite service offers a few training programs, alerts and reports and as of March last year added a ‘Live’ service that pushes your running data to their website as you run, rather than only uploading a report once it’s complete (as the Pro version does). It’s exactly the same data, but made publicly available in real time.

Unfortunately, RunKeeper doesn’t actually have an API yet. On the forums they promise one soon, but not before race day.

Googling around, there are a few sites that have worked out ways to grab the data. For example, Firehole has developed an interface to capture and save activity; there’s a ton of info in the right-hand column of his page. There are plenty of ways to hack and scrape, and while it’s a bit dirty and not particularly relevant to us, it shows that data retrieval is possible nonetheless.

These seem to concern the syndication of complete activities, publishing a list of recent runs as one would their latest tweets, or calculating the total number of miles run per month, say. But we’re only interested in the data for one run, a single activity.

Rather than regularly monitoring a profile for updates, we only have to load the data that populates a single RunKeeper page.

This is handy, because rather than scraping and traversing a profile to find what we’re looking for (from rendered HTML) we can load the single data file directly, a JSON file hosted on RunKeeper, then work with the data in exactly the same way as they do.

Now, this is all a bit naughty. According to the RunKeeper Terms of Service, we are strictly not allowed to load the data from their site in this way, even with the permission of the account holder. Any form of scraping at all is prohibited.

This is a big problem. So what do we do?

Well, Len Hardy of Firehole openly links to his API interface in the forum thread mentioned before, pointing out that you can scrape the page data, that he does so, and that he’ll freely share the code. RunKeeper haven’t responded to his comment specifically or reminded him of their TOS, but have commented later on the same thread.

I think it’s highly unlikely that RunKeeper could be unaware of these goings-on; the scrapers are easily found if you search for them and have been around for a while without being shut down.

I’m rather hoping that they actually don’t mind too much, right now. They must be aware of the demand, they have an API in development. Perhaps when that’s published they’ll chase up these sites and direct them that way. I’m sure (read, hope) that the developers would happily migrate to doing things the correct way.

So what’s our solution? Well, I asked anyway, but there’s been no response so far. We will go ahead and load the data, but in my opinion do so more conscientiously than some of the other attempts. For one, we’re not loading and filtering complete HTML pages as they are rendered; that’s what screen scraping is, attempting to extract data from a rendered output. We’ll be loading the data directly, which is publicly accessible and, being a straight-up JSON file, hopefully carries less overhead.

We’re also not loading anything regularly or for a long duration of time. We’re only concerned with one activity, the site will only load the data once the race has begun and when it’s finished we’ll take a copy of the file and serve it from our own servers as a static file.

We also intend to cache the data intelligently on our servers whilst the marathon is in progress. This way we can limit the number of requests to RunKeeper and take on some of the load ourselves. It’s easily done and will actually improve the performance of the application anyway.

Really, I would like RunKeeper to see what we’re doing and think it’s cool. They have an awesome product and platform, people want to play with it more.

How it works

Once the race starts, all the data we need will be continually pushed from Gemma’s iPhone to the RunKeeper site.

Beats Per Mile will handle page requests and retrieve this data from RunKeeper in the form of a JSON file, which the server will then cache. We can cache the data knowing that the route already run will not change; we’re only interested in the new stuff.

RunKeeper updates their page every 25 seconds, so our cache lifetime will be about this length too. When a request is made, if the cache is still fresh we’ll serve that data; if not, we’ll retrieve more from RunKeeper.
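Server-side, that check amounts to something like the following sketch (the cache path is illustrative and $runkeeperFeedURL stands in for the activity’s JSON file on RunKeeper):

define('CACHE_FILE', './cache/activity.json');
define('CACHE_TTL', 25); // seconds, matching RunKeeper's own update interval

// Refresh the cache if it's missing or stale
if (!file_exists(CACHE_FILE) || time() - filemtime(CACHE_FILE) > CACHE_TTL) {
    $json = file_get_contents($runkeeperFeedURL);
    if ($json !== false) {
        file_put_contents(CACHE_FILE, $json);
    }
}

// Either way, serve from the cache
header('Content-Type: application/json');
readfile(CACHE_FILE);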

From then on, the page polls for the latest dataset and we serve the new information, appending it to the client-side JSON. The application works out what data each individual client already has and serves the difference.
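The difference-serving can be sketched like so, assuming the client reports how many GPS points it already holds (the points parameter and the path and complete field names are hypothetical, not RunKeeper’s actual schema):

// The client tells us how many path points it already has
$have = isset($_GET['points']) ? (int) $_GET['points'] : 0;

// Load the cached activity data
$activity = json_decode(file_get_contents('./cache/activity.json'));

// Send back only the points the client hasn't yet seen, plus the completion flag
$fresh = array_slice($activity->path, $have);

header('Content-Type: application/json');
echo json_encode(array('path' => $fresh, 'complete' => $activity->complete));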

Once the run is complete, a flag will be raised by RunKeeper in the JSON file and we’ll update our page to reflect that. From this point on the application no longer needs to contact RunKeeper, so it will serve static JSON from our own side.

It shapes up a little like this, where points 2 and 8 are conditional to the cache being in date:

As well as reducing the amount of calls to RunKeeper, caching the data means that visitors get the latest JSON nice and quickly.

I built a prototype in time for the Silverstone Half Marathon, which Gemma ran last month, and it all went surprisingly smoothly. Here’s an image of the application as it was then:

The data is visualised on a Google map. The GPS coordinates are used to draw the red Polyline; the activity statistics are calculated by the iPhone app and simply refreshed here with every update.

Eventually our map will also have some more informative overlays, mile markers for example, and when and where the application sent tweets or found pictures taken nearby.

Initially we were going to show pace, speed and elevation graphs (you can see a very early attempt if you click through to the image link) but we’ve run out of time for those. Maybe version two.

Is there anybody alive out there?