Monthly Archives: April 2011

One of the key features of Beats Per Mile was the ability to listen to a ‘stream’ of Gemma’s iPod playlist, enabling you to hear exactly what she was listening to whenever you logged on.

We didn’t actually have a stream broadcasting from her iPod of course, rather a stream playing from the site that was synchronised with her start time.

We planned on using the SoundCloud API to do this and it was one of the last things left to build before race day.

Part of the playlist was curated by friends, donated tracks with sentimental value or just old favourites for a personal touch and to provide an extra kick of motivation.

Asking a lot of people for contributions meant that the playlist wasn’t finalised and mixed until very late on — Saturday evening.

I created the player using the SoundCloud Player Widget in preparation for the tracks, hoping that the player would be ready before they were uploaded.

It’s a JavaScript-enhanced Flash widget which uses ActionScript’s ExternalInterface to expose method handlers and control playback via an in-built API.

Unfortunately this meant it wouldn’t play on the iPhone. SoundCloud do offer an HTML5-based Custom Player (which falls back to Flash), but we didn’t have time to fully investigate wrangling together a player from scratch.

SoundCloud Content Identification

This would shortly be rendered irrelevant when we discovered that it is now nearly impossible to upload any copyrighted songs, or tracks containing any samples of copyrighted songs, to SoundCloud. We also discovered that they’re very, very clever in how they go about detecting them.

Here’s what they wrote in January:

Starting in the last few weeks we’ve turned on an automatic content identification system, similar to those used on other major media sharing sites. The system is used primarily for identifying audio that rightsholders have requested to be taken off SoundCloud. This is good news because it makes it easier for artists, labels and other content owners to control how the content they’ve created is available. And when you upload your own audio to SoundCloud, we can find out more quickly if somebody is uploading a copy to their own page without your permission.

SoundCloud have always had the right to remove audio deemed in violation of rights, as stipulated in their terms of use. They also host plenty of mixes and DJ sets, as many other similar sites do.

When we tried to upload ours though, nothing would work. None of our mixes were authorised and the refusals would come after spending considerable (precious) time attempting to upload them.

SoundCloud are essentially performing some kind of wave form analysis, comparing uploads to audio already in their databases to detect duplicates.

Alternatives

There are a few ropey ways to (possibly) slip the net, such as adding a layer of low-level noise to distort the wave form or applying an amount of time-stretching (which was happening anyway, as songs were mixed together).

Too much of either would ruin the music. We were running out of time and didn’t want to risk any hacked attempt being found later and removed, perhaps mid-marathon in a worst case scenario.

So I began an attempt to recreate the widget from scratch after all.

I looked at the Yahoo! Media Player, which is actually suspiciously similar to SoundCloud’s Widget API. The methods are almost exactly the same, but I had trouble handling multiple files — it really wasn’t anywhere near as easy to implement.

After browsing for alternatives I eventually found jPlayer, a very simple and easily customisable jQuery plug-in. This would also mean we’d be iPhone-compatible.

It also meant that the files would need to be hosted ourselves, on a normal server, rather than letting SoundCloud handle the load, which had been another reason for our initial choice. We actually ended up serving 7.68GB of streamed audio. Fortunately my host is very stable and it didn’t end up being a problem.

When visitors landed on the page the player would calculate how long the run had been in progress and therefore where you should be in the playlist. The idea was to give no controls other than mute, so the playback would always be synchronised.

Rather than upload all the songs individually, the playlist was divided into five 30-60ish minute tracks, which would be easier to navigate.

Once the player determined what track you should be on and where within that the playhead should be, it begins to buffer. Annoyingly, you could have been waiting some time. If the tracks were on SoundCloud’s giant servers then the audio would be properly streamed, but not from mine.
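As a rough sketch of that calculation (not the code that ran on the day), the track and playhead position can be derived from the start time alone. The part durations, the start time and the function name `locatePlayhead` are all made up for illustration.

```javascript
// Hypothetical durations (in seconds) for the five mix parts. The real
// lengths aren't given here beyond "30-60ish minutes" each.
const PART_DURATIONS = [45 * 60, 50 * 60, 40 * 60, 55 * 60, 35 * 60];

/**
 * Given the run's start time and the current time (both in milliseconds
 * since the epoch), work out which part should be playing and how far
 * into it the playhead should be, in seconds.
 * Returns null before the start or once the whole playlist has finished.
 */
function locatePlayhead(startMs, nowMs, durations) {
  let elapsed = (nowMs - startMs) / 1000; // seconds since the start
  if (elapsed < 0) return null;
  for (let index = 0; index < durations.length; index++) {
    if (elapsed < durations[index]) {
      return { index, offset: elapsed };
    }
    elapsed -= durations[index]; // skip past this part entirely
  }
  return null;
}

// Placeholder start time; one hour into the run we should be 15 minutes
// (900 seconds) into part two.
const start = Date.UTC(2011, 3, 1, 9, 0);
const position = locatePlayhead(start, start + 60 * 60 * 1000, PART_DURATIONS);
console.log(position); // { index: 1, offset: 900 }
```

With jPlayer, the result would presumably be used to set the media for part `index` and then start playback at `offset` seconds, which is the jump that triggers the buffering wait.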

The wait entirely depended on when you happened to arrive; for some it wasn’t a problem. Say you luckily arrived at the page needing only to jump in five minutes on the current track: you’d have a small waiting time. If the jump was twenty-five minutes, then start twiddling your thumbs.

There wasn’t a wait when the player switched between tracks. If a track was paused, the player would record how long for and resume at a later point — where the playhead would otherwise be if you hadn’t paused, not where you left off.

This also took into account track changes if you paused toward the end of a track or paused for more than the duration of an entire part.
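The resume logic can be sketched as moving the paused position forward by the length of the pause, rolling over into later parts when needed. This is only an illustration. The function name `advance` and the part durations are hypothetical.

```javascript
/**
 * Move a playlist position forward by a number of seconds, rolling over
 * into later parts where needed. `durations` holds each part's length in
 * seconds. Returns null if the move runs past the end of the playlist.
 */
function advance(position, seconds, durations) {
  let index = position.index;
  let offset = position.offset + seconds;
  while (index < durations.length && offset >= durations[index]) {
    offset -= durations[index]; // the pause outlasted this part
    index += 1;
  }
  return index < durations.length ? { index, offset } : null;
}

// Hypothetical part lengths: 45, 50 and 40 minutes.
const durations = [45 * 60, 50 * 60, 40 * 60];

// Paused two minutes before the end of part one, for five minutes:
// playback resumes three minutes (180 seconds) into part two.
const pausedAt = { index: 0, offset: 43 * 60 };
console.log(advance(pausedAt, 5 * 60, durations)); // { index: 1, offset: 180 }
```

A pause longer than a whole part falls out of the same loop: pausing at the same point for an hour would resume eight minutes into part three.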

Back to the Cloud

SoundCloud have an obligation to artists and labels and choose to be very strict in authorising uploads that aren’t your own. I’ve wondered how copyright works for these sites; Mixcloud, for example, host countless mixed songs and sets.

Mixcloud in fact is where you’ll now find the mixes saved indefinitely, without complaint, on Gemma’s page.

Have a listen!

Marathon Mix – Part One
Marathon Mix – Part Two
Marathon Mix – Part Three
Marathon Mix – Part Four
Marathon Mix – Part Five

The fundraising amount shown on Beats Per Mile uses the JustGiving API to show the up-to-date figure on Gemma’s donation page.

By way of one of their various SDKs, the JustGiving API is a doddle to use and is pretty well featured. Unfortunately the documentation isn’t particularly comprehensive.

You can expect to be able to fetch any information found on a typical page — event and charity details, lists of donations and comments, even the colour scheme — all of which can be fetched without any need for authentication.

We’re using the PHP SDK to show a very simple totaliser, the current percentage raised of the target amount. This can be achieved in just a few lines of code.

Firstly you need to register your application, which will get you an API key and access to the staging sandbox. This access is only for development: you can query live pages but the information you’ll obtain is out-of-date.

Applications go through a two-day approval process but require nothing more than a decent application description. From that point on you’re free to access real pages and live data.

The PHP SDK is available from GitHub and comes with plenty of examples that cover a lot of use cases for data queries.

As I say, the API offers plenty more, such as automatic page creation, user account creation and integrating donations from third-party sites.

Our needs are far simpler, however. The code is exactly this:

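// Load the JustGiving PHP SDK client
include './JustGivingClient.php';

// Live API root, your API key, API version
$client = new JustGivingClient("https://api.justgiving.com/", "API_KEY", 1);
// Fetch the fundraising page by its short name
$page = $client->Page->Retrieve("Gemma-Bardsley");
$funds = $page->grandTotalRaisedExcludingGiftAid;
$target = round($page->fundraisingTarget);
// Percentage of the target raised so far
$percent_raised = round($funds/$target * 100);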

As simple as querying the page, grabbing the current total and the target total then determining the percentage.

We retrieve the page via its short name, which for us is “Gemma-Bardsley”. This will be set in the JustGiving page admin area.

Note that the target amount has to be explicitly set there too (you’ll get a similar totaliser on your fundraising page anyway if that’s set correctly) and that this queries the live API over HTTPS, not the staging sandbox.

The third parameter sent to the client constructor specifies version one of the API.

Central to the Beats Per Mile application is the run data, the live information reporting Gemma’s current location, elapsed time, covered distance, average pace and speed.

Working out how to capture this information was our first challenge. Determining whether it was even possible was pivotal as to whether we could build the site.

Before deciding on the eventual solution of using RunKeeper, we looked for a few different routes we could take, of varying levels of complexity.

Fellow iheartplay developer Tim took up the challenge.

Firstly we considered building our own hardware, the basic idea being a GSM/GPS module for about £50 along with a SIM card with unlimited text messages. The module is then programmed to send its location to Twitter via SMS every minute or so.

There are a lot of posts around with extensive details on how to build this kind of thing yourself, here’s one doing exactly that. He’s also considering selling pre-built units, which would save us needing to get our soldering irons out.

With this option, whilst the geo-positioning data would be accurate, you can run pretty far in a minute so the route would be a little raw and wouldn’t accurately reflect the true path and wouldn’t look great on a map.

One step further would be to use something like Open GPS Tracker, which is similar but rather than using a combined GSM/GPS module just uses a GPS module plugged directly into a mobile phone.

This particular unit comes pre-built and with firmware, so perhaps might be limited in what we could capture, but otherwise ready to roll straight out of the box and only requires an old (pre-paid) handset. It’s also small in size, easy enough to strap onto a runner or put in a pocket. This comes in around £50ish for parts again, plus $35.91 for the software.

There are commercially manufactured units available too, trackers for mountain climbers, skiers, pilots, sailors and such that aren’t products of home brew electronics.

The SPOT Personal Tracker is one option, a satellite GPS tracker with built-in location services including sending your current location to Google Maps in near real-time, specifically for sharing with others at home. Very reliable and robust, tiny and light, the location services lend themselves to exactly what we want to do but it’s very over-spec. It’s able to work beyond cellular coverage and above 15,000ft (not particularly necessary for us) and comes in at €149 for the unit and a service charge of €99 for a year’s use.

It does however come with a particularly impressive distress button, notifying the GEOS International Emergency Response Center in case of emergencies should one press it. Presumably this calls in an airlift or team of S.A.R. St. Bernards, which could come in handy when she hits the wall.

Satellite GPS is more immediately available though, in the form of an iPhone.

Initially we considered developing an app from scratch, but that’s not really anything any of us have done before. We could make an online application running on Safari with the Geolocation API, but this would mean having a page open for the entire run and hoping that it doesn’t time out or anything gets pressed on-screen as arms start swinging.

On the App Store we found InstaMapper, a lo-fi and free app which sends location updates to the InstaMapper site, where they are available via a public API. It’s compatible with iPhone, Android and any Java-capable phone, and it’s well supported and tested, making it a far more viable solution than anything we could knock up in the limited time we have. It also works with the phone locked, conserving battery life. The only cost is the data plan, which Gemma already has.

Then came the revelation of RunKeeper Elite, our eventual winner. At the time, Gemma had only just started using RunKeeper Pro — the Pro and Elite labels refer to your subscription status, Elite is a premium service for $4.99 — and we hadn’t really looked into what was available with the upgrade.

RunKeeper does all the things you’d expect from a standard GPS-enabled running app (or Garmin device) — track time, distance, pace, calories burned etc, over a geolocated route visualised on a map, stored with all your previous runs.

RunKeeper Elite

The Elite service offers a few training programs, alerts and reports and as of March last year added a ‘Live’ service that pushes your running data to their website as you run, rather than only uploading a report once it’s complete (as the Pro version does). It’s exactly the same data, but made publicly available in real time.

Unfortunately, RunKeeper doesn’t actually have an API yet. On the forums they promise to soon, but not before race day.

Googling around, there are a few sites that have worked out ways to grab the data. For example, Firehole has developed an interface to capture and save activity, and there’s a ton of info in the right-hand column of his page. There are plenty of ways to hack and scrape. It’s a bit dirty and not particularly relevant to us, but it shows that data retrieval is possible nonetheless.

These seem to concern the syndication of complete activities, publishing a list of recent runs as one would their latest tweets, or calculating the total number of miles run per month, say. But we’re only interested in the data for one run, a single activity.

Rather than regularly monitoring a profile for updates, we only have to load the data that populates a single RunKeeper page.

This is handy, because rather than scraping and traversing a profile to find what we’re looking for (from rendered HTML) we can load the single data file directly, a JSON file hosted on RunKeeper, then work with the data in exactly the same way as they do.

Now, this is all a bit naughty. According to the RunKeeper Terms of Service, we are strictly not allowed to load the data from their site in this way, even with the permission of the account holder. Any form of scraping at all is prohibited.

This is a big problem. So what do we do?

Well Len Hardy of Firehole openly links to his API interface in the forum thread mentioned before, pointing out that you can scrape the page data, that he does so and that he’ll freely share the code. RunKeeper haven’t responded to his comment specifically or reminded him of their TOS, but have commented later on the same thread.

I think it’s highly unlikely that RunKeeper could be unaware of these goings on, these scrapers are easily found if you search for them and have been around for a while without being shut down.

I’m rather hoping that they actually don’t mind too much, right now. They must be aware of the demand, they have an API in development. Perhaps when that’s published they’ll chase up these sites and direct them that way. I’m sure (read, hope) that the developers would happily migrate to doing things the correct way.

So what’s our solution? Well, I asked anyway, but no response so far. We will go ahead and load the data, but in my opinion do so more conscientiously than some of the other attempts. For one, we’re not loading and filtering complete HTML pages as they are, that’s what screen scraping is, attempting to extract data from a rendered output. We’ll be loading the data directly, which is publicly accessible and hopefully with less overhead being a straight-up JSON file.

We’re also not loading anything regularly or for a long duration of time. We’re only concerned with one activity, the site will only load the data once the race has begun and when it’s finished we’ll take a copy of the file and serve it from our own servers as a static file.

We also intend on intelligently caching the data on our servers whilst the marathon is in progress. This way we can limit the number of requests to RunKeeper and take on some of the load ourselves. It’s easily done and will actually improve the performance of the application anyway.

Really, I would like RunKeeper to see what we’re doing and think it’s cool. They have an awesome product and platform, people want to play with it more.

How it works

Once the race starts, all the data we need will be continually pushed from Gemma’s iPhone to the RunKeeper site.

Beats Per Mile will handle page requests and retrieve this data from RunKeeper in the form of a JSON file, which the server will then cache. We can cache the data knowing that the route already run will not change, we’re only interested in the new stuff.

RunKeeper updates their page every 25 seconds, so our cache will be about this length too. When a request is made, if the cache is still fresh we’ll serve that data, if not we’ll retrieve more from RunKeeper.
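A minimal sketch of that freshness check, assuming a 25 second lifetime and a stand-in `fetchRunData` function in place of the real request to RunKeeper. Both names, and the injected clock used to make the example repeatable, are assumptions for illustration and not the application’s actual code.

```javascript
const CACHE_TTL_MS = 25 * 1000; // roughly RunKeeper's update interval

/**
 * Wrap a fetcher so that repeated calls within the TTL are served from
 * memory. `fetchRunData` is any async function returning the run data;
 * `now` returns the current time in milliseconds and can be swapped out.
 */
function createCachedLoader(fetchRunData, now = Date.now) {
  let cached = null;
  let fetchedAt = 0;
  return async function load() {
    const fresh = cached !== null && now() - fetchedAt < CACHE_TTL_MS;
    if (fresh) return cached; // still in date: no request upstream
    cached = await fetchRunData();
    fetchedAt = now();
    return cached;
  };
}

// Demonstration with a fake fetcher that counts its calls and a
// hand-cranked clock.
async function demo() {
  let clock = 0;
  let calls = 0;
  const load = createCachedLoader(async () => ({ fetchNumber: ++calls }), () => clock);
  const first = await load(); // cache empty, so this fetches
  clock = 10 * 1000;
  const second = await load(); // 10 seconds on: served from the cache
  clock = 30 * 1000;
  const third = await load(); // 30 seconds on: stale, fetches again
  return [first.fetchNumber, second.fetchNumber, third.fetchNumber];
}

demo().then((result) => console.log(result)); // [ 1, 1, 2 ]
```

The sketch doesn’t guard against several requests arriving while a fetch is already in flight. A real server would want to share the pending request between them.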

From then on, the page polls for the latest dataset and we serve the new information, appending data to the client-side stored JSON. The application works out what data each individual client has and serves the difference.
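One simple way to serve the difference, assuming the route is an append-only list of points and each client reports how many it already holds. That protocol, the function name `pointsSince` and the coordinates are all assumptions made for this sketch.

```javascript
/**
 * Return only the route points a client has not yet received.
 * `points` is the server's full, append-only list; `clientCount` is how
 * many points the client says it already holds.
 */
function pointsSince(points, clientCount) {
  const count = Math.floor(clientCount) || 0; // treat junk input as "has nothing"
  const from = Math.max(0, Math.min(count, points.length));
  return points.slice(from);
}

// Made-up coordinates standing in for the server's copy of the route.
const route = [
  { lat: 52.07, lng: -1.02 },
  { lat: 52.08, lng: -1.01 },
  { lat: 52.09, lng: -1.0 },
];

// The client already holds the first two points and asks for what's new.
const clientRoute = route.slice(0, 2);
const delta = pointsSince(route, clientRoute.length);
clientRoute.push(...delta); // append to the client-side store

console.log(delta.length, clientRoute.length); // 1 3
```

Because the route already run never changes, a simple count is enough to identify what a client is missing.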

Once the run is complete, a flag will be raised by RunKeeper in the JSON file and we’ll update our page to reflect that. From this point on the application no longer needs to contact RunKeeper, so will serve static JSON from our own side.

It shapes up a little like this, where points 2 and 8 are conditional to the cache being in date:

As well as reducing the number of calls to RunKeeper, caching the data means that visitors get the latest JSON nice and quickly.

I built a prototype in time for the Silverstone Half Marathon, which Gemma ran last month, and it all went surprisingly smoothly. Here’s an image of the application as it was then:

The data is visualised on a Google map. The GPS coordinates are used to draw the red Polyline, the activity statistics are calculated by the iPhone app and simply refreshed here with every update.
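As an illustration of the drawing step, the points from the JSON need turning into a path the map can use. The field names `latitude` and `longitude` are guesses, since the structure of the JSON isn’t shown here, and `toPolylinePath` is a hypothetical helper.

```javascript
/**
 * Convert raw GPS points into { lat, lng } pairs suitable for a map
 * path, dropping any point with a missing or non-numeric coordinate.
 * The input field names are assumptions about the JSON.
 */
function toPolylinePath(points) {
  return points
    .filter((p) => Number.isFinite(p.latitude) && Number.isFinite(p.longitude))
    .map((p) => ({ lat: p.latitude, lng: p.longitude }));
}

// Made-up sample data, including one incomplete fix.
const path = toPolylinePath([
  { latitude: 52.07, longitude: -1.02 },
  { latitude: null, longitude: -1.03 }, // dropped: no latitude
  { latitude: 52.08, longitude: -1.01 },
]);
console.log(path.length); // 2

// In the browser, with the Google Maps JavaScript API loaded and `map`
// being an existing google.maps.Map, the red line could then be drawn with:
// new google.maps.Polyline({ path, strokeColor: "#ff0000", strokeWeight: 3, map });
```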

Eventually our map will also have some more informative overlays, mile markers for example, and when and where the application sent tweets or found pictures taken nearby.

Initially we were going to show pace, speed and elevation graphs (you can see a very early attempt if you click through to the image link) but we’ve run out of time for those. Maybe version two.

In my last post I previewed a forthcoming project where we track a marathon as it’s run in real-time, with a whole heap of maps, graphs, tweets and APIs flying about the place.

It has now been named Beats Per Mile.

We’ve put up a holding page on Gemma’s site (she’s the one having to run the thing) and she’s written up a preview on her blog, too.

I know a pretty little place in Southern California down San Diego way.