In addition to my last post on Python development environments, here’s a note on using the PyDev plug-in for Eclipse with our virtualenv set-up.

Assuming everything went swimmingly (virtualenv running, Django installed with a project folder, let’s call it ‘myproject’), you’ll have done something like this:

$ mkdir dev
$ cd dev
$ virtualenv --no-site-packages env
$ source env/bin/activate
(env) $ pip install django
(env) $ django-admin.py startproject myproject

First install PyDev in Eclipse under “Install New Software” with the URL: http://pydev.org/updates

Once that’s done, configure the Python interpreter. In your Preferences, under PyDev and Interpreter – Python, hit the Auto Config option to find all your libraries and populate the Python Path.

Otherwise, manually add Python and locate your interpreter; mine is under /usr/bin/python2.7.

For now, this only includes your globally-installed system-wide libraries, not those you’ve installed within the virtualenv environment.

To create a new “PyDev Django project” however, you’ll need to have Django installed globally (or otherwise configured in the above Python Path settings) so PyDev can see it. Right now ours isn’t, so we have an extra step.

Instead, we’ll create a “New PyDev project” (non-Django), add our virtualenv location containing the libraries in our local site-packages directory, then convert that to a Django Project once PyDev is satisfied we have the goods.

This method means we don’t have to install Django globally just for the sake of using this IDE.

To do this, from the File menu and New PyDev Project, I un-tick ‘Create src folder and add it to the PYTHONPATH’, instead selecting ‘Don’t configure PYTHONPATH (to be done manually later on)’.

Right-click the project folder, go to Properties and PyDev – PYTHONPATH and add a Source Folder pointing to your virtualenv site-packages. In this instance:

dev/env/lib/python2.7/site-packages
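
If you’re unsure where that directory lives, the path can be derived from the environment location and the Python version. A minimal sketch, assuming the POSIX virtualenv layout used in this post; the function name site_packages_path is made up for illustration:

```python
import posixpath


def site_packages_path(env_dir, version=(2, 7)):
    """Return the site-packages path inside a POSIX-style virtualenv.

    Assumes the lib/pythonX.Y/site-packages layout shown in the post.
    """
    major, minor = version
    python_dir = "python%d.%d" % (major, minor)
    return posixpath.join(env_dir, "lib", python_dir, "site-packages")


print(site_packages_path("dev/env"))
```

Pass a different version tuple if your environment was built against another interpreter.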

Having found Django, PyDev now lets us convert this to a Django project. Right-click again and under the PyDev menu select Set as Django Project.

Now everything can be performed within Eclipse, rather than by the command line.

For example, to run the server we’ll add a Custom Command. Under the Django menu select Custom Command and add the following:

runserver --noreload

You may be asked to select which manage.py to run from; choose the one within your project, i.e. myproject/manage.py.

Hit Run and test http://localhost:8000/ within your browser.

Note that the --noreload option disables Django’s auto-reloader, so Eclipse keeps control of a single process rather than the server spawning a second one. The auto-reloader usually restarts the server automatically when changes are made to your code, at your convenience.

I mentioned in my previous post that I borked my system meddling with Python. Having reset my workspace, I’ve now set up a solid system that makes handling projects and multiple development environments super simple.

The new set up easily handles multiple Python projects, without compatibility or version conflicts. The installation is equally straightforward.

Before switching to a desktop Linux, I used to sing the praises of VMware and developing with virtual machines when dealing with unique environments. By “unique”, I rather mean any odd project out of the ordinary LAMP set-up I usually work with, or something that requires a specific version of a piece of software.

Since then however, I’ve found no need. So long as you think before you leap.

Virtual boxes (as closed, single-piece software) are good and all; you can be as venturous as you wish without risk of damaging your native system. Plus, if you screw one of these up you can restore a saved state in a few clicks. However, the VM safety net allows you to proceed without caution, perhaps recklessly, at the expense of fully comprehending the commands you’re executing and tasks you’re running.

In that sense, they’re great for beginners uncertain of how (or if) they should install software, e.g. Apache, PHP, Python etc — appliances and virtual stacks are helpful.

Otherwise they can convolute your workspace and, more often than not, won’t be configured exactly how you want or need them. Running software natively is simple and works as intended; it also allows you to configure your entire environment without any assumptions made by distributors.

Virtualenv

Virtualenv is quite the revelation. It facilitates multiple isolated Python environments on a single system, dynamically handling your Python Path so packages are installed within an enclosed local directory, rather than in amongst your top-level system packages.

This means you can create project-by-project virtual environments, avoiding compatibility and version conflicts. When an environment is created (and activated), libraries are thereafter installed within discrete directories that aren’t shared with other virtualenv environments.

This means nothing is installed “system-wide”, so libraries don’t accrue over time and there’s no balancing of versions. It also means you can work with different versions of Python simultaneously.

Python packages should be installed with a package manager, the latest of which is pip.

Prior to this, easy_install was the manager du jour (part of Setuptools, both now out-dated), but we’ll only be using that to install pip:

$ sudo easy_install pip

Pip is a direct replacement for easy_install, improving on a few things (a comparison can be found on the installer site). Packages that are available with easy_install should be pip-installable and the installation method is the same — the following installs virtualenv:

$ sudo pip install virtualenv

With virtualenv installed we can create an environment within your workspace; all it needs is the environment directory name, here ‘env’:

$ virtualenv env

There are a few options you have with this command. In the following example, the --no-site-packages flag means that the new environment will not inherit any system-wide global site packages. The --distribute flag will install Distribute rather than Setuptools:

$ virtualenv --no-site-packages --distribute env

Distribute is to setuptools as Pip is to easy_install. Distribute and pip are the new hotness, Setuptools and easy_install are old and busted — for now.

Anyway, activate your environment:

$ source env/bin/activate

You’ll see from your shell prompt that the environment is activated, with the name prepended.

Then we’ll install something with pip. Yolk is a tool for querying the packages currently installed on your system, so we’ll install that and grab a list:

(env) $ pip install yolk
(env) $ yolk -l

Then you’ll see everything the environment can see in the output (this will depend on your global site packages and how you created the environment, as above).

Note that you don’t need to sudo whilst in the activated environment.

As a test, we’ll deactivate the environment and run the same command, which gets the following error (unless you have yolk installed globally):

(env) $ deactivate
$ yolk -l
yolk: command not found

If installed within an environment, a package is only available whilst it is activated. This is the means to install whatever you wish, without worrying about cross-project conflicts.
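
To check from within Python whether an environment is in effect, you can compare the interpreter’s prefixes. This is a minimal sketch rather than anything from virtualenv itself; the helper name in_virtualenv is my own:

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases expose sys.real_prefix inside the environment.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv releases and the stdlib venv module keep the system
    # location in sys.base_prefix, so it differs from sys.prefix in an env.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("virtualenv active:", in_virtualenv())
print("prefix:", sys.prefix)
```

Run it with the environment activated and again after deactivate to see the difference.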

More pip

Another good feature of pip is to generate a list of requirements for your working set of packages. The command is called freeze and generates a text file as follows:

(env) $ pip freeze > requirements.txt

This will create a list of all installed packages with specific versions for each library. This is in a custom syntax and looks something like this:

distribute==0.6.19
wsgiref==0.1.2
yolk==0.4.1

This list can then be distributed (e.g. to a team of developers) and used to install those packages on other systems, like so:

(env) $ pip install -r requirements.txt
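
The pinned format is simple enough to read programmatically. The following sketch only handles the plain name==version lines shown above (real requirements files can contain more than that), and parse_requirements is a hypothetical helper, not part of pip:

```python
def parse_requirements(text):
    """Map package name -> pinned version for lines of the form name==version."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and comments; ignore anything that isn't a simple pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


sample = """distribute==0.6.19
wsgiref==0.1.2
yolk==0.4.1
"""
print(parse_requirements(sample))
```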

Note, this isn’t coupled with virtualenv, which actually has its own method of bootstrapping; see “Creating Your Own Bootstrap Scripts”.

Since deciding to work exclusively in a Linux environment at the beginning of the year, I’ve been more than pleasantly surprised not to have found myself needing to reset my system as a result of the frequent changes of set-up and numerous installations and removals of software that I’ve needed to perform in order to work on various projects.

The inevitable day, however, came a couple of weeks ago when I royally screwed my system messing around with Python (solution in another blog post. Update: here it is).

Once Ubuntu was reinstalled, I encountered a problem attempting to recreate my workspace having opted to encrypt my home directory during user setup.

Running the normal LAMP-server setup, Apache is unable to access files within the encrypted home.

I was trying to duplicate my previous configuration, using individual VirtualHosts locating directories within my user home, for example:

/home/marc/sites/dev/

I’m pretty sure my home directory was encrypted last time too, but this problem was new for me — perhaps something from an update in between?

The permissions problem occurs as only my user, marc, has access to the home and Apache’s user, www-data, does not. This results in an HTTP 403 Forbidden when attempting to serve files.

Having a look around, I found a convoluted method using symlinks and Apache’s UserDir then a far simpler solution, on AskUbuntu, as follows.

It’s unsafe to change your home ownership (to www-data, for example) but Apache needs execute permissions there. So selectively chmod the directory:

sudo chmod 751 /home

This grants execute access to others, who can only read files with correct knowledge of names and locations. It also removes your user’s read access to /home, so you’ll have to sudo for that.
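
To see what 751 means bit by bit, Python’s stat module can render a mode the way ls -l does. A small sketch; describe_dir_mode is just an illustrative name:

```python
import stat


def describe_dir_mode(mode):
    """Render a directory's permission bits in ls -l style."""
    return stat.filemode(stat.S_IFDIR | mode)


mode = 0o751
print(describe_dir_mode(mode))
# Others may traverse the directory (execute) but not list it (read).
print("others can traverse:", bool(mode & stat.S_IXOTH))
print("others can list:", bool(mode & stat.S_IROTH))
```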

Another precaution benefiting those on development-only machines, is to restrict IP listening within Apache’s ports.conf, so only local connections get any attention:

Listen 127.0.0.1:80

Alternatives

As for alternatives, you could encrypt your whole drive rather than just the home directory. You shouldn’t see any problems then.

Or you could just ignore encryption altogether.

You could, of course, just work out of the traditional /var/www/ location, which is the Apache default. Simply create a directory there and chown to your user so you don’t have to always sudo changes.

sudo mkdir /var/www/dev/
sudo chown marc /var/www/dev/

If your directories are elsewhere on your system, for example in SVN repositories such as /srv/svn/ or /usr/local/svn/, then you’ll need to chown those to www-data so they’re readable, similar to our method of reading from within /home above.

The Ubuntu docs on Subversion offer the best solution for handling user permissions for SVN over HTTP.

Create a new user group, subversion, add the users marc and www-data to it and chown the repo to www-data:subversion, giving read/write access to the group (granting privileges to marc). Finally chmod with g+s so that new files inherit that group ID, like so:

cd /srv/svn/
sudo chown -R www-data:subversion dev/
sudo chmod -R g+rws dev/

The s in g+rws sets the setgid bit, which means that all files created inside that directory will inherit the group of the directory; otherwise files take on the primary group of the user. New subdirectories will also inherit this.

The -R option applies the changes recursively (i.e. existing subdirectories).
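
The effect of g+rws on a mode can be worked through in a few lines. This sketch assumes a directory that starts at 755; add_group_rws is a made-up helper that mirrors the chmod expression:

```python
import stat


def add_group_rws(mode):
    """Apply the equivalent of chmod g+rws to a permission mode."""
    return mode | stat.S_IRGRP | stat.S_IWGRP | stat.S_ISGID


before = 0o755  # assumed starting mode
after = add_group_rws(before)
print(oct(after))
# The lowercase s in the group triplet marks setgid plus group execute.
print(stat.filemode(stat.S_IFDIR | after))
```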

The final part of Beats Per Mile worth mentioning is the Twitter integration.

The application was designed to send automated tweets on Gemma’s behalf from her personal Twitter account giving updates of her progress to those following.

Mainly the tweets would report her current location for the purposes of spectators waiting ahead; others would be sent at intervals of a mile or so to report pace and ongoing form.

Similar to the mechanism for fetching Instagram images at predetermined locations on the course, the application contained a list of agreed points at which to tweet.

These were at landmarks and certain spectator spots too, but also at course milestones — the first mile complete, halfway complete, one mile to go etc, as well as the start and finish.

The application monitored the elapsed distance and updated accordingly, grabbing the latest time and statistics from the RunKeeper data, as well as geolocating the tweet with the latest set of GPS coordinates.
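
The trigger logic can be sketched as a comparison between the previous and current distance readings. This is an illustration in Python, not the application’s actual code; the milestone list, messages and the function name newly_passed are all assumptions:

```python
# Hypothetical milestones: (distance in miles, message to tweet)
MILESTONES = [
    (1.0, "First mile complete"),
    (13.1, "Halfway"),
    (25.2, "One mile to go"),
]


def newly_passed(milestones, previous_miles, current_miles):
    """Return messages for milestones crossed since the previous reading."""
    return [
        message
        for distance, message in milestones
        if previous_miles < distance <= current_miles
    ]


print(newly_passed(MILESTONES, 0.8, 1.2))
print(newly_passed(MILESTONES, 1.2, 1.9))
```

Comparing against the previous reading means each milestone fires once, even if updates arrive at irregular intervals.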

Since Twitter deprecated support of Basic Auth, applications must authenticate users with OAuth to gain write access.

This means that rather than handing over your username and password to applications and trusting (hoping) that they’re friendly, as used to be the only way, OAuth allows applications to request access on your behalf without users ever parting with precious login credentials. You simply give, or deny, permission.

Using TwitterOAuth

Twitter offer a number of links to OAuth libraries for various languages to make this job a lot easier. There are many Twitter-specific OAuth libraries in particular, purposely tailored for the API.

I picked up PHP-based TwitterOAuth by Abraham Williams, which gets you up and running very rapidly.

The OAuth flow is documented at length, but essentially performs three major tasks:

  • Obtains access from Twitter to make requests on behalf of a user
  • Directs a user to Twitter in order to authenticate their account
  • Gains authorisation from the user to make requests on their behalf


This is achieved with a cycle of exchanging authenticating tokens between application and Twitter to verify permission. TwitterOAuth in particular creates a session object in your application and rebuilds itself with each token exchange to remain contained in a single class instance.

In a very abridged manner, something like the following:

// Build TwitterOAuth object with client credentials
$connection = new TwitterOAuth(CONSUMER_KEY, CONSUMER_SECRET);
// Get temporary credentials to allow application to make requests and set callback
$request_token = $connection->getRequestToken(OAUTH_CALLBACK);

This initial call verifies your application has access to the Twitter API, basically that it’s registered and good to go. Twitter returns two tokens, if successful, that we store:

$_SESSION['oauth_token'] = $request_token['oauth_token'];
$_SESSION['oauth_token_secret'] = $request_token['oauth_token_secret'];

Then the application sends the user to Twitter to authorise its access, using a URL generated with the token received above:

$authoriseURL = $connection->getAuthorizeURL($request_token['oauth_token']);
header('Location: ' . $authoriseURL);

On successful authorisation Twitter will return the user to your callback URL (set above) with a verification token. TwitterOAuth now rebuilds for the first time with the OAuth tokens in our session and uses the new verification token to get a new access token which will grant us user account access:

// Create TwitterOAuth object with application tokens
$connection = new TwitterOAuth(CONSUMER_KEY, CONSUMER_SECRET, $_SESSION['oauth_token'], $_SESSION['oauth_token_secret']);

// Request access tokens from Twitter on behalf of current user with verifier
$access_token = $connection->getAccessToken($_REQUEST['oauth_verifier']);

This final request gets us two OAuth tokens specific to the current user, allowing us to make requests on their behalf, with which TwitterOAuth rebuilds again:

// Rebuild TwitterOAuth object with user granted tokens
$connection = new TwitterOAuth(CONSUMER_KEY, CONSUMER_SECRET, $access_token['oauth_token'], $access_token['oauth_token_secret']);

// Test that everything is working
$connection->get('account/verify_credentials');

The dance is made a hell of a lot easier with a library such as this.

Usually applications store these final two tokens, the user-generated oauth_token and oauth_token_secret, which saves the need to authorise the user again.

Storing these details (in a session or database) means that a username and password need not be saved. The tokens are good until the user revokes access and no sensitive information is ever released to the application; all the user ever gives is their permission.

With access tokens stored, the connection to Twitter is a lot simpler — just create the TwitterOAuth object with those user-generated codes as in the very last step, without any of the redirecting to and from Twitter.com. Of course, those tokens could only have ever been obtained by carrying out the full process to begin with.
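
The “store once, reuse afterwards” idea is independent of the library. A rough Python sketch, assuming a JSON file as the store (the post mentions a session or database); save_tokens, load_tokens and the file name are hypothetical:

```python
import json
import os
import tempfile


def save_tokens(path, oauth_token, oauth_token_secret):
    """Persist the user-granted tokens so the dance isn't repeated."""
    with open(path, "w") as handle:
        json.dump(
            {"oauth_token": oauth_token, "oauth_token_secret": oauth_token_secret},
            handle,
        )


def load_tokens(path):
    """Return stored tokens, or None if the user has not authorised yet."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return json.load(handle)


path = os.path.join(tempfile.mkdtemp(), "tokens.json")
print(load_tokens(path))  # nothing stored yet, so the full flow is needed
save_tokens(path, "example-token", "example-secret")
print(load_tokens(path)["oauth_token"])
```

When load_tokens returns None the application would fall back to the full redirect flow; otherwise it can build the connection directly from the stored pair.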

Beats Per Mile is a single-user case application, so Gemma only had to authenticate the application once and then we hard-coded the resulting tokens into the scripts.

Tweet Away

With access granted, the application was free to send out updates based on the run data we were collecting, posting them directly.

As mentioned, locations translated into distances and that’s when we tweeted.

At mile twenty, it looked back at the mile splits so far and reported which were the fastest.

Relying on the total reported distance alone was flawed. There was a slight hiccup when RunKeeper lost GPS coverage under Blackfriars tunnel and in an attempt to compensate found the nearest location to be the other side of the Thames.

This caused a problem by adding extra distance to her total, so some of the latter tweets (“one mile to go”, for example) posted a little prematurely.

It was more of a problem for Gemma when running: the app announced in her ear that she’d run further than she had, and it was disheartening to see mile markers on the course that she thought she had already passed.

This is also why the total distance on the site clocks up to 27.65 miles; the race was long enough as it is!

The final touch was to drop the tweets on the map, alongside the Instagram images.

Beats Per Mile uses the Instagram API to find pictures taken around the marathon course.

We decided we’d need to find a way to put pictures on the map quite early on, knowing Gemma couldn’t be the one to stop and take them. Rather than try to pin a camera to her vest or strap one to a hat, the simplest solution was to find photos taken by spectators.

The Instagram API is fairly new and the app itself is getting extremely popular. Being mobile-based, we hoped it would be popular among spectators on the day, taking quick snaps and hopefully uploading a good amount of photos to dig around in.

The pictures are geo-tagged as well as captioned, so we could perform location queries and text-based searches (ultimately, a combination).

Firstly we agreed on places around which we’d search for pictures — busy spectator spots and London’s landmarks.

The idea was to look for pictures at these places as Gemma passed them. So we translated them into distances, i.e. determine the elapsed distance that would have been run when reaching each of these places.

Tower Bridge, for example, is at 12.5 miles. Big Ben is at 25 miles and so on — for about 10-15 hotspots.

The application monitored the total distance covered and at each of these key numbers hit the Instagram API for the most recent pictures around the location.

Setting up an Instagram application is instantaneous, though I waited a long time for my API key initially — I did apply when the announcement was first made however, so the turnaround may be a lot faster now.

There’s no moderation or application approval process; when you’re up and running you can start performing queries immediately.

The API is RESTful over HTTPS with a number of endpoints to query images, comments, users, locations, tags and so on. The developer docs are fairly comprehensive.

We’re interested in the media endpoint. Note that the following URLs require an access token or client id, which you will be given, omitted here for brevity.

Get the current most popular photos:

https://api.instagram.com/v1/media/popular

Or to get information about a single image with a media id:

https://api.instagram.com/v1/media/72612696

The search method was our main tool. It takes up to five parameters, lat, lng, max_timestamp, min_timestamp and distance.

https://api.instagram.com/v1/media/search?lat=51.54242&lng=-0.059702&distance=1000

Note only latitude and longitude parameters are required and distance is in meters, default at 1km with a maximum of 5km.
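
Building that URL in code might look like the following. This is a sketch based only on the parameters described in this post; build_search_url is my own helper, and the access token or client id is left out just as it is in the URLs above:

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.instagram.com/v1/media/search"
MAX_DISTANCE_M = 5000  # maximum radius in metres, as stated in the post


def build_search_url(lat, lng, distance=None):
    """Build a media search URL; only lat and lng are required."""
    if distance is not None and not 0 < distance <= MAX_DISTANCE_M:
        raise ValueError("distance must be between 1 and %d metres" % MAX_DISTANCE_M)
    params = {"lat": lat, "lng": lng}
    if distance is not None:
        params["distance"] = distance
    return SEARCH_ENDPOINT + "?" + urlencode(params)


print(build_search_url(51.54242, -0.059702, 1000))
```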

So at each of our key distances, the app took the latest latitude and longitude positions and grabbed the latest photos.

In an attempt to only select images of the marathon, this was backed up by inspecting within the result set for images with a caption containing any one of a set of predefined keywords such as ‘marathon’ or ‘runners’. That way we could almost ensure that we wouldn’t pick up any images not concerned with the race, though we would still find the occasional false positive.

Once we had the JSON data it was simply visualised on the map; Instagram host the images for us.
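
The caption check amounts to a case-insensitive keyword match. A minimal sketch, assuming each result has been reduced to a dictionary with a plain-text caption (the real API response is more deeply nested); the sample data and function names are invented:

```python
KEYWORDS = ("marathon", "runners")


def matches_keywords(caption, keywords=KEYWORDS):
    """True if the caption contains any keyword, ignoring case."""
    if not caption:
        return False
    lowered = caption.lower()
    return any(keyword in lowered for keyword in keywords)


def filter_media(items, keywords=KEYWORDS):
    """Keep only items whose caption mentions one of the keywords."""
    return [item for item in items if matches_keywords(item.get("caption"), keywords)]


sample = [
    {"id": "a", "caption": "London Marathon at Tower Bridge!"},
    {"id": "b", "caption": "Full English breakfast"},
    {"id": "c", "caption": None},
]
print([item["id"] for item in filter_media(sample)])
```

Substring matching is deliberately loose here, which is one way unrelated pictures can still slip through.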

One thing lacking in the data perhaps, is that the location information only offers the latitude and longitude coordinates, no place or address names unless otherwise nominated by the user. For each image then I ran the data through Google’s Geocoding service to get a street or area name, just for display purposes.

On the whole, it worked well. The API is as straightforward as any and, realistically, the biggest worry for us was that most people seem to use Instagram to take pictures of food and little else; we thought we’d have pictures of everyone’s breakfast around various parts of London all day.

Doubtingly, I wrote a simple ‘refresh’ button to rerun any query in case anything untoward or particularly boring popped up when I logged in to check, but between the huge crowds and the caption matching, I had only used it twice, both times very early on in the day.

Here’s a handful of the pictures:
