It seems I was a little late in finding out about the BBC’s work on integrating and exposing semantic data in the (then new) beta trial of their Artist pages.

In an interview with Silicon.com, Matthew Shorter, BBC’s interactive editor for music, speaks about establishing data associations with MusicBrainz, an open user-contributed ‘metadatabase’, to roll out across all of their encyclopaedic artist pages on the BBC site.

MusicBrainz has been around for some time now. It’s a huge database of music metadata, storing information such as artists, their releases, song details and biographies. Right now it has information on over 400,000 artists.

As early as 2001, it was described as a ‘Semantic Web service’ (think a Semantic Web web service), in its offering of a massive store of machine-processable, openly available information (mostly public domain or Creative Commons-licensed), available via open protocols – in RDF format no less.

The BBC have adopted this open standard, mapping their data schema to that published by MusicBrainz so they can use the unique identifiers it provides. This allows the BBC site to leverage the public domain content, augmenting the profile pages found there.

Take a look at one of the records from MusicBrainz, for example, John Lennon’s information at http://musicbrainz.org/artist/4d5447d7-c61c-4120-ba1b-d7f471d385b9.html.

The unique ID here is the MBID, ‘4d5447d7-c61c-4120-ba1b-d7f471d385b9’.

The BBC, then, have a dynamically generated page at http://www.bbc.co.uk/music/artists/4d5447d7-c61c-4120-ba1b-d7f471d385b9.
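
To make the mapping concrete, here’s a minimal Python sketch (mine, not the BBC’s code) that takes an MBID and builds the two URLs quoted above. The helper name `artist_urls` is made up for illustration, and the URL patterns are simply copied from the two example links in this post, so treat them as assumptions about how the sites are laid out.

```python
import uuid

# URL patterns copied from the two example links in this post (assumed, not official).
MUSICBRAINZ_ARTIST = "http://musicbrainz.org/artist/{mbid}.html"
BBC_ARTIST = "http://www.bbc.co.uk/music/artists/{mbid}"


def artist_urls(mbid: str) -> dict:
    """Return the MusicBrainz and BBC artist URLs for one MBID.

    Hypothetical helper: MBIDs are UUIDs, so uuid.UUID() is used to
    reject malformed input (it raises ValueError) and to normalise case.
    """
    canonical = str(uuid.UUID(mbid))
    return {
        "musicbrainz": MUSICBRAINZ_ARTIST.format(mbid=canonical),
        "bbc": BBC_ARTIST.format(mbid=canonical),
    }


if __name__ == "__main__":
    # John Lennon's MBID, as quoted in the post.
    for site, url in artist_urls("4d5447d7-c61c-4120-ba1b-d7f471d385b9").items():
        print(site, url)
```

The point is only that one identifier is enough to address the same artist on both sites; nothing else needs to be agreed between them.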

Previously, writers at the BBC would have to write (and keep up to date) interesting and relevant content on every single artist page they publish – which I’m sure you can imagine is as unenviable as it is impossible. Now, MusicBrainz populates a lot of the information here – see the Releases and Credits – and also provides for the retrieval of the biography from Wikipedia.

At the same time, the BBC radio playout system (reportedly giant iPods in the basement of Broadcasting House) updates the playlist information on the right of the page.

As Matthew Shorter says, automation and dynamic publishing mean the pages can be created and maintained with a fraction of the manpower. Check the Foals page for a more recent artist and you’ll see news articles automatically aggregated as well.

Gathering resources in this way and adding context around the artists enables machines to process the links between these data sets, establish relationships between the information and interoperate on the basis of those relationships.

In his article above, Tom Scott (the Technical Project Team Leader) also describes these URIs as ‘web scale identifiers’ and talks about the principles of Linked Data. Whilst in this use case these locators facilitate simple data retrieval, the notion of the absolute, global URI is a far larger idea, and here it could grow to be far more powerful.

The URIs facilitate the mechanisms, but stand to play a far larger role in opening and standardising information on the Web as a whole. The MusicBrainz MBID attempts to standardise the way we reference information online regarding music, and its wide reuse is, in a sense, achieving that goal. But rather than thinking of these alphanumeric strings as pointing to locations of database records, they too can refer to the real world concepts they identify.

Imagine all online materials that feature a particular artist universally employing their single MBID string. Every semantically linked and annotated document and resource could be unified by an intelligent agent instructed to do so, collecting and aggregating the information to describe that real world concept in its entirety. With the Semantic Web in mind, the ultimate aim is for a machine agent to understand that concept in its entirety.
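
As a toy illustration of that idea, here’s a small Python sketch of an ‘agent’ that unifies resources purely by their shared MBID. Everything in it is hypothetical: the `unify_by_mbid` name, the example.org sources and the `facts` fields are invented for the example, and a real agent would be working over RDF rather than plain dictionaries.

```python
from collections import defaultdict

LENNON = "4d5447d7-c61c-4120-ba1b-d7f471d385b9"  # MBID quoted earlier in the post

# Hypothetical annotated resources from unrelated sites (all values made up).
resources = [
    {"mbid": LENNON, "source": "http://example.org/reviews/1",
     "facts": {"name": "John Lennon"}},
    {"mbid": LENNON, "source": "http://example.org/playlists/7",
     "facts": {"played_on": "an example radio station"}},
    {"mbid": "00000000-0000-0000-0000-000000000000",
     "source": "http://example.org/other/3",
     "facts": {"name": "Some other artist"}},
]


def unify_by_mbid(items):
    """Merge every resource that carries the same MBID into one record."""
    merged = defaultdict(lambda: {"sources": [], "facts": {}})
    for item in items:
        entry = merged[item["mbid"]]
        entry["sources"].append(item["source"])
        # Later facts overwrite earlier ones with the same key.
        entry["facts"].update(item["facts"])
    return dict(merged)


if __name__ == "__main__":
    print(unify_by_mbid(resources)[LENNON])
```

The merge needs no knowledge of where each resource came from; the shared identifier does all the work.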

In linking to MusicBrainz, the BBC have equally made their data more portable for third parties wanting to use it elsewhere. By agreeing on these unique IDs to identify resources, these pages can be automatically linked to and accessed based on this consistency.

The site provides a RESTful API: just add .xml, .rdf, .json or .yaml to the end of the artist URL.
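
In code, that convention amounts to nothing more than string concatenation. A minimal sketch, assuming the extensions mentioned above are four separate formats and that the artist URL pattern is the one quoted earlier in this post (the `representation_url` helper is my own invention, and the URLs may well not resolve any more):

```python
# Artist URL pattern as quoted earlier in the post (assumed).
ARTIST_URL = "http://www.bbc.co.uk/music/artists/{mbid}"

# Extensions mentioned in the post, read as four separate formats (assumption).
FORMATS = ("xml", "rdf", "json", "yaml")


def representation_url(mbid: str, fmt: str) -> str:
    """Build the URL for one machine-readable representation of an artist."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return ARTIST_URL.format(mbid=mbid) + "." + fmt


if __name__ == "__main__":
    mbid = "4d5447d7-c61c-4120-ba1b-d7f471d385b9"
    for fmt in FORMATS:
        print(representation_url(mbid, fmt))
```

Fetching any of these would just be a plain HTTP GET; I’ve left that out so the sketch doesn’t depend on the service being up.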

The value of online information isn’t determined by scarcity like physical goods are in the physical world. Reuse, repopulation and increasing visibility mean, for the BBC, an enriched repository for the purposes of making information more accessible and useful to the reader (surely the initial goal). But with the link to MusicBrainz now established, the information is also connected out into the Web, therefore enriching the source (and then exponentially any other links thereon). Better for the BBC, better for the third party, better for the reader: everything is enriched, so hopefully any later applications can benefit from this network effect.

Anyway, it turns out this has been going on since July last year, so perhaps the Silicon.com article was an attempt to increase visibility - we’re six months down the line now, after all.

If so, it worked – Sarah Perez wrote up an article at ReadWriteWeb and reports over at MusicBrainz suggest things are hotting up for this year. But if not, they should be applauded for commendable transparency and their open-minded efforts (and accept the extra drive of users to the service that comes with it!). It’s frustrating when products that are intended to ‘open up the web’ are kept closed and private for commercial purposes.

Thing is, I’m surprised I hadn’t found out about this before now. Shorter also describes this as being part of a general movement that’s going on at the BBC, “to move away from pages that are built in a variety of legacy content production systems to actually publishing data that we can use in a more dynamic way across the web.” So I went digging for more – thinking that, if this (pretty awesome) beta went online relatively quietly and the BBC aren’t particularly shouting about these new innovations (which I think they should!), perhaps there’s more elsewhere?

Well, I found two presentations over at Slideshare, the first on “BBC Programmes and Music on the Linking Open Data Cloud”, the second titled “Semweb at the BBC”, but unfortunately without transcripts or videos I can only really marvel at what might be in the works.

Patrick Sinclair (software engineer at the BBC – see his post on the Music beta) said a video might surface, but I’ve yet to find one.

By the looks of things though, there could be some fully recognised Semantic Web applications coming out of the BBC in the future. They look to discuss a handful of the languages and technologies that make up the Semantic Web stack, refer to constructing their own ontologies, reason through use cases for Linked Data and look to be applying the techniques of the Music pages to Programmes sections and onward.

Look forward to it!

2 Comments

  1. Marc

    I’m glad you like what we’re doing with music, programmes and LOD in general. You might be interested in these posts:

    An article Michael and I wrote for Nodalities on our work to structure bits of bbc.co.uk: http://derivadow.com/2009/01/30/building-coherence-at-bbccouk/

    A presentation I gave on webscale identifiers and the music and programmes site: http://derivadow.com/2008/11/27/permanent-web-ids-or-making-good-web-20-citizens/

    And my presentation at XTech on the programmes ontology http://derivadow.com/2008/05/13/helping-machines-play-with-programmes-xtech-presentation/

    • Dan
    • Posted February 27, 2009 at 10:38 pm

    It’s a wider project using publicly available data. Here’s another set of slides around using Wikipedia as a controlled vocabulary.

    http://blockslabpillar.com/?p=7


One Trackback/Pingback

  1. [...] Marc Hibbins has on his blog a bit more techie explanation of how the Beeb’s Artist Pages (in beta since July’08) [...]
