
I’m a regular attendee of the London Flash Platform User Group and the London Flex User Group and I get notified each time a new technology group is formed or arranges an event on Meetup.com. Anyway, through that I recently heard about the London Web Standards group.

The London Web Standards group meet monthly to discuss topics like creating websites, web standards, the W3C, XHTML, CSS, DOM, ECMAScript, how they impact you, your organisation, your audience and your clients.

This month they meet to discuss Joshua Porter’s ‘Designing for the Social Web‘ (Amazon) which, not having read, I would usually have passed on if it were not for the group’s organiser, Jeff Van Campen, kindly giving me a free copy.

London Web Standards Book Drop Off by otrops

So hopefully I’ll be able to rattle through it before this month’s meeting on April 14th; the event page is here.

You can also follow ‘webstandards’ on Twitter for notices, meetup dates and agendas. Look forward to it!

Seems I was a little late in finding out about the BBC’s work on integrating and exposing semantic data in their (then) new beta trial of Artist pages a little while ago.

In an interview with Silicon.com, Matthew Shorter, BBC’s interactive editor for music, speaks about establishing data associations with MusicBrainz, an open user-contributed ‘metadatabase’, to roll out across all of their encyclopaedic artist pages on the BBC site.

MusicBrainz has been around for some time now. It’s a huge database of music metadata, storing information such as artists, their releases, song details and biographies. Right now it has information on over 400,000 artists.

As early as 2001, it was described as a ‘Semantic Web service‘ (think a Semantic Web web service), in its offering of a massive store of machine-processable, openly available information (mostly public domain or Creative Commons-licensed), available via open protocols – in RDF format no less.

The BBC have adopted this open standard, mapping their data schema with that published by MusicBrainz to utilise the unique identifiers they provide. This allows the BBC site to leverage the public domain content, augmenting the profile pages found there.

Take a look at one of the records from MusicBrainz, for example, John Lennon’s information at http://musicbrainz.org/artist/4d5447d7-c61c-4120-ba1b-d7f471d385b9.html.

The unique ID here is the MBID, ’4d5447d7-c61c-4120-ba1b-d7f471d385b9‘.

The BBC then, have a dynamically generated page at http://www.bbc.co.uk/music/artists/4d5447d7-c61c-4120-ba1b-d7f471d385b9.
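As a rough sketch of that mapping (the helper names are my own invention, and the URL patterns are simply those of the two links above, not anything the BBC or MusicBrainz have published as an API), the MBID can be pulled out of the MusicBrainz address and reused to build the BBC one:

```python
import re
import uuid

# An MBID is a UUID: five hyphen-separated groups of hex digits.
MBID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def extract_mbid(url: str) -> str:
    """Return the first MBID found in a URL (hypothetical helper)."""
    match = MBID_PATTERN.search(url)
    if match is None:
        raise ValueError(f"no MBID found in {url!r}")
    # uuid.UUID validates the string and normalises it to lowercase.
    return str(uuid.UUID(match.group(0)))


def bbc_artist_url(mbid: str) -> str:
    """Build a BBC artist page address, assuming the pattern shown above."""
    return f"http://www.bbc.co.uk/music/artists/{mbid}"


if __name__ == "__main__":
    lennon = extract_mbid(
        "http://musicbrainz.org/artist/4d5447d7-c61c-4120-ba1b-d7f471d385b9.html"
    )
    print(bbc_artist_url(lennon))
```

The point is only that the same identifier does the work on both sites; nothing else about the two addresses needs to match.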

Previously, writers at the BBC would have to write (and keep up to date) interesting and relevant content on every single artist page they publish – which I’m sure you can imagine is as unenviable as it is impossible. Now, MusicBrainz populates a lot of the information here – see the Releases and Credits – and also provides the retrieval of the biography from Wikipedia.

At the same time, the BBC radio playout system (reportedly giant iPods in the basement of Broadcasting House) updates the playlist information on the right of the page.

As Matthew Shorter says, automation and dynamic publishing means the pages can be created and maintained with a fraction of the manpower. Check the Foals page for a more recent artist and you’ll see news articles automatically aggregated also.

Gathering resources in this way and adding context around the artists enables machines to process the links between these data sets, establish relationships between the information and perform interoperation based on those.

In his article above, Tom Scott (the Technical Project Team Leader) also describes these URIs as ‘web scale identifiers’ and talks about the principles of Linked Data. Whilst in this use case these locators facilitate simple data retrieval, the notion of the absolute, global URI is a far larger idea, and here, could grow to be far more powerful.

The URIs facilitate the mechanisms, but stand to play a far larger role in opening and standardising information on the Web as a whole. The MusicBrainz MBID attempts to standardise the way we reference information online regarding music, and its wide reuse is, in a sense, achieving that goal. But rather than thinking of these alphanumeric strings as pointing to locations of database records, they too can refer to the real world concepts they identify.

Imagine all online materials that feature a particular artist universally employing their single MBID string. Every semantically linked and annotated document and resource could be unified by an intelligent agent instructed to do so, collecting and amassing the information to describe that real world concept in its entirety. With the Semantic Web in mind, the ultimate aim is for a machine agent to understand that concept in its entirety.

In linking to MusicBrainz, the BBC have equally made their data more portable to third parties wanting to use it elsewhere. By agreeing on these unique IDs to identify resources, these pages can be automatically linked to and accessed based on this consistency.

The site provides a RESTful API: just add .xml, .rdf, .json or .yaml to the end of the artist URL.
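To make that concrete, here is a minimal sketch that builds those addresses for a given MBID. The function name and the list of formats are my own reading of the sentence above (treating xml and rdf as separate suffixes), and I haven’t checked that every one of these still resolves:

```python
# Base address of the BBC artist pages, as used in the links earlier in this post.
BBC_ARTIST_BASE = "http://www.bbc.co.uk/music/artists"

# Suffixes mentioned in the post; assumed to be four separate formats.
FORMATS = ("xml", "rdf", "json", "yaml")


def artist_data_url(mbid: str, fmt: str) -> str:
    """Return the address of one representation of an artist (hypothetical helper)."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BBC_ARTIST_BASE}/{mbid}.{fmt}"


if __name__ == "__main__":
    lennon = "4d5447d7-c61c-4120-ba1b-d7f471d385b9"
    for fmt in FORMATS:
        print(artist_data_url(lennon, fmt))
```

Fetching any of the printed addresses with your HTTP client of choice should, in principle, return the same artist record in a different serialisation.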

The value of online information isn’t determined by scarcity, as the value of physical goods is in the physical world. Reuse, repopulation and increasing visibility means, for the BBC, an enriched repository for the purposes of making information more accessible and useful to the reader (surely the initial goal), but also, in having the link now established to MusicBrainz, the information is connected out into the Web, therefore enriching the source (and then exponentially any other links thereon). Better for the BBC, better for the third party, better for the reader – everything is enriched – so hopefully any later applications can benefit from this network effect.

Anyway, it turns out this has been going on since July last year, so perhaps the Silicon.com article was an attempt to increase visibility - we’re six months down the line now, after all.

If so, it worked – Sarah Perez wrote up an article at ReadWriteWeb and reports over at MusicBrainz suggest things are hotting up for this year. But if not, they should be applauded for commendable transparency and their open-minded efforts (and accept the extra drive of users to the service that comes with it!). It’s frustrating when products that are intended to ‘open up the web’ are kept closed and private for commercial purposes.

Thing is, I’m surprised I hadn’t found out about this before now. Shorter also describes this as being part of a general movement that’s going on at the BBC, “to move away from pages that are built in a variety of legacy content production systems to actually publishing data that we can use in a more dynamic way across the web.” So I went digging for more – thinking that, if this (pretty awesome) beta went online relatively quietly and the BBC aren’t particularly shouting about these new innovations (which I think they should!), perhaps there’s more elsewhere?

Well, I found two presentations over at Slideshare, the first on “BBC Programmes and Music on the Linking Open Data Cloud“, the second titled “Semweb at the BBC“, but unfortunately without transcripts or videos I can only really marvel at what might be in the works.

Patrick Sinclair (software engineer at the BBC – see his post on the Music beta) said a video might surface, but I’ve yet to find one.

By the looks of things though, there could be some fully recognised Semantic Web applications coming out of the BBC in the future. They look to discuss a handful of the languages and technologies that make up the Semantic Web stack, refer to constructing their own ontologies, reason use cases for Linked Data and look to be applying the techniques of the Music pages to Programmes sections and onward.

Look forward to it!

While I’m on the subject of data portability, I thought I’d talk about DataPortability.

A loose analogy: Consider the definition of the Semantic Web – a conceptual framework combining standardised semantic applications on the web. Similarly, the DataPortability project aims to define and implement a set of recommendations of open standards to enable (entire and complete) end-to-end portability of data.

Both ‘capitalised’ terms denote distinct, considered models – composed of specific selections of the technologies that together embody their respective namesakes.

Not that DataPortability really has anything to do with the Semantic Web other than the shared ideal of standardisation and ‘boundless’ interoperation of data and services online.

In essence the project is a volunteer-based workgroup, as transparent and ‘frictionless’ a movement as the borderless experience they promote. Their vision describes the web as a place where people can move easily between network services, reusing data they provide, controlling their own privacy and respecting the privacy of others (read in full here).

They wish to see an end to every problem I described in my last post – the social network fatigue, the fragmentation and walled-garden silo landscape of current web platforms – and also promote the combination of open source technologies and protocols (including OpenID and OAuth) for web-wide benefit, not only with regards to social networking.

The following video, quite simply but accurately, describes the already too familiar picture:

So what technologies are we talking about?

Although our Semantic friends RDF, SIOC and FOAF are present, it’s much more familiar territory for the rest. The line up includes RSS, OPML, again OAuth, OpenID and Microformats. These are existing open standards though, not technologies still in development awaiting a W3C recommendation like some of the Semantic Web projections.

There’s some other very cool stuff I’d like to go into more detail with later. Definitely APML, for example – Attention Profiling Markup Language – an XML-based format that encapsulates a summary of your interests, your informed ‘attention data’.

As well as identifying the components that make up their blueprint (the recognition of how their goals can be achieved – which, and I know I keep coming back to this, is one of the largest causes for doubters of the Semantic Web – that the speculative combination of some of the technologies is almost unimaginable) – the DataPortability project also documents best practices for why you should participate in the initiative – specifically tailored as to how they can come together for you, as developers, or consumers, or service providers etc.

DataPortability is about empowering users, aiming to grant a ‘free-flowing web’ within your control.

How are they doing this? Are they likely to succeed? They’ve already got some huge names on board – Google, Facebook, Flickr, Twitter, Digg, LinkedIn, Plaxo, Netvibes – the list goes on. This is really happening.

Find out more at dataportability.org.

Hopefully the last of the posts that I should have written last year – a while back, when I wrote about Facebook Connect and Google Friend Connect, I mentioned three open source data projects – OpenID, OpenSocial and OAuth.

I only mentioned them briefly in the thinking that they deserved attention separate to that topic – they’ll play a key part in the progression of social media technology, but the three are part of a bigger issue. That of data portability – one perhaps more concerned with my current Semantic Web conversation.

While the three have been separately developed over the past three (or so) years, their popularity and general implementation are becoming ever more widespread. In combination, they offer very powerful potential in leveraging data, interoperability thereof between systems and ultimately offer standardising methods and protocols in which data ‘portability’ becomes possible.

In very, (very) short:

  • OpenSocial (wiki) is a set of common APIs for web-based social network applications.
  • OpenID (wiki) is a decentralised user identification standard, allowing users to log onto many services with the same digital identity.
  • OAuth (wiki) is a protocol to simplify and standardise secure API authorisation and authentication for desktop, mobile and web applications.

 
There’s a ton of reading fired from each of those links.

But more than anything, I very strongly recommend watching the following presentation by Joseph Smarr of Plaxo, taken from Google’s I/O conference last year:

Google I/O 2008 – OpenSocial, OpenID, and OAuth: Oh, My!

He covers each of these open source building blocks in detail, collectively considering them as a palatable set of options for developers in creating social media platforms. He presents the compelling engagement they can offer social websites, how they fit together in a holistic way so developers aren’t constantly building from scratch and how he envisions the social web evolving.

He argues that today’s platforms are essentially broken, highlighting the fragmentation of social media sites – that their rapid growth forced developers to build each platform separately, from scratch and therefore differently, so that each platform has its own silo, headed in a different direction. The very nature of social network infrastructure and architecture is still nascent.

We are at breaking point: social media sites still assume that every new user has never been on a social network site before. We’ve all experienced having to register and re-register, upload profile information, find friends and then confirm friends – it’s not scaling any more.

Not only has it gotten to the point that we as consumers are experiencing social network fatigue, but users are also, understandably, opting out of joining even newer networks, pre-empting the nauseous motions they’ll have to repeat.

It’s very easily digestible – not at all deeply technical until the Q&A section. Do watch!

You can’t start a fire without a spark.