Category Archives: Dataportability

Last year Facebook released Facebook Connect and, at about the same time, Google released Friend Connect. They're two very similar services that allow users to access their information and their friends on the respective native platforms from third-party enabled sites. The intention, as I've written about before, is to add a layer of social interaction to 'non-social' sites, to connect your information and activity on these third-party sites to your information and activity (and contacts) on the original platforms.

Then in March, Yahoo! announced their own sign-on service, called Yahoo! Updates.

Now, this week, Twitter have announced their connection service, called 'Sign in with Twitter'. It too gives you secure, authenticated access to your information and contacts, in exactly the same way the others do – except this time, it's Twitter.

Sign in with Twitter

You might ask: if we have three, do we need a fourth? Have you ever used any of the other three?

But don't dismiss it, or think Twitter are jumping on any kind of bandwagon; their implementation is fundamentally different to the others – and it could cause quite a stir.

The problem with the other services (ultimately, the problem with the platforms) is that, more often than not, they are completely closed and non-portable. Although you can sign in to a third-party site and access your data, there are a lot of limitations on what you can retrieve and publish. These popular social networks have grown and amassed huge numbers of members and amounts of data, which they hoard and keep to themselves. I'm not talking about privacy; I'm referring to data portability.

The infrastructures are like locked-in silos of information, and each is built differently – either because they never considered that you'd want to make your data portable, or because they didn't want (or see value in) you moving your data anywhere else. The services they've created to 'connect' to your data are also proprietary methods – custom built to channel in and out of those silos. Each of those services, too, is a singularity; they won't work with each other.

Twitter, though, have come up with a solution that adheres to agreed-upon standards – specifically, by using OAuth to facilitate its connection. Technically, it's significantly different, but in practice you can expect it to do everything the others can do.
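
As a sketch of what that looks like for a developer – assuming the requests-oauthlib Python library; the consumer key, secret and callback URL below are placeholders, while the endpoint URLs are Twitter's published OAuth endpoints:

    # A minimal sketch of the three-legged OAuth 1.0a dance behind
    # 'Sign in with Twitter', assuming the requests-oauthlib library.
    from requests_oauthlib import OAuth1Session

    CONSUMER_KEY = "your-app-key"              # placeholder
    CONSUMER_SECRET = "your-app-secret"        # placeholder
    CALLBACK = "http://example.com/callback"   # hypothetical return URL

    # 1. Obtain a temporary request token from Twitter.
    oauth = OAuth1Session(CONSUMER_KEY, client_secret=CONSUMER_SECRET,
                          callback_uri=CALLBACK)
    oauth.fetch_request_token("https://api.twitter.com/oauth/request_token")

    # 2. Send the user off to approve the app; /oauth/authenticate is the
    #    streamlined 'Sign in with Twitter' variant of /oauth/authorize.
    print("Visit:", oauth.authorization_url("https://api.twitter.com/oauth/authenticate"))

    # 3. Back at the callback, trade the verifier for a long-lived access token.
    verifier = input("oauth_verifier parameter from the callback: ")
    tokens = oauth.fetch_access_token("https://api.twitter.com/oauth/access_token",
                                      verifier=verifier)
    print("Signed in as", tokens["screen_name"])

The user never hands their password to the third-party site – that's the whole point of the protocol.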

The community’s thoughts

Yahoo's Eran Hammer-Lahav (a frequent contributor to OAuth) has written a good post discussing his thoughts. He says it's 'open done right' – no proprietary 'special sauce' clouding interoperability, as happens with Facebook Connect. I think he's right.

He looks at what happened when Facebook Connect was introduced: it essentially offered third-party sites two key features – the ability to use existing Facebook accounts for their own needs, and access to Facebook's social data to enhance the site. The value of Facebook Connect is to save sites the need to build their own social layer. Twitter, though, is not about adding yet another layer, but about doing more with what you've already got.

Marshall Kirkpatrick also wrote about the announcement; his metaphor for the other 'connection' services best describes how they function – 'it's letting sites borrow the data – not setting data free'.

But then he talks about Twitter ‘as a platform’, and I think this is where things get interesting. He says:

Twitter is a fundamentally different beast.

All social networking services these days want to be “a platform” – but it’s really true for Twitter. From desktop apps to social connection analysis programs, to services that will Twitter through your account when a baby monitoring garment feels a kick in utero – there’s countless technologies being built on top of Twitter.

He's right. Twitter apps do pretty much anything and everything you can think of on top of Twitter, beyond the primary use of sending and receiving tweets. I love all the OAuth and open standards adoption – but that's because I'm a developer. Thinking about Twitter as a platform makes me wonder what kind of effect this will have on users, and how it could affect the climate, even the landscape, of social media if Twitter, already being great, is given some real power.

People have long questioned Twitter's future – its business model, how it can be monetised; those things are important – but where can it otherwise go, and how can it expand? Does it need to 'expand'? Its service is great; it doesn't need to start spouting needless extras and I don't think it will. But widening its connectivity and adaptability could, I think, change our perception of Twitter – its longevity and road map, the way we use it and think of ourselves using it.

My Thoughts

Regardless of Richard Madeley's or Oprah Winfrey's evangelism, Twitter is an undeniable success.

When Facebook reworked and redesigned their feed and messaging model, I almost couldn't believe it. What were 'status' updates basically IS Twitter now, and it's the backbone of the site. It's Twitter's messaging model; it even asks 'What's on your mind?'.

I'm probably not the only one who thought this; I'd guess any complaints about it being a bit of a blatant rip-off were drowned out by all the negativity about the interface redesign.

I think Facebook realised that Twitter has become a real rival. I think (and I guess Facebook also thinks) that as people become more web-savvy and literate with these sociable websites, they want to cleanse.

The great appeal of Twitter for me was that, ingeniously, they took a tiny part of Facebook (this is how I saw it two years ago, anyway) and made it their complete function – simple, short updates. Snippets of personal insight or creative wisdom; it didn't matter really. What was important was that it ignored the fuss and noise of whatever else Facebook had flying around its own ecology (and this was before Facebook applications came around) and took a bold, single, straight route through the middle of it.

Looking back, a lot of Facebook's early adoption could be attributed to people growing restless with the noise and fuss of MySpace at the time – Facebook then was a cleaner and more structured option.

I remember Twitter was almost ridiculed for basing its whole premise on such a minute part of Facebook's huge machine. Now look at the turnaround.

Now people are growing out of the Web 2.0 craze. A lot went on, there was a lot of 'buzz', but a lot of progress was made in connecting things. People now are far more connected, but perhaps they're over-connected, suffering from what Joseph Smarr calls 'social media fatigue'. People have multiple accounts on a ton of dispersed and unconnected sites around the web – true, each unique and successful for its own achievements – but it can't go on.

Twitter for me is streamlined, cleansed publishing. Whether I'm talking about what I'm doing or finding out information from people or about topics that I follow, the 140-character limit constrains these utterances to concise, straight-to-the-point pieces of information. The '@' replies and hashtags are brilliant mechanisms, conceived to create connections between people and objects where there is almost no space to do so.

I use my blog to write longer discourse, I use my Twitter to link to it. Likewise with the music I listen to, I can tweet Spotify URIs. I link to Last.fm events and anything particularly good I’ve found (and probably bookmarked with Delicious) I’ll tweet that out too.

Twitter for me is like a central nervous system for my online activities. I won't say 'backbone' – because it's not that heavy. Specifically a nervous system in the way it intricately connects my online life, spindling and extending out links, almost to be itself a lifestream in micro.

Recently, I saw Dave Winer's 'Continuous Bootstrap', which, although admittedly a bit of fun, describes the succession of platforms deemed social media 'leaders' (see the full post here).

What I initially noticed is that he aligns successful platforms – blogging, podcasting – with a single application: Twitter. It doesn’t matter whether he is actually suggesting that Twitter alone is as successful as any single publishing form, but it did make me wonder if Twitter, rather than being the current ‘holder of the baton’, will actually be the spawn for whatever kind of Web-wide platform does become popular next.

If the real Data Portability revolution is going to kick in, if it’s on the cusp of starting right now and everything will truly become networked and connected – would you rather it was your Twitter connections and voice that formed that basis for you or your Facebook profile?

I know I'd much rather explore the connections I've made through Twitter. The kind of information I'd get back from the type of people who'd connect in this way would be far more relevant coming from my pool of Twitter connections than from the old school friends and family members who've (notoriously) added me on Facebook – the kind that just add you for the sake of it.

If Web 3.0 (or whatever you want to call it) is coming soon, I'd rather detox. Twitter is slimmer and still feels fresh to start out with. For me, Facebook feels far too heavy now, out of date and messy. Maybe I'm being unfair and I feel that way because I've fallen out of touch with it and now visit less frequently, but all the negativity hasn't done it any favours – and those complaints aren't unfounded.

Seems I was a little late in finding out about the BBC’s work on integrating and exposing semantic data in their (then) new beta trial of Artist pages a little while ago.

In an interview with Silicon.com, Matthew Shorter, BBC’s interactive editor for music, speaks about establishing data associations with MusicBrainz, an open user-contributed ‘metadatabase’, to roll out across all of their encyclopaedic artist pages on the BBC site.

MusicBrainz has been around for some time now; it's a huge database of music metadata, storing information such as artists, their releases, song details and biographies. Right now it has information on over 400,000 artists.

As early as 2001, it was described as a ‘Semantic Web service‘ (think a Semantic Web web service), in its offering of a massive store of machine-processable, openly available information (mostly public domain or Creative Commons-licensed), available via open protocols – in RDF format no less.

The BBC have adopted this open standard, mapping their data schema to that published by MusicBrainz in order to utilise the unique identifiers it provides. This allows the BBC site to leverage the public domain content, augmenting the profile pages found there.

Take a look at one of the records from MusicBrainz, for example, John Lennon’s information at http://musicbrainz.org/artist/4d5447d7-c61c-4120-ba1b-d7f471d385b9.html.

The unique ID here is the MBID, '4d5447d7-c61c-4120-ba1b-d7f471d385b9'.

The BBC, then, have a dynamically generated page at http://www.bbc.co.uk/music/artists/4d5447d7-c61c-4120-ba1b-d7f471d385b9.

Previously, writers at the BBC would have to write (and keep up to date) interesting and relevant content on every single artist page they publish – a task which, I'm sure you can imagine, is as unenviable as it is impossible. Now, MusicBrainz populates a lot of the information here – see the Releases and Credits – and also provides for the retrieval of the biography from Wikipedia.

At the same time, the BBC radio playout system (reportedly giant iPods in the basement of Broadcasting House) updates the playlist information on the right of the page.

As Matthew Shorter says, automation and dynamic publishing means the pages can be created and maintained with a fraction of the manpower. Check the Foals page for a more recent artist and you’ll see news articles automatically aggregated also.

Gathering resources in this way and adding context around the artists enables machines to process the links between these data sets, establish relationships between the information and interoperate on the basis of them.

In his article above, Tom Scott (the Technical Project Team Leader) also describes these URIs as ‘web scale identifiers’ and talks about the principles of Linked Data. Whilst in this use case these locators facilitate simple data retrieval, the notion of the absolute, global URI is a far larger idea, and here, could grow to be far more powerful.

The URIs facilitate the mechanisms, but stand to play a far larger role in opening up and standardising information on the Web as a whole. The MusicBrainz MBID attempts to standardise the way we reference music information online; its wide reuse is, in a sense, achieving that goal. But rather than thinking of these alphanumeric strings as pointing to the locations of database records, they can also refer to the real-world concepts they identify.

Imagine all online materials that feature a particular artist universally employing that artist's single MBID string. Every semantically linked and annotated document and resource could be unified by an intelligent agent instructed to do so, collecting and amassing the information to describe that real-world concept in its entirety – ultimately, with the Semantic Web in mind, for a machine agent to understand that concept in full.
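
To sketch what that unification might look like in practice – assuming the rdflib Python library and the RDF representation the BBC expose (see the API note below); merging further sources is left hypothetical:

    # Sketch: aggregate RDF statements about one artist into a single graph,
    # unified by shared URIs. Assumes rdflib and that the BBC serve RDF at
    # the .rdf suffix of the artist URL.
    from rdflib import Graph

    MBID = "4d5447d7-c61c-4120-ba1b-d7f471d385b9"  # John Lennon's MBID
    graph = Graph()
    graph.parse("http://www.bbc.co.uk/music/artists/%s.rdf" % MBID)
    # Any other dataset that reuses the same identifier could be merged in
    # here with further graph.parse(...) calls.
    for subject, predicate, obj in graph:
        print(subject, predicate, obj)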

In linking to MusicBrainz, the BBC have equally made their own data more portable for third parties wanting to use it elsewhere. By agreeing on these unique IDs to identify resources, these pages can be automatically linked to and accessed on the basis of this consistency.

The site provides a RESTful API: just add .xml, .rdf, .json or .yaml to the end of the artist URL.
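
A quick sketch of using it, assuming the Python requests library and that the .json suffix behaves as described (the response structure is an assumption, so the example just lists its top-level keys):

    # Sketch: fetch the machine-readable JSON version of a BBC artist page,
    # reusing the MBID from the John Lennon example above.
    import requests

    MBID = "4d5447d7-c61c-4120-ba1b-d7f471d385b9"
    response = requests.get("http://www.bbc.co.uk/music/artists/%s.json" % MBID)
    response.raise_for_status()
    print(sorted(response.json().keys()))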

The value of online information isn't determined by scarcity in the way physical goods are. Reuse, repopulation and increased visibility mean, for the BBC, an enriched repository that makes information more accessible and useful to the reader (surely the initial goal); but with the link to MusicBrainz now established, the information is also connected out into the Web, enriching the source (and then, exponentially, any other links thereon). Better for the BBC, better for the third party, better for the reader – everything is enriched – so hopefully any later applications can benefit from this network effect.

Anyway, it turns out this has been going on since July last year, so perhaps the Silicon.com article was an attempt to increase visibility – we’re six months down the line now, after all.

If so, it worked – Sarah Perez wrote up an article at ReadWriteWeb, and reports over at MusicBrainz suggest things are hotting up for this year. But if not, they should be applauded for commendable transparency and their open-minded efforts (and should accept the extra flow of users to the service that comes with it!). It's frustrating when products that are intended to 'open up the web' are kept closed and private for commercial purposes.

Thing is, I'm surprised I hadn't found out about this before now. Shorter also describes this as being part of a general movement that's going on at the BBC, "to move away from pages that are built in a variety of legacy content production systems to actually publishing data that we can use in a more dynamic way across the web." So I went digging for more – thinking that, if this (pretty awesome) beta went online relatively quietly and the BBC aren't particularly shouting about these new innovations (which I think they should!), perhaps there's more elsewhere?

Well, I found two presentations over at Slideshare, the first on "BBC Programmes and Music on the Linking Open Data Cloud", the second titled "Semweb at the BBC", but unfortunately, without transcripts or videos, I can only really marvel at what might be in the works.

Patrick Sinclair (software engineer at the BBC – see his post on the Music beta) said a video might surface, but I’ve yet to find one.

By the looks of things though, there could be some fully realised Semantic Web applications coming out of the BBC in the future. The slides discuss a handful of the languages and technologies that make up the Semantic Web stack, refer to constructing their own ontologies, reason through use cases for Linked Data, and look to be applying the techniques of the Music pages to the Programmes section and onward.

Look forward to it!

While I’m on the subject of data portability, I thought I’d talk about DataPortability.

A loose analogy: Consider the definition of the Semantic Web – a conceptual framework combining standardised semantic applications on the web. Similarly, the DataPortability project aims to define and implement a set of recommendations of open standards to enable (entire and complete) end-to-end portability of data.

Both ‘capitalised’ terms denote distinct, considered models – composed of specific selections of the technologies that together embody their respective namesakes.

Not that DataPortability really has anything to do with the Semantic Web, other than the shared idyllic standardisation and 'boundless' interoperation of data and services online.

In essence, the project is a volunteer-based workgroup, as transparent and 'frictionless' a movement as the borderless experience they promote. Their vision describes the web as a place where people can move easily between network services, reusing the data they provide, controlling their own privacy and respecting the privacy of others (read it in full here).

They wish to see an end to every problem I described in my last post – the social network fatigue, the fragmentation, the walled-garden silo landscape of current web platforms – and, too, they promote the combination of open source technologies and protocols (including OpenID and OAuth) for web-wide benefit, not only with regard to social networking.

The following video, quite simply but accurately, describes the already too familiar picture:

So what technologies are we talking about?

Although our Semantic friends RDF, SIOC and FOAF are present, it's much more familiar territory for the rest. The line-up includes RSS, OPML, again OAuth, OpenID and Microformats. These are existing open standards, though, not technologies still in development awaiting a W3C recommendation like some of the Semantic Web projections.

There’s some other very cool stuff I’d like to go into more detail with later. Definitely APML, for example – Attention Profiling Markup Language – an XML-based format that encapsulates a summary of your interests, your informed ‘attention data’.
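
To give a flavour of the format, here's a sketch that parses a tiny hand-written APML document with Python's standard library – the element names follow my reading of the draft 0.6 spec and should be treated as illustrative rather than authoritative:

    # Sketch: parse a hand-written APML 0.6 document. The structure follows
    # my reading of the draft spec; treat the element names as illustrative.
    import xml.etree.ElementTree as ET

    APML_DOC = """\
    <APML xmlns="http://www.apml.org/apml-0.6" version="0.6">
      <Head><Title>An example attention profile</Title></Head>
      <Body defaultprofile="interests">
        <Profile name="interests">
          <ImplicitData>
            <Concepts>
              <Concept key="semantic web" value="0.82" from="example.com"
                       updated="2009-04-01T00:00:00Z"/>
              <Concept key="twitter" value="0.64" from="example.com"
                       updated="2009-04-01T00:00:00Z"/>
            </Concepts>
          </ImplicitData>
        </Profile>
      </Body>
    </APML>
    """

    ns = {"apml": "http://www.apml.org/apml-0.6"}
    for concept in ET.fromstring(APML_DOC).findall(".//apml:Concept", ns):
        print(concept.get("key"), "scores", concept.get("value"))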

As well as identifying the components that make up their blueprint (a recognition of how their goals can be achieved – which, and I know I keep coming back to this, is one of the largest causes of doubt over the Semantic Web, where the speculative combination of some of the technologies is almost unimaginable), the DataPortability project also documents best practices for why you should participate in the initiative – specifically tailored to how the pieces can come together for you as a developer, consumer, service provider and so on.

DataPortability is about empowering users, aiming to grant a ‘free-flowing web’ within your control.

How are they doing this? Are they likely to succeed? They’ve already got some huge names on board – Google, Facebook, Flickr, Twitter, Digg, LinkedIn, Plaxo, Netvibes – the list goes on. This is really happening.

Find out more at dataportability.org.

Hopefully the last of the posts that I should have written last year – a while back I wrote about Facebook Connect and Google Friend Connect, and mentioned three open source data projects – OpenID, OpenSocial and OAuth.

I only mentioned them briefly, thinking that they deserved attention separate from that topic – they'll play a key part in the progression of social media technology, but the three are part of a bigger issue: that of data portability, one perhaps more relevant to my current Semantic Web conversation.

While the three have been separately developed over the past three (or so) years, their popularity and general implementation are becoming ever more widespread. In combination, they offer very powerful potential for leveraging data and for interoperability between systems, and ultimately offer standardised methods and protocols through which data 'portability' becomes possible.

In very (very) short:

  • OpenSocial (wiki) is a set of common APIs for web-based social network applications.
  • OpenID (wiki) is a decentralised user identification standard, allowing users to log on to many services with the same digital identity.
  • OAuth (wiki) is a protocol to simplify and standardise secure API authorisation and authentication for desktop, mobile and web applications.

There’s a ton of reading fired from each of those links.
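
To give the OpenID item above some shape before the video: here's a minimal sketch of the consumer ('relying party') side of an OpenID login, assuming the python-openid library – every URL is a placeholder:

    # Sketch: the relying-party side of an OpenID login, assuming the
    # python-openid library. All URLs are placeholders for illustration.
    from openid.consumer import consumer
    from openid.store.memstore import MemoryStore

    session = {}  # normally your web framework's per-user session
    rp = consumer.Consumer(session, MemoryStore())

    # 1. Discover the user's provider from the identifier they typed in
    #    and build the URL to redirect their browser to.
    auth_request = rp.begin("https://user.example.com/")
    redirect_url = auth_request.redirectURL(
        realm="https://myapp.example/",
        return_to="https://myapp.example/complete")

    # 2. The user authenticates with their provider, which sends them back
    #    to return_to with signed query arguments; verify them there:
    # response = rp.complete(query_args, "https://myapp.example/complete")
    # if response.status == consumer.SUCCESS:
    #     print("Signed in as", response.getDisplayIdentifier())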

But more than anything, I very strongly recommend watching the following presentation by Joseph Smarr of Plaxo, taken from Google’s I/O conference last year:

Google I/O 2008 – OpenSocial, OpenID, and OAuth: Oh, My!

He covers each of these open source building blocks in detail, collectively considering them as a palatable set of options for developers in creating social media platforms. He presents the compelling engagement they can offer social websites, how they fit together in a holistic way so developers aren’t constantly building from scratch and how he envisions the social web evolving.

He argues that today's platforms are essentially broken, highlighting the fragmentation of social media sites – that their rapid growth forced each platform to be built separately, from scratch and therefore differently, so that each sits in its own silo, headed in a different direction – and that the very nature of social network infrastructure and architecture is still very nascent.

We are at breaking point: social media sites still assume that every new user has never been on a social network before. We've all experienced having to register and re-register, upload profile information, find friends and then confirm friends – it's not scaling any more.

Not only has it got to the point that we as consumers are experiencing social network fatigue, but users are also, understandably, opting out of joining even newer networks, pre-empting the nauseating motions they'll have to repeat.

It’s very easily digestible – not at all deeply technical until the Q&A section. Do watch!
