Category Archives: Openstandards

Inspired by Mark Birbeck’s talk on RDFa and the Semantic Web earlier this month, I decided to take some of my own advice and add RDFa to my site. I’ve now created a FOAF profile here on my blog.

Reading through Mark’s articles at webBackplane, I noticed he has a very simple tutorial on how to create a basic FOAF profile. RDFa is meant to be straightforward, I wrote about FOAF in my dissertation almost three years ago, and I now have full control of my blog (it’s a WordPress.org installation rather than the free option) – so I’ve no excuse.

In my last post I discussed RDF vocabularies: sets of agreed-upon, unambiguous terms that let developers structure otherwise ‘mundane’, structureless content by inserting definitions and references that machines and applications can follow to infer meaning and understand topics of any kind. There I gave an example that pointed to a specification used to structure book reviews.

FOAF is the Friend of a Friend vocabulary, an RDF and OWL ontology, used in the same way, but specifically to describe people, their activities, their interests and their relationships to other people.

Created around mid-2000, it is now maintained by The FOAF Project and is considered to be one of the earliest – if not the first – Semantic Web applications.

The idea is that, given anyone can put RDF(a) on their Web site, equally anyone can describe themselves using FOAF. In doing so they create a FOAF profile and join the linked network of people who have already done the same. These people can then create links to each other’s FOAF profiles and start to build a social network of friends, without needing a centralised database or website to do it for them.

This is where RDFa steps in: it allows developers to implement structured data straight into their HTML mark-up, wrapping existing or new content with an extended set of attributes. They no longer have to host a separate RDF file and rely on applications indexing that file, or on a link to it from another page.
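
For contrast, this is roughly what the separate-file approach looks like – a minimal standalone FOAF document in RDF/XML (a sketch only, abridged to the same details as my profile below):

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<foaf:Person rdf:about="#me">
<foaf:name>Marc Hibbins</foaf:name>
<foaf:homepage rdf:resource="http://www.marchibbins.com"/>
</foaf:Person>
</rdf:RDF>

You’d then have to link to that file from your pages and hope something finds it – which is exactly the extra step RDFa removes.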

Creating a FOAF profile

I already have an ‘About’ page on this blog – a bit of blurb about who I am and what I do. So it’s here that I’m implementing my FOAF information.

As I’ve said, there’s no need to link to a separate RDF file if you use RDFa, so really you can add the metadata anywhere – in your headers or footers, for example – but the About page is the most relevant place for me and already contains the information and links I want to share anyway.

Firstly, I wrap the text in a div tag that defines the FOAF namespace and declares that this div is a ‘Person object’ – that its contents describe a person. This is done by referring to the foaf:Person type of the FOAF vocabulary:

<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me"
typeof="foaf:Person"
>
<p>Hello.</p>
<p>My name is Marc Hibbins.</p>

</div>

I also use the about attribute with the value #me, a useful convention that makes it easy for people to create links to me – more on this later.

The FOAF Person object contains a lot of properties to describe you personally, what kinds of activities you are involved in and terms that create connections to sites or documents relating to you.

Now that my object is created I can start annotating the text with some of these terms, for example my name:

<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me"
typeof="foaf:Person"
>
<p>Hello.</p>
<p>My name is <span property="foaf:name">Marc Hibbins</span>.</p>

</div>

And then some links – FOAF has terms to define your blog and homepage URLs:

<a rel="foaf:weblog" href="http://blog.marchibbins.com/">My blog</a>
<a rel="foaf:homepage" href="http://www.marchibbins.com">My site</a>

It’s also common to have an image, so likewise, if I had one I would attach it with the foaf:img term. The full A–Z index of terms can be found in the specification: http://xmlns.com/foaf/spec/.
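
If I did have a photo it might look something like the following, inside the Person div (the image URL here is hypothetical):

<img rel="foaf:img" src="http://blog.marchibbins.com/images/me.jpg"
alt="Marc Hibbins" />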

FOAF allows you to connect to other online accounts that you own. Mark’s tutorial has the following example to attach his Twitter account to his person object:

<span rel="foaf:holdsAccount">
<span typeof="foaf:OnlineAccount">
<a rel="foaf:accountServiceHomepage"
href="http://twitter.com/">Twitter</a>
<span property="foaf:accountName">markbirbeck</span>
</span>
</span>

The foaf:holdsAccount definition creates a relationship between the typeof="foaf:Person" object and the typeof="foaf:OnlineAccount" object that follows (the above mark-up would be contained within said Person object). Note that the foaf:holdsAccount span allows for multiple foaf:OnlineAccount objects inside. The foaf:accountServiceHomepage term defines the service homepage – Twitter’s home page in this case – and the foaf:accountName property declares Mark’s username.

As you’ll notice (as he does, too), although it’s machine-readable it isn’t particularly human-readable. Well, it is, but it’s not all that nice. So instead he uses this formatting:

His inane comments are available on his
<span rel="foaf:holdsAccount">
<span typeof="foaf:OnlineAccount">
<a rel="foaf:accountServiceHomepage"
href="http://twitter.com/">Twitter</a>
account. His ID is '
<span property="foaf:accountName">markbirbeck</span>
'.
</span>
</span>

This is better for human reading, though I still think it’s a little convoluted. All I want is a single link, just like I already have. As I’ve said, RDFa should have no effect on my content – its workings should be hidden from the reader.

So in my mark-up, rather than using the rel attribute on a tags and in-line values (the values found immediately between the a tags), I use the property and content attributes on spans:

<span rel="foaf:holdsAccount">
<span typeof="foaf:OnlineAccount"
property="foaf:accountServiceHomepage"
content="http://delicious.com/"
>
<span property="foaf:accountName" content="marchibbins">
<a href="http://delicious.com/marchibbins">View my Delicious
bookmarks</a> – most things are about the Semantic Web…
</span>
</span>
</span>

This allows me to keep my existing prose and is still machine-accessible.

I mentioned being able to connect to other people’s FOAF profiles. This is done by attaching the foaf:knows term to a link to someone else’s profile:

<a rel="foaf:knows" href="http://www.w3.org/People/Ivan/#me">Ivan Herman</a>

Note here that Ivan Herman has employed the #me mechanism in his FOAF URI to connect directly to his profile information, rather than to the whole page which contains that information.
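
Since I’ve done the same with about="#me", anyone should be able to link to my profile in the same way – something like this, assuming my About page’s current URL:

<a rel="foaf:knows" href="http://blog.marchibbins.com/about/#me">Marc Hibbins</a>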

I’ve decided not to connect to friends or colleagues here in this way – again, it wasn’t in my original content, and I also use a similar technology instead, called XFN, in the footer of my blog pages. XFN deserves a blog post to itself (that hopefully I’ll get time for); have a look at the source and you’ll see similar rel attributes there for now.

My FOAF profile

So here it is, abridged but with all the RDFa shown:

<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me"
typeof="foaf:Person"
>

<p>Hello.</p>
<p>My name is <span property="foaf:name">Marc Hibbins</span>.</p>
<p>I’m an interactive and digital media developer, I build Web applications and
RIAs primarily with tools like Flash, Flex and AIR.</p>

<p>
<span rel="foaf:holdsAccount">
<span typeof="foaf:OnlineAccount"
property="foaf:accountServiceHomepage"
content="http://delicious.com/"
>
<span property="foaf:accountName"
content="marchibbins"
>
<a href="http://delicious.com/marchibbins">View my Delicious
bookmarks</a> – most things are about the Semantic Web,
gathered as dissertation research…
</span>
</span>
</span>
</p>

<p>
<span rel="foaf:holdsAccount">
<span typeof="foaf:OnlineAccount"
property="foaf:accountServiceHomepage"
content="http://friendfeed.com/"
>
<span property="foaf:accountName" content="hibbins">
I use <a href="http://friendfeed.com/hibbins">FriendFeed</a>
</span>
</span>,

<span typeof="foaf:OnlineAccount"
property="foaf:accountServiceHomepage"
content="http://twitter.com/"
>
<span property="foaf:accountName"
content="marchibbins"
>
<a href="http://twitter.com/marchibbins">Twitter</a>
</span>
</span> and

<span typeof="foaf:OnlineAccount"
property="foaf:accountServiceHomepage"
content="http://last.fm/"
>
<span property="foaf:accountName"
content="marchibbins"
>
<a href="http://www.last.fm/user/marchibbins">Last.fm</a>
</span>
</span> etc etc..
</span>
</p>

<p>
<a rel="foaf:homepage"
href="http://www.marchibbins.com">www.marchibbins.com</a>
</p>
</div>

Notice that I’m actually using two foaf:holdsAccount blocks – you can, of course, contain all foaf:OnlineAccount objects within a single relation, but it seems that WordPress won’t allow me to do so. When I hit return to start a new paragraph it automatically closes the wrapping span and p and starts a new paragraph – so I’ve had to use two. Otherwise the p tags would be inside the span (rather than the other way round), but again, the TinyMCE editor doesn’t show p tags so I can’t edit them in that way.

Similarly, WordPress will wipe clean all your span tags if you switch from HTML to Visual mode – so watch out for that. It also doesn’t output nice, clean indented HTML in the final page, which is a shame.

Find the full version here.

Validating RDFa

How do you know that any of your metadata is correct – that it is machine-readable?

I took Mark Birbeck’s recommendation and used the Ubiquity parser bookmarklet to validate my RDFa. Simply publish your mark-up and hit the button and you’ll see what an RDFa parser sees.
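
What you’re looking for is a set of triples – for my profile it should be roughly the following, sketched in Turtle with the prefixes omitted. (Note that using property with content produces literal string values rather than resources – part of what I’m unsure about below.)

<#me> a foaf:Person ;
foaf:name "Marc Hibbins" ;
foaf:holdsAccount [
a foaf:OnlineAccount ;
foaf:accountServiceHomepage "http://delicious.com/" ;
foaf:accountName "marchibbins"
] ;
foaf:homepage <http://www.marchibbins.com> .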

Hopefully it is in fact all correct. I wasn’t too sure whether multiple foaf:holdsAccount blocks would be acceptable, but the Ubiquity parser shows the same results nevertheless – likewise with my use of property and content spans over rel attributes. That said, if anyone has opinions otherwise – let me know!

On Monday I attended Mark Birbeck’s seminar The Possibilities of RDFa and the Semantic Web, an ‘in-the-brain’ session discussing some of the exciting possibilities created by authors implementing structured data via RDFa into Web pages.

We looked at some existing implementations, at how to enhance very simple Web pages to be machine-‘readable’, and at RDFa as part of the bigger picture – how it relates to RDF, Microformats and other technologies, and the Semantic Web.

RDFa (Resource Description Framework in attributes) is a set of extensions to XHTML, a set of attributes intended to be implemented by Web page publishers to introduce extra ‘structure’ to the information in documents. These attributes are added at author-time and are completely invisible to the end-user. RDFa is essentially metadata, additional machine-readable indicators that describe the information they annotate without intruding on existing mark-up.

The W3C’s RDFa Primer is a good introduction for those unfamiliar with the technology and it conveys the current problem of the separation between the information on the Web readable by humans and that ‘readable’ by machine – the data Web.

Today’s Web is built predominantly for human consumption. Even as machine-readable data begins to appear on the Web it is typically distributed in a separate file, with a separate format, and very limited correspondence between the human and machine versions. As a result, Web browsers can provide only minimal assistance to humans in parsing and processing data: browsers only see presentation information.

On a typical Web page, an XHTML author might specify a headline, then a smaller sub-headline, a block of italicised text, a few paragraphs of average-size text, and, finally, a few single-word links. Web browsers will follow these presentation instructions faithfully. However, only the human mind understands that the headline is, in fact, the blog post title, the sub-headline indicates the author, the italicised text is the article’s publication date, and the single-word links are categorisation labels.

Presentation vs Semantics

The gap between what programs and humans understand is large. But when Web data meant for humans is augmented with machine-readable hints meant for the computer programs that look for them, these programs become significantly more helpful because they can begin to understand the data’s structure and meaning.

RDFa is getting a lot of attention lately, not least with the development of HTML5 and the discussion of whether RDFa should be part of that specification. As of October 2008, RDFa in XHTML is a W3C Recommendation, but it was Mark Birbeck who first proposed RDFa in a note to the W3C in February 2004 – so I was looking forward to this.

Publishing information and publishing data

Mark started with a modern use-case for RDFa, showing us how we can and why we should put structured data in our sites.

He pointed out that online publishing has never been easier. There are a huge number of long-established blogging services offering free, easy-to-set-up-and-maintain platforms, and now, with sites like Twitter offering the ability to update via SMS or from a mobile device, the Web is enabling a level of ubiquitous publishing that we’re completely familiar with.

This is us publishing information – human-readable content. Publishing data, however, he says, still remains hard.

By data, he refers to information ‘without prose’ – think listings on eBay or Yelp, where numbers and stats matter more than articles of text. He suggests that places like eBay and Yelp are the only places where publishing data is easy. But using these sites, especially those where the majority of content is user-generated, raises questions as to who actually owns the data.

As a result, people have begun to make their own sites to maintain their own data. That sounds easy enough, but the problem is that the most accessible means of making websites nowadays is to use the blogging platforms I mentioned earlier – platforms that aren’t really meant for publishing data, they’re meant for publishing information.

Mark showed some examples where this is happening: WordPress and Blogger type sites set up by individuals for reviewing films, or books, or restaurants, where each blog ‘post’ is an entry for an individual film or book, and details like scores and ratings (the valuable data) are lumped together in the article body – because that’s all a blog can really do, it’s meant for prose.

Of course, that’s fine, for their own sake. They have their content on the Web, on a domain they control. But often this content is of real value – it’s useful for more than just the author, and other people should see it.

Without being on sites such as eBay or Yelp, though, users will rarely find them. That’s because the norm for finding content is by way of a search engine, i.e. using a machine to find human information. True, some independent sites do rank highly in search engines, but for any newly launched site it is near impossible.

So enter RDFa. By using RDFa, authors can package their existing content in additional mark-up so machines can discriminate which parts of a long body of text refer to valuable, discrete details. Authors can target and isolate individual pieces of text and associate them with a predefined vocabulary of terms for the relevant context of the subject data.

An example

The following is Mark’s example blog post, some typical mark-up for a book review:

<div>
<span>
Title: <span>Chanda’s Secrets</span>
</span>

Stars: <span>*****</span>

<span>
I reviewed Stratton’s newest teen novel, Leslie’s Journal in October.
I’d heard about Chanda’s Secrets and wanted to give it a try…
</span>

</div>

This is very simple mark-up, perfectly understandable for human consumption, but it offers machine agents zero indication as to what the content refers to.

RDF is the Resource Description Framework; its purpose is to describe. To do so, it references vocabularies of terms – definitions that provide agreed-upon tags for developers to use to indicate unambiguous details. Machine applications finding this data can then interpret it as they choose, matching the unique identifying terms to the same vocabulary as a reference to exactly what was intended – to infer meaning.

There is a vocabulary for book reviews, for example. The following code takes the mark-up above and adds RDFa, enriching the data for the machine’s viewpoint while having no impact on its presentation – the human viewpoint:

<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review">
<span rel="v:itemreviewed">
Title: <span property="v:name">Chanda’s Secrets</span>
</span>

Stars: <span property="v:rating" content="5">*****</span>

<span property="v:summary">
I reviewed Stratton’s newest teen novel, Leslie’s Journal in October.
I’d heard about Chanda’s Secrets and wanted to give it a try…
</span>

</div>

A little run-through of the code: the first line adds a reference to the RDFa vocabulary in use, defining an XML namespace, v, and declares that everything within this div is a review (type v:Review). We add v:itemreviewed to the title and specify its v:name explicitly. The review class has a rating property, so in the same way we add the v:rating property to the five stars. Here you can see another example of the gap between machine and human understanding – we can see five asterisks (they aren’t even five stars) and work out what they refer to, but a machine has no idea. So we ‘overwrite’ the inline text with the content attribute, declaring outright that our rating has a value of 5. The summary is then tagged in the same way.

Now it’s not like this means the post would jump straight to the top of Google’s search results. RDFa isn’t about solving that problem, but it helps. It allows machines to infer meaning and act accordingly, if the application is capable of doing so.

For example, any search engine could probably find this post by matching the search term ‘Chanda’s Secrets’ in a standard lookup query. But recently Google added RDFa support, and Yahoo! have been doing it for over a year – both now have specific applications for parsing RDFa on top of their normal workings.

Google’s is called Rich Snippets, which returns this kind of result:

A Google 'Rich Snippet'

It’s an enhanced search result that can feature context-relevant details alongside the link, in this case a star-rating and price range indicator. It’s with this kind of application you can begin to see how adding structured data can make powerful improvements to your site.

Writing a review and having everything lumped together in the body of an article is fine for humans reading the content – they can make sense of the words, numbers and pictures themselves – but a machine can’t. By annotating your review – indicating, for example, a restaurant rating or the price of your food – you make your data machine-processable, and so also able to be aggregated with others’. This is called data distribution. In this way you also still ‘own’ your content, because it originates from your site, yet it is as computable as that of the larger sites and databases.

More possibilities

As well as being used as a means to structure data in a desired format, RDFa can also be used to standardise data, again for interoperability and processing.

Mark presented some job aggregation sites that demonstrate this. Each site was made with a different technology (ASP .NET, PHP, static pages – whatever) and each displayed data in a different way, not only in their presentation but also by using different terminology – a job ‘title’ or ‘name’ or ‘position’, for example. RDFa indicators standardise these ambiguities, so third-party sites, such as aggregators, can intelligently collate the data with an understanding of the differences.
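
As a sketch of the idea (the job vocabulary and its namespace here are made up for illustration, not the one Mark showed): two sites with different wording can emit the same machine-readable term.

<!-- Site one says 'Title' -->
<div xmlns:job="http://example.org/job-vocab#" typeof="job:Posting">
Title: <span property="job:title">PHP Developer</span>
</div>

<!-- Site two says 'Position', but uses the same term -->
<div xmlns:job="http://example.org/job-vocab#" typeof="job:Posting">
Position: <em property="job:title">PHP Developer</em>
</div>

An aggregator reading either page gets an identical job:title value and can collate the two without caring about the on-screen wording.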

He also spoke about joining the linked data cloud, showing sites that could use the unique identifiers of a vocabulary with reference to their own content for an opportunity to enrich their pages. He had a nifty example that populated a book review page by using the relevant class and retrieving the book’s cover from an external source that had tagged its data in the same way. That match ensured the discovered data would be relevant.

Finally he talked about vertical search, where RDFa, at least at its most rudimentary, can be used to work with ambiguous words, synonyms and such. A search for a chemical symbol, in traditional lookup search, would not take into account its chemical name or any other term that refers correctly to it, albeit by a different word. With an application that converts such terms to an agreed-upon identifier – its chemical formula, for example – users could retrieve all references, by whatever name, tagged with that identifier.

Q&A

The Q&A that closed the seminar was also really useful. Here Mark talked about Microformats in comparison to RDFa – something I was waiting for. He pointed out that Microformats each require their own parser and a separate vocabulary for each microformat used, and commented on the standard being centrally maintained by a handful of people. RDF is decentralised – there is no governing body, and anyone can create a vocabulary (though of course that might not always be the best answer). He really made Microformats seem clunky and second-best to RDFa – an opinion I hadn’t held before.

Mark also pointed out that RDFa is arguably more extensible in that it can relate to resources beyond the actual page of implementation. For example, on a page where a license is being specified, users can employ the about attribute to refer to objects linked from their page – such as listing different licenses for a set of external images or videos – rather than only being able to describe the license for the page itself.
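
A minimal sketch of that pattern (the image URL is made up; license is one of XHTML’s reserved rel values):

<div about="http://example.com/images/photo.jpg">
This photo is licensed under a
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">
Creative Commons Attribution license</a>.
</div>

The about attribute shifts the subject from the page to the photo, so the license statement applies to the image rather than the document.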

Eventually there was some more Semantic Web talk, discussing technologies such as RDF, OWL and ontologies – but not as much as I’d hoped.

In fact, a lot of the content of the talk can be found in Mark’s excellent articles written for A List Apart, his Introduction to RDFa and Introduction to RDFa, Part II.

I highly recommend reading these, he explores the different methods of annotating your pages, how to apply attributes to various kinds of elements, discusses vocabularies, rules and how to express relations – a great introduction.

Overall, a great talk. He’s also now uploaded his slides, with a video recording of the session pending.

Here they are:

Last week I attended a YDN Tuesday, a developer talk hosted by the Yahoo! Developer Network, this one led by Dirk Ginader and discussing Web Accessibility.

It looks as if these presentations have been running for a while now and they’ve got a good schedule lined up for the coming months. They cover a decent cross-section of Web development beyond the pure skills – JS, DOM, PHP, OAuth, Web services, Yahoo! technologies – and by the looks of things have AJAX, Flex and Ruby on Rails in the pipeline.

They’re also free, which is great when you’re sitting down to hear Yahoo! experts talk about what they do best!

Dirk Ginader is part of the Accessibility Task Force at Yahoo! and tackled developing fully accessible Web applications at every level – covering basic markup, best practices with CSS and accessible Javascript, finishing with a discussion on WAI-ARIA, offering some of his insight gained from working with the standard.

Most people are familiar with the common three-layer development of Web sites, building a core HTML structure, styling with CSS and enhancing functionality with Javascript. In his talk though, Dirk introduced a five-layer development method and spoke about this throughout the sessions.

Dirk Ginader's 5-layer Web development

Building on top of the common three-layer method, Dirk spoke of adding levels of ‘CSS for Javascript’, i.e. adding extra CSS if Javascript is available and enhancing the interface to promote this – and a final layer of WAI-ARIA, the W3C standard for accessible rich Internet applications.

The core layers – HTML, CSS, JS

First though, Dirk went into the basics, giving a good exploration of the three shared layers – reiterating the importance of good, clean HTML, appropriate and logical tab ordering, and form building, and that a site should, obviously, be usable without CSS and Javascript.

Again he reiterated the importance of separating CSS and Javascript – simply, as it always should be: CSS is for styling and Javascript is for interaction. CSS can be used to achieve a lot of interactive functionality that would otherwise be controlled by Javascript, but such tricks are akin to hacks, says Dirk.

Another accessibility oversight is the assumption that all users have a mouse or pointing device – and, as such, designing everything for mouse control. If your mark-up is good and each ‘level’ of your development has been tested and is robust, your site should be completely navigable with just the Tab and Enter keys. Also, any CSS that uses mouse-only :hover effects should also use :focus, which covers keyboard tabbing.
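
In CSS that’s as simple as doubling up the selectors – a minimal sketch:

/* Highlight links for mouse users and keyboard users alike */
a:hover,
a:focus {
background: #ffc;
text-decoration: underline;
}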

I always feel that approaching Web development with a view to adhering to strict standards and maintaining accessibility helps produce cleaner code and generally minimises errors and cross-browser inconsistencies in the long run anyway.

Dirk spoke about the usefulness of the Javascript focus() function for bringing users’ attention to alerts, changes, altered screen states and such – especially handy for users with screen readers or screen magnifiers.
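
In practice that’s only a couple of lines – a sketch, with a hypothetical status element (the tabindex="-1" makes a non-form element focusable by script):

// Move keyboard and screen-reader attention to an updated message
function announce(message) {
    var box = document.getElementById('status-message'); // hypothetical element
    box.innerHTML = message;
    box.setAttribute('tabindex', '-1'); // make the div focusable from script
    box.focus(); // screen readers and magnifiers jump here
}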

On the subject of screen readers, Dirk spoke about how they really work – how they see a Web page and handle the content – talking about loading systems and various reading modes. This was great because although I’ve designed for screen readers before, I’ve never seen one being used or had a go myself – and I’m sure I’m not the only one.

CSS for Javascript

The first extra level of Dirk’s five-layer development is in adding CSS when Javascript is available. This means your interface can be altered knowing that Javascript can be used.

You can use Javascript to append an additional class to page elements so that you can use CSS to target and style them. For example, the following line of code adds a class named ‘js’:

document.documentElement.className += " js";

You would then style with the following CSS, where the first declaration is global and the second applies only if Javascript has been found and said ‘js’ class appended:

.module {
/* Both layouts */
}
.js .module {
/* Javascript layout */
}

Enhancing a page in this way isn’t anything new, but it is very cool.

If you’ve heard of the term Progressive Enhancement, then you’ll know why. If you’ve not, you may have heard of Graceful Degradation. Both are methods for handling differences in browser rendering and capability; they’re similar but subtly different.

Graceful degradation, or ‘fault tolerance’, is the controlled handling of single or multiple errors – ensuring that when components are at fault, content is not compromised. In developing for the Web, it means focusing on building for the most advanced and capable browsers, and dealing with the older ones second.

Progressive enhancement turns this idea on its head, focusing on building a functioning core and enhancing the design, where possible, for more capable browsers.

There are a good few articles about this on A List Apart that I strongly recommend bookmarking:

 

In the last of these articles, Scott Jehl discusses enhancement with both CSS and Javascript, with a similar trick of appending class names to page objects once Javascript has been detected. He talks about how to test a device’s capabilities and offers a testing script, testUserDevice.js, which runs a number of tests and returns a ‘score’ for your interpretation. As well as offering far more detail than the mere detection of Javascript, it even stores the results in a cookie so the tests don’t have to be run on every page load.

WAI-ARIA

The final layer of hotness is WAI-ARIA, the W3C standard, the recommendation for making today’s RIAs and heavily client-scripted Web sites accessible.

WAI-ARIA adds semantic metadata to HTML tags, allowing the developer to add descriptions to page elements to define their roles. For example, declaring that an element is a ‘button’ or a ‘menuitem’. In Dirk’s words, it maps existing and well known OS concepts to custom elements in the browser.

As well as page elements, page sections or ‘landmarks’ can be described too – declaring, for example, a page’s ‘navigation’, ‘banner’ or ‘search’ box – these look very similar to HTML 5.
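
A quick sketch of both ideas – a scripted element given a widget role, and landmark roles describing page sections:

<!-- A custom element that behaves as a button -->
<span role="button" tabindex="0">Show details</span>

<!-- Landmarks describing sections of the page -->
<div role="banner">...site header...</div>
<ul role="navigation">...menu items...</ul>
<form role="search">...search box...</form>
<div role="main">...page content...</div>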

There’s a great introduction on the Opera Developers site.

WAI-ARIA is not something I’ve used before, but looking at the specification there seem to be a lot of definitions you can add into your code – it looks very extensive.

Although it is ready to use, it will apparently invalidate your code unless you specify a specific DTD. Dirk didn’t actually point to where you can find these, though I’ve seen that you can make them yourself – but I don’t know if that’s what he was suggesting.

Dirk has also uploaded his slides onto Slideshare, including the code I’ve used above, you can see them here:

It’s been a good while since Twitter launched their OAuth beta phase. I wanted to write about it when it first came about but never had the chance – same story with the phishing attacks they came under in January, when the real need for stronger security became so apparent. Anyway, with the recent ‘Sign in with Twitter’ announcement, which enhances the OAuth beta, I thought I’d use this as my excuse to say what I wanted to say.

If you’re unfamiliar with OAuth, it’s an open protocol standard for user authentication. It works by allowing a user of one platform to grant a secondary platform access to their information (stored by the first platform) without sharing their login credentials and only exposing data they choose.

When a user visits a ‘consuming’ platform (the secondary application, that is), it passes a request to the native platform – the ‘service provider’ – which returns a login request for the user to complete. The user logs in to the native platform, tells it to grant the secondary application access to their data, and is then returned to the consuming platform, ‘logged in’ and ready to go.
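
Sketched as steps, with illustrative endpoint names rather than any provider’s real paths, the three-legged flow looks something like this:

1. The consumer fetches a request token, signing the call with its consumer key:
   POST https://provider.example/oauth/request_token
2. The consumer sends the user to the provider to approve that token:
   GET https://provider.example/oauth/authorize?oauth_token=...
3. The user logs in at the provider – never at the consumer – and grants access; the provider redirects back to the consumer’s callback URL.
4. The consumer exchanges the approved request token for an access token:
   POST https://provider.example/oauth/access_token
5. The consumer signs its API calls with that access token – the user’s password never leaves the provider.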

The crucial problem with Twitter’s API is that, currently, this is not the mechanism used to access password-protected services – the ability to publish tweets, for example. The method used instead is seriously flawed, and dangerous.

Right now, a website or desktop application such as TweetDeck or Twitpic simply asks for your login details with a regular login prompt. I think from that point onwards there is a huge amount of misunderstanding about what is actually going on.

Users are not logging in to Twitter at this point, instead they are just telling the third-party application what their password is. Thereafter, the application uses that password as it chooses.

Instead of telling Twitter that you’d like a certain application to access your data, you are freely handing over your password to the application, hoping, in confidence, that it won’t be stored, sold or misused thereafter.

Incredibly, this has gone on for a very long time. It seems the general majority of Twitter users have come to accept handing out their password to completely unknown sources. True, those aware of the dangers or generally security-wary tend only to use a select few services, but there are so many applications built on Twitter’s platform – a lot of them offering very niche, almost ‘throwaway’ services – that I can’t believe the ease, almost disdain, with which so many hand out their login credentials without concern.

OK – it’s not as if you’re giving away your online banking details, but I can’t imagine this happening with any other type of account – social media or otherwise – email, Facebook or any other website, if they offered an open platform for these kinds of applications to be built upon.

It’s become increasingly accepted as the request has become more common. The problem with there being so many applications, especially the ‘disposable’ kind, is that users can forget when and where they have given their details, and to whom.

Say a user tries a new application but it seems not to work – it will be easily forgotten, perhaps put down to teething problems of the app, or maybe it’s just not a very good app and “..nevermind”, they might not have been that interested anyway. By this point, if it was purely an attempt to capture your details, it’s too late.

Admittedly, and thankfully, I’ve never heard of anything so blatant and I hope if anything so obvious came around that the Twitter community would raise awareness and Twitter would respond accordingly.

But of course, the real targets for these vulnerabilities are the users who aren’t aware of the danger and aren’t expecting to have to look out for fishy, or phishy, sites – and the problem is informing those people.

If you’re reading this blog – a Web development blog, and you’ve sat this far through a post about user authentication – chances are you’re Web-savvy and exactly the type of person I’m not talking about. You’re probably also not the kind of person who reuses passwords, but you also know that’s not uncommon.

In a scenario where a password is breached, if the email account you’ve registered with Twitter uses the same password as the one you’ve just lost to the phishing attack, there’s no question that an attacker would try that password with every other account you’re receiving email from and connected to.

Then that becomes a serious breach.

But like I say, I think I’m preaching to the choir – and maybe being a bit harsh about people’s common sense.

Twitter and OAuth

Anyway, I wanted to talk about OAuth. Twitter’s implementation is described on their wiki page for ‘Sign in with Twitter’; it works as follows:

Sign in with Twitter workflow

  • If the user is logged into Twitter.com and has already approved the calling application, the user will be immediately authenticated and returned to the callback URL.
  • If the user is not logged into Twitter.com and has already approved the calling application, the user will be prompted to login to Twitter.com then will be immediately authenticated and returned to the callback URL.
  • If the user is logged into Twitter.com and has not already approved the calling application, the OAuth authorisation prompt will be presented. Authorising users will then be redirected to the callback URL.
  • If the user is not logged into Twitter.com and has not already approved the calling application, the user will be prompted to login to Twitter.com then will be presented the authorisation prompt before redirecting back to the callback URL.

You may have already seen it in action if you’ve used Kevin Rose’s new startup WeFollow, the ‘user powered Twitter directory’. You can see which applications (if any) you’ve granted access to in your account settings at http://twitter.com/account/connections.

Flickr also uses OAuth, you may have seen it there if you’ve tried uploading images with a third-party application.

Aside from being more secure as a technical solution, Twitter’s adoption of OAuth could have a very positive domino effect on similar and future applications. In fact, it’s been predicted that it’ll ‘pave the way’ for a whole host of new apps and more mash-ups to come – presumably using Twitter’s data or building new platforms on top of it. I imagine this prediction sees a point where users are familiar with the authentication process, confident that their data can be accessed securely and within their control.

As I said in my post about ‘Sign in with Twitter’ – Twitter is an incredible tool and is becoming ever more powerful and recognised as such. Its popularity will increase regardless, but if some people’s qualms and easy criticisms of Twitter – of which security always scores highly – are solved by these kinds of platform advances, there will be no denying it as a leader, rather than a contender, in the social Web landscape.

It must be said though, OAuth isn’t infallible. Only two weeks ago, Twitter took down their OAuth support in response to the uncovering of a vulnerability, though they weren’t the only ones affected.

And then there’s phishing..

I mentioned the phishing attacks that Twitter suffered in January – thirty-three ‘high-profile’ Twitter accounts were phished and hacked. Twitter made a good effort in reacting quickly and fixing the problems; only two days prior they had released a notification warning users to be aware of such scams.

During this time, Twitter was a great source for debate and argument over how to resolve its own issues.

I follow a lot of developers and platform evangelists including Alex Payne, Twitter’s API Lead, as he battled through the security breach. Another is Aral Balkan and between the two of them they voiced some fair criticisms (1, 2, 3) and argued out a lot of issues (1, 2).

As Alex says, OAuth does not prevent phishing, and Twitter are aware of this. The very premise of phishing – dressing a trap as a legitimate and trusted source – can be extended to OAuth implementations too. But OAuth does make the problem easier to handle, and using it instead of Basic HTTP Auth builds user trust along the way.

Up until now, Basic Auth has been a large part of Twitter’s API success – OAuth is an additional, high hurdle for new developers. Twitter admit they’ll give at least six months’ lead time before making any policy changes, and they’ve no plans in the near term to require OAuth.

Alex did a good job of pointing out helpful resources and blog posts for those joining the debate. One was Jon Crosby’s post about the phishing attacks which, as he says, is a great explanation of the correlation between OAuth and phishing attacks – which is to say, essentially none. It’s a short post but it clearly outlines the difference between authentication and authorisation – and in Alex posting it, shows Twitter’s awareness of the problem and understanding of what OAuth is and is not.

Another was Lachlan Hardy’s post about phishing (via), which extends Jeremy Keith’s proposed ‘password anti-pattern’. Keith thinks that accessing data by requesting login credentials is unacceptable – a cheap execution of a bad design practice. But interestingly, he goes on to talk about the moral and ethical problems developers face: although users will willingly give out their passwords, and Basic Auth is easier to implement as well as a lower barrier of entry for users (again, look at Twitter’s success with it), we actually have a duty not to deceive them into thinking that it is acceptable behaviour.

Keith also talks about the pressure of the client, their need to add value to their applications ‘even when we know that the long-term effect is corrosive’ – but when I read that, posted from Alex remember, and having read his thoughts on security from his own blog, I wonder if Alex is hinting at something about Twitter outside of his control..

He is the only Twitter employee I follow, so I tend to think of him as the representative, but I probably should think of them separately. Aral’s post about the phishing scam points the blame squarely at ‘Twitter’, but only in the last paragraph does he say ‘stop blaming application developers’ – and at that point I realise the devs at Twitter are just trying to do their jobs.

Actually, I’ve just noticed Marshall Kirkpatrick’s article ‘How the OAuth Security Battle Was Won, Open Web Style‘ at ReadWriteWeb, which talks about the OAuth downtime last month. It’s a pretty good read, reporting that the lead developers of the providers were all aware of the vulnerability as it unfolded, and quickly and effectively worked together to resolve the problem before going public and risking inviting attacks.

As Marshall says, if OAuth was software, a fix could be implemented and pushed out to everyone using it – but it’s not; it’s ‘out in the wild’ and no one party is in charge of it. It’s a real victory that they all cooperated so quickly and so well to neutralise the threat.

Last year Facebook released Facebook Connect and, at about the same time, Google released Friend Connect – two very similar services that allow users of third-party enabled sites to connect with their information and friends on the respective native platforms. The intention, as I’ve written about before, is to add a layer of social interaction to ‘non-social’ sites, to connect your information and activity on these third-party sites to your information and activity (and contacts) on the original platforms.

Then in March, Yahoo! announced their sign-on service, called Yahoo! Updates.

Now, this week, Twitter have announced their connection service, called ‘Sign in with Twitter‘. It too gives secure, authenticated access to your information and contacts, in exactly the same way the others do – except this time, it’s Twitter.

Sign in with Twitter

You might ask if we have three, do we need a fourth? Have you ever used any of the other three?

But don’t dismiss it, or think Twitter are jumping onto any kind of bandwagon – Twitter’s implementation is fundamentally different to the others, and it could cause quite a stir.

The problem with the other services (ultimately, the problem with the platforms) is that, more often than not, they are completely closed and non-portable. Although you can sign in to a third-party site and access your data, there’s a lot of limitation on what you can retrieve and publish. These popular social networks have grown and amassed huge numbers of members and huge amounts of data, which they hoard and keep to themselves. I’m not talking about privacy, I’m referring to data portability.

The infrastructures are like locked-in silos of information, each built differently – because either they never considered that you’d want to make your data portable, or they didn’t want (or see value in) you moving your data anywhere else. The services they’ve created to ‘connect’ to your data are also proprietary methods – custom built to channel in and out of those silos. And each of those services is a singularity; they won’t work with each other.

Twitter, though, have come up with a solution that adheres to agreed-upon standards – specifically, by using OAuth to facilitate the connection. Technically, it’s significantly different, but in practice you can expect it to do everything the others can do.

The community’s thoughts

Yahoo’s Eran Hammer-Lahav (a frequent contributor to OAuth) has written a good post discussing his thoughts. He says it’s ‘Open done right’ – no proprietary ‘special sauce’ clouding interoperability, as happens with Facebook Connect. I think he’s right.

He looks at what happened when Facebook Connect was introduced: it essentially offered third-party sites two key features – the ability to use existing Facebook accounts for their own needs, and access to Facebook social data to enhance the site. The value of Facebook Connect is to save sites the need to build their own social layer. Twitter, though, is not about yet another layer, but about doing more with what you’ve already got.

Marshall Kirkpatrick also wrote about the announcement, his metaphor for the other ‘connection’ services best describes how they function – ‘it’s letting sites borrow the data – not setting data free’.

But then he talks about Twitter ‘as a platform’, and I think this is where things get interesting. He says:

“Twitter is a fundamentally different beast.

All social networking services these days want to be “a platform” – but it’s really true for Twitter. From desktop apps to social connection analysis programs, to services that will Twitter through your account when a baby monitoring garment feels a kick in utero – there’s countless technologies being built on top of Twitter.”

He’s right. Twitter apps do pretty much anything and everything you can think of on top of Twitter, not just the primary use of sending and receiving tweets. I love all the OAuth and open standards adoption – but that’s because I’m a developer. Thinking about Twitter as a platform makes me wonder what kind of effect this will have on users – how it could affect the climate, even the landscape, of social media if Twitter, already great, is given some real power.

People have long questioned Twitter’s future – its business model, how it can be monetised; those things are important – but where can it otherwise go, and how can it expand? Does it need to ‘expand’? Its service is great; it doesn’t need to start spouting needless extras and I don’t think it will. But widening its connectivity and its adaptability, I think, could change our perception of Twitter – its longevity and road map, the way we use it and think of ourselves using it.

My Thoughts

Regardless of Richard Madeley’s or Oprah Winfrey’s evangelism, Twitter is an undeniable success.

When Facebook reworked and redesigned their feed and messaging model, I almost couldn’t believe it. What was the ‘status’ update basically IS Twitter now, and it’s Facebook’s backbone. It’s Twitter’s messaging model – it even asks ‘What’s on your mind?’.

I’m probably not the only one who thought this; I’d guess any complaints about this being a bit of a blatant rip-off were bogged down by all the negativity about the interface redesign.

I think Facebook realised that Twitter has become a real rival. I think (and I guess Facebook also think) that as people become more web-savvy and literate in these sociable websites, they want to cleanse.

The great appeal of Twitter for me was that, ingeniously, they took a tiny part of Facebook (this is how I saw it two years ago, anyway) and made it their complete function – simple, short updates. Snippets of personal insight or creative wisdom; it didn’t matter really. What was important was that it ignored the fuss and noise of whatever else Facebook had flying around its own ecology (and this was before Facebook applications came around) and took a bold, single, straight route through the middle of it.

Looking back, a lot of Facebook’s early adoption could be attributed to people growing restless with the noise and fuss of MySpace at the time – Facebook was then a cleaner and more structured option.

I remember Twitter was almost ridiculed for basing its whole premise on such a minute part of Facebook’s huge machine. Now look at the turnaround.

Now people are growing up and out of the Web 2.0 craze. A lot went on, there was a lot of ‘buzz’, but a lot of progress was made in connecting things. People now are far more connected – but perhaps they’re over-connected, struggling from what Joseph Smarr calls ‘social media fatigue’. People have multiple accounts on a ton of dispersed and unconnected sites around the web – true, each unique and successful for its own achievements – but it can’t go on.

Twitter for me is streamlined, cleansed, publishing. Whether talking about what I’m doing or finding out information from people or about topics that I follow, the 140 character limit constrains these utterances to be concise and straight-to-the-point pieces of information. The ‘@’ replies and hashtags are brilliant mechanisms conceived to create connections between people and objects where there is almost no space to do so.

I use my blog to write longer discourse, I use my Twitter to link to it. Likewise with the music I listen to, I can tweet Spotify URIs. I link to Last.fm events and anything particularly good I’ve found (and probably bookmarked with Delicious) I’ll tweet that out too.

Twitter for me is like a central nervous system for my online activities. I won’t say ‘backbone’ – because it’s not that heavy. Specifically a nervous system in the way it intricately connects my online life, spindling and extending out links, almost to itself be like a lifestream in micro.

Recently, I saw Dave Winer‘s ‘Continuous Bootstrap‘ which, although admittedly a bit of fun, describes the succession of platforms deemed social media ‘leaders’ (see the full post here).

What I initially noticed is that he aligns successful platforms – blogging, podcasting – with a single application: Twitter. It doesn’t matter whether he is actually suggesting that Twitter alone is as successful as any single publishing form, but it did make me wonder if Twitter, rather than being the current ‘holder of the baton’, will actually be the spawn for whatever kind of Web-wide platform does become popular next.

If the real Data Portability revolution is going to kick in, if it’s on the cusp of starting right now and everything will truly become networked and connected – would you rather it was your Twitter connections and voice that formed that basis for you or your Facebook profile?

I know I’d much rather explore the connections I’ve made through Twitter. The kind of information I’d get back from the type of people who’d connect in this way would be far more relevant coming from my pool of Twitter connections than from the old school friends and family members who’ve (notoriously) added me on Facebook – the kind that just add you for the sake of it.

If Web 3.0 (or whatever you want to call it) is coming soon, I’d rather detox. Twitter is slimmer and still feels fresh to start it out with. For me, Facebook feels far too heavy now, out of date and messy. Maybe I’m being unfair and I feel that way because I’ve fallen out of touch with it and now I visit less frequently, but all the negativity hasn’t done it any favours – and those complaints aren’t unfounded.
