Monthly Archives: July 2009

Inspired by Mark Birbeck’s talk on RDFa and the Semantic Web earlier this month I decided to take some of my own advice and add RDFa to my site. I’ve now created a FOAF profile here on my blog.

Reading through Mark’s articles at webBackplane, I noticed he has a very simple tutorial on how to create a basic FOAF profile. It’s as straightforward as RDFa is meant to be, I wrote about FOAF in my dissertation almost three years ago, and I now have full control of my blog (it’s a WordPress.org installation rather than the free option) – so I’ve no excuse.

In my last post I discussed RDF vocabularies: sets of agreed-upon, unambiguous terms that allow developers to structure otherwise ‘mundane’, structureless content by inserting definitions and references that machines and applications can follow to infer meaning and understand topics of any kind. There I gave an example that pointed to a specification used to structure book reviews.

FOAF is the Friend of a Friend vocabulary, an RDF and OWL ontology, used in the same way, but specifically to describe people, their activities, interests and their relationships to other people.

Created around mid-2000, it is now maintained by The FOAF Project, which is considered to be one of the earliest – if not the first – Semantic Web applications.

The idea is that, given that anyone can put RDF(a) on their Web site, anyone can equally describe themselves using FOAF. By doing this they create a FOAF profile for themselves and join the linked network of people who have already done the same. These people can then begin to link to each other’s FOAF profiles and build a social network of friends, without the need for a centralised database or website to do it for them.

This is where RDFa steps in. It allows developers to put structured data straight into their HTML mark-up, wrapping existing or new content with an extended set of attributes. They no longer have to host a separate RDF file and rely on applications either indexing that file or following a link to it from another page.
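For comparison, the older approach that RDFa makes unnecessary usually looked something like the sketch below: the profile sits in a separate RDF/XML file and the page head merely advertises it with a link element. The file location is a made-up placeholder.

```html
<!-- Pre-RDFa approach: the profile lives in a separate RDF/XML file and the
     page only points to it. The href below is a hypothetical placeholder. -->
<head>
  <title>About me</title>
  <link rel="meta" type="application/rdf+xml" title="FOAF"
        href="http://example.org/foaf.rdf" />
</head>
```

With RDFa the same statements live in the body of the page itself, so there is nothing extra to host or keep in sync.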

Creating a FOAF profile

I already have an ‘About’ page on this blog – a bit of blurb about who I am and what I do. So it’s here that I’m implementing my FOAF information.

As said, there’s no need to link to a separate RDF file if you use RDFa, so really you can add the metadata anywhere, in your headers or footers for example, but that About page is the most relevant place for me and already contains the information and links I want to share anyway.

Firstly I wrap the text in a div tag that defines the FOAF namespace and declares that this div is a ‘Person object’, i.e. that the contents of this div describe a person. This is done by referring to the foaf:Person type of the FOAF vocabulary:

<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me"
     typeof="foaf:Person"
>
  <p>Hello.</p>
  <p>My name is Marc Hibbins.</p>

</div>

I also use the about attribute with the value #me, a useful convention that makes it easy for people to create links to me (more on this later).

The FOAF Person object contains a lot of properties to describe you personally, what kinds of activities you are involved in and terms that create connections to sites or documents relating to you.

Now that my object is created I can start annotating the text with some of these terms, for example my name:

<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me"
     typeof="foaf:Person"
>
  <p>Hello.</p>
  <p>My name is <span property="foaf:name">Marc Hibbins</span>.</p>

</div>

And then some links. FOAF has terms to define your blog and homepage URLs:

<a rel="foaf:weblog" href="http://blog.marchibbins.com/">My blog</a>
<a rel="foaf:homepage" href="http://www.marchibbins.com">My site</a>

It’s also common to have an image, so if I had one I would likewise attach the foaf:img term. The full A-Z index of terms can be found in the specification: http://xmlns.com/foaf/spec/.
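As a rough sketch of how an image might be attached, assuming a placeholder image URL: the rel sits on a wrapping span with a resource attribute rather than on the img itself, since, as far as I know, the way parsers treat src on an img has differed between RDFa versions and this form avoids depending on either behaviour.

```html
<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me" typeof="foaf:Person">
  <!-- The image URL is a placeholder, not a real file. -->
  <span rel="foaf:img" resource="http://example.org/images/me.jpg">
    <img src="http://example.org/images/me.jpg" alt="A photo of me" />
  </span>
</div>
```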

FOAF allows you to connect to other online accounts that you own. Mark’s tutorial has the following example to attach his Twitter account to his person object:

<span rel="foaf:holdsAccount">
  <span typeof="foaf:OnlineAccount">
    <a rel="foaf:accountServiceHomepage"
       href="http://twitter.com/">Twitter</a>
    <span property="foaf:accountName">markbirbeck</span>
  </span>
</span>

The foaf:holdsAccount definition creates a relationship between the typeof="foaf:Person" object and the typeof="foaf:OnlineAccount" object that follows (the above mark-up would be contained within said Person object). Note the foaf:holdsAccount span allows for multiple foaf:OnlineAccount objects inside. The foaf:accountServiceHomepage term defines the service homepage (Twitter’s home page in this case) and the foaf:accountName property declares Mark’s username.

As you’ll notice (as he does, too), although it’s machine-readable it isn’t particularly human-readable. Well it is, but it’s not all that nice. So instead he uses this formatting:

His inane comments are available on his
<span rel="foaf:holdsAccount">
  <span typeof="foaf:OnlineAccount">
    <a rel="foaf:accountServiceHomepage"
       href="http://twitter.com/">Twitter</a>
    account. His ID is '
    <span property="foaf:accountName">markbirbeck</span>
    '.
  </span>
</span>

This is better for human reading, though I still think it’s a little convoluted. All I want is a single link, just like I already have. As I’ve said, RDFa should have no effect on my content – its workings should be hidden from the reader.

So in my mark-up, rather than using the rel attribute on a tags with in-line values (the values found immediately between the opening and closing a tags), I use the property and content attributes on spans:

<span rel="foaf:holdsAccount">
  <span typeof="foaf:OnlineAccount"
        property="foaf:accountServiceHomepage"
        content="http://delicious.com/"
  >
    <span property="foaf:accountName" content="marchibbins">
      <a href="http://delicious.com/marchibbins">View my Delicious
      bookmarks</a> – most things are about the Semantic Web…
    </span>
  </span>
</span>

This allows me to keep my existing prose and is still machine-accessible.

I mentioned being able to connect to other people’s FOAF profiles. This is done by attaching the foaf:knows term to a link to that person’s equivalent page:

<a rel="foaf:knows" href="http://www.w3.org/People/Ivan/#me">Ivan Herman</a>

Note here that Ivan Herman has employed the #me mechanism in his FOAF URI to connect directly to his profile information, rather than to the whole page which contains that information.
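If the person you know has no profile of their own to link to, one option is to describe them inline instead. The following is only a sketch, with a made-up person and URI: nesting an element that carries about and typeof inside the rel span makes that person the object of foaf:knows, and the name then attaches to them rather than to me.

```html
<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me" typeof="foaf:Person">
  <p>I know
    <span rel="foaf:knows">
      <!-- Hypothetical person and URI, for illustration only. -->
      <span about="http://example.org/people/alice#me" typeof="foaf:Person">
        <span property="foaf:name">Alice Example</span>
      </span>
    </span>.
  </p>
</div>
```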

I’ve decided not to connect to friends or colleagues in this way. Again, it wasn’t in my original content, and I also use a similar technology, called XFN, in the footer of my blog pages instead. XFN deserves a blog post to itself (that hopefully I’ll get time for); for now, have a look at the source and you’ll see similar rel attributes there.

My FOAF profile

So here it is, abridged but with all the RDFa shown:

<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me"
     typeof="foaf:Person"
>

  <p>Hello.</p>
  <p>My name is <span property="foaf:name">Marc Hibbins</span>.</p>
  <p>I’m an interactive and digital media developer, I build Web applications and
  RIAs primarily with tools like Flash, Flex and AIR.</p>

  <p>
    <span rel="foaf:holdsAccount">
      <span typeof="foaf:OnlineAccount"
            property="foaf:accountServiceHomepage"
            content="http://delicious.com/"
      >
        <span property="foaf:accountName"
              content="marchibbins"
        >
          <a href="http://delicious.com/marchibbins">View my Delicious
          bookmarks</a> – most things are about the Semantic Web,
          gathered as dissertation research…
        </span>
      </span>
    </span>
  </p>

  <p>
    <span rel="foaf:holdsAccount">
      <span typeof="foaf:OnlineAccount"
            property="foaf:accountServiceHomepage"
            content="http://friendfeed.com/"
      >
        <span property="foaf:accountName" content="hibbins">
          I use <a href="http://friendfeed.com/hibbins">FriendFeed</a>
        </span>
      </span>,

      <span typeof="foaf:OnlineAccount"
            property="foaf:accountServiceHomepage"
            content="http://twitter.com/"
      >
        <span property="foaf:accountName"
              content="marchibbins"
        >
          <a href="http://twitter.com/marchibbins">Twitter</a>
        </span>
      </span> and

      <span typeof="foaf:OnlineAccount"
            property="foaf:accountServiceHomepage"
            content="http://last.fm/"
      >
        <span property="foaf:accountName"
              content="marchibbins"
        >
          <a href="http://www.last.fm/user/marchibbins">Last.fm</a>
        </span>
      </span> etc etc..
    </span>
  </p>

  <p>
    <a rel="foaf:homepage"
       href="http://www.marchibbins.com">www.marchibbins.com</a>
  </p>
</div>

Notice that I’m actually using two foaf:holdsAccount blocks – you can, of course, contain all foaf:OnlineAccount objects within a single relation, but it seems that WordPress won’t allow me to do so. When I hit return to start a new paragraph it automatically closes the wrapping span and p and starts a new paragraph – so I’ve had to use two. Otherwise the p tags would be inside the span (rather than the other way round) but again, the MCE doesn’t show p tags in order for me to edit them in that way.
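For reference, this is roughly what a single relation spanning both paragraphs might look like if written by hand outside the WordPress editor. It is a sketch only: the outer span is swapped for a div (a span shouldn’t contain p elements) and the accounts are cut down to two.

```html
<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me" typeof="foaf:Person">
  <!-- One foaf:holdsAccount relation wrapping both paragraphs. -->
  <div rel="foaf:holdsAccount">
    <p>
      <span typeof="foaf:OnlineAccount"
            property="foaf:accountServiceHomepage"
            content="http://delicious.com/">
        <span property="foaf:accountName" content="marchibbins">
          <a href="http://delicious.com/marchibbins">View my Delicious bookmarks</a>
        </span>
      </span>
    </p>
    <p>
      <span typeof="foaf:OnlineAccount"
            property="foaf:accountServiceHomepage"
            content="http://twitter.com/">
        <span property="foaf:accountName" content="marchibbins">
          I use <a href="http://twitter.com/marchibbins">Twitter</a>
        </span>
      </span>
    </p>
  </div>
</div>
```

The p elements carry no RDFa attributes of their own, so a parser should pass straight through them and attach each foaf:OnlineAccount to the one wrapping relation.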

Similarly, WordPress will wipe clean all your span tags if you switch from HTML to Visual mode – so watch out for that. It also doesn’t output nice, clean indented HTML in the final page, which is a shame.

Find the full version here.

Validating RDFa

How do you know that any of your metadata is correct – that it is machine-readable?

I took Mark Birbeck’s recommendation and used the Ubiquity parser bookmarklet to validate my RDFa. Simply publish your mark-up and hit the button and you’ll see what an RDFa parser sees.

Hopefully it is in fact all correct. I wasn’t too sure if multiple foaf:holdsAccount blocks would be acceptable, but the Ubiquity parser shows the same results nevertheless – likewise with my use of property and content spans over rel attributes. That said, if anyone has opinions otherwise – let me know!
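As a rough guide to what to look for in a parser’s output, here is a cut-down version of the profile with the triples a conforming RDFa parser should report listed in a comment. The exact display depends on the tool; <#me> stands for the page’s own URL followed by #me.

```html
<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me" typeof="foaf:Person">
  <p>My name is <span property="foaf:name">Marc Hibbins</span>.</p>
  <p><a rel="foaf:homepage" href="http://www.marchibbins.com">My site</a></p>
</div>

<!--
  Expected triples, written Turtle-style:

  <#me> rdf:type      foaf:Person .
  <#me> foaf:name     "Marc Hibbins" .
  <#me> foaf:homepage <http://www.marchibbins.com> .
-->
```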

On Monday I attended Mark Birbeck’s seminar The Possibilities of RDFa and the Semantic Web, an ‘in-the-brain’ session on RDFa and structured data discussing some of the exciting possibilities created by authors implementing structured data via RDFa into Web pages.

We looked at some existing implementations and at how to enhance very simple Web pages to be machine-‘readable’, and looked at RDFa as part of the bigger picture – in relation to RDF, Microformats and other technologies, and the Semantic Web.

RDFa (Resource Description Framework in attributes) is a set of extensions to XHTML, a set of attributes intended to be implemented by Web page publishers to introduce extra ‘structure’ to the information in documents. These attributes are added at author-time and are completely invisible to the end-user. RDFa is essentially metadata, additional machine-readable indicators that describe the information they annotate without intruding on existing mark-up.

The W3C’s RDFa Primer is a good introduction for those unfamiliar with the technology and it conveys the current problem of the separation between the information on the Web readable by humans and that ‘readable’ by machine – the data Web.

Today’s Web is built predominantly for human consumption. Even as machine-readable data begins to appear on the Web it is typically distributed in a separate file, with a separate format, and very limited correspondence between the human and machine versions. As a result, Web browsers can provide only minimal assistance to humans in parsing and processing data: browsers only see presentation information.

On a typical Web page, an XHTML author might specify a headline, then a smaller sub-headline, a block of italicised text, a few paragraphs of average-size text, and, finally, a few single-word links. Web browsers will follow these presentation instructions faithfully. However, only the human mind understands that the headline is, in fact, the blog post title, the sub-headline indicates the author, the italicised text is the article’s publication date, and the single-word links are categorisation labels.

Presentation vs Semantics

The gap between what programs and humans understand is large. But when Web data meant for humans is augmented with machine-readable hints meant for the computer programs that look for them, these programs become significantly more helpful because they can begin to understand the data’s structure and meaning.

RDFa is getting a lot of attention lately, not least with the development of HTML5 and the consideration of its inclusion within that specification. As of October 2008, RDFa in XHTML is a W3C Recommendation, but it was Mark Birbeck who was first to propose RDFa, in a note to the W3C in February 2004 – so I was looking forward to this.

Publishing information and publishing data

Mark started with a modern use-case for RDFa, showing us how we can and why we should put structured data in our sites.

He pointed out that online publishing has never been easier. There are a huge number of long-established blogging services offering free platforms that are easy to set up and maintain, and with sites like Twitter now offering the ability to update via SMS or from a mobile device, the Web is enabling a level of ubiquitous publishing that we’re now completely familiar with.

This is us publishing information: human-readable content. Publishing data, he says, still remains hard.

By data, he refers to information ‘without prose’ – think listings on eBay or Yelp, where numbers and stats matter more than articles of text. He suggests that places like eBay and Yelp are the only places where publishing data is easy. But using these sites, especially those where the majority of content is user-generated, raises questions as to who actually owns the data.

As a result, people have begun to make their own sites to maintain their own data. That sounds easy enough, but the problem is that the most accessible way to make websites nowadays is to use the blogging platforms I mentioned earlier – those that aren’t really meant for publishing data, they’re meant for publishing information.

Mark showed some examples where this is happening: WordPress and Blogger type sites set up by individuals for reviewing films, books, restaurants etc., where each blog ‘post’ is an entry for an individual film or book, and details like scores and ratings (the valuable data) are lumped together in the article body – because that’s all the platform can really do – it’s meant for prose.

Of course, that’s fine, for their own sake. They have their content on the Web, on a domain they control. But often this content is of real value; it’s useful to more than just the author and other people should be able to see it.

Often, though, when this content isn’t on sites such as eBay or Yelp, users will rarely find it. This is because the norm for finding content is by way of a search engine, i.e. using a machine to find human information. True, some independent sites do rank highly in search engines, but for any newly launched site it is near impossible.

So enter RDFa. By using RDFa, authors can package their existing content in additional mark-up so that machines can discriminate which parts of a long body of text refer specifically to valuable, discrete details. Authors can target and isolate individual pieces of text and associate them with a predefined vocabulary of terms suited to the context of the subject data.

An example

The following is Mark’s example blog post, some typical mark-up for a book review:

<div>
<span>
Title: <span>Chanda’s Secrets</span>
</span>

Stars: <span>*****</span>

<span>
I reviewed Stratton’s newest teen novel, Leslie’s Journal in October.
I’d heard about Chanda’s Secrets and wanted to give it a try…
</span>

</div>

This is very simple mark-up, perfectly understandable for human consumption, but it offers machine agents zero indication as to what the content refers to.

RDF is the Resource Description Framework; its purpose is to describe. To do so, it references vocabularies of terms – definitions that provide agreed-upon tags for developers to use to indicate unambiguous details. Machine applications finding this data can then interpret it as they choose, matching the unique identifying terms against the same vocabulary to work out exactly what was intended, and so infer meaning.

There is a vocabulary for book reviews, for example. The following code takes the mark-up above and adds RDFa, enriching the data from the machine’s ‘viewpoint’, while having no impact on its presentation – the human viewpoint:

<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review">
  <span rel="v:itemreviewed">
    Title: <span property="v:name">Chanda’s Secrets</span>
  </span>

  Stars: <span property="v:rating" content="5">*****</span>

  <span property="v:summary">
    I reviewed Stratton’s newest teen novel, Leslie’s Journal in October.
    I’d heard about Chanda’s Secrets and wanted to give it a try…
  </span>

</div>

A little run through of the code – the first line adds a reference to the vocabulary to use, defining an XML namespace, v, and declares that everything within this div is a review (type v:Review). We add v:itemreviewed to the title and specify its v:name explicitly. This review class has a rating property, so in the same way we add the v:rating property to the five stars. Here you can see another example of the gap between machine and human understanding – we can see five asterisks (they aren’t even five stars) and work out what they refer to – but a machine has no idea. So we ‘overwrite’ the inline text with the content attribute, declaring outright that our rating has a value of 5. The summary is then tagged in the same way.
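The same vocabulary has terms for other details of a review. As a sketch, the block below adds a reviewer and a review date using v:reviewer and v:dtreviewed, which to the best of my knowledge are the property names used in Google’s Rich Snippets documentation (worth checking before relying on them). The reviewer name and date are invented placeholders.

```html
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review">
  <span rel="v:itemreviewed">
    Title: <span property="v:name">Chanda’s Secrets</span>
  </span>

  <!-- Reviewer name and date are placeholders for illustration only. -->
  Reviewed by <span property="v:reviewer">A. Reviewer</span>
  on <span property="v:dtreviewed" content="2009-07-01">1 July</span>.

  Stars: <span property="v:rating" content="5">*****</span>
</div>
```

As with the rating, the content attribute on the date gives machines an unambiguous value while readers still see the friendlier text.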

Now it’s not like this means the post would jump straight to the top of Google’s search results. RDFa isn’t about solving that problem, but it helps. It allows machines to infer meaning and act accordingly, if the application is capable of doing so.

For example, any search engine could probably find this post by matching the search term ‘Chanda’s Secrets’ in a standard lookup query. But Google recently added RDFa support and Yahoo! have been doing it for over a year, and both sites now have specific applications for parsing RDFa on top of their normal workings.

Google’s is called Rich Snippets, which returns this kind of result:

A Google 'Rich Snippet'

It’s an enhanced search result that can feature context-relevant details alongside the link, in this case a star-rating and price range indicator. It’s with this kind of application you can begin to see how adding structured data can make powerful improvements to your site.

Writing a review and having everything lumped together in the body of an article is fine for humans reading the content, they can make sense of the words, numbers and pictures themselves, but a machine can’t. By annotating your review – indicating, for example, a restaurant rating or the price of your food – you make your data machine-processable, and so able to be aggregated with other people’s. This is called data distribution. In this way you still ‘own’ your content, because it originates from your site, but it is as computable as the data on those larger sites and databases.

More possibilities

As well as being used as a means to structure data in a desired format, RDFa can also be used to standardise data, again for interoperability and processing.

Mark presented some job aggregation sites that demonstrate this. Each site was made with a different technology (ASP.NET, PHP, static pages – whatever) and each displayed data in a different way, not only in presentation but also in terminology – a job ‘title’ or ‘name’ or ‘position’, for example. RDFa indicators resolve these ambiguities, so third-party sites, such as aggregators, can intelligently collate the data with an understanding of the differences.
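To make that concrete, here is a sketch of two differently worded listings annotated with the same property. The job namespace, its Vacancy type and its title property are entirely hypothetical, standing in for whatever shared vocabulary the sites agree on.

```html
<!-- Site A labels the field "Position". -->
<div xmlns:job="http://example.org/vocab/jobs#" typeof="job:Vacancy">
  Position: <span property="job:title">Flex Developer</span>
</div>

<!-- Site B labels the same field "Job name", but uses the same property,
     so an aggregator can treat the two values alike. -->
<div xmlns:job="http://example.org/vocab/jobs#" typeof="job:Vacancy">
  Job name: <span property="job:title">Flash Developer</span>
</div>
```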

He also spoke about joining the linked data cloud, showing sites that could use the unique identifiers of a vocabulary with reference to their own content as an opportunity to enrich their pages. He had a nifty example that populated a book review page by using the relevant class and retrieving the book’s cover from an external source that had tagged its data in the same way. That match ensured the discovered data would be relevant.

Finally he talked about vertical search, where RDFa, at least at its most rudimentary, can be used to work with ambiguous words, synonyms and such. A search for a chemical symbol, in a traditional lookup search, would not take into account its chemical name or any other term that refers correctly to it, albeit by a different word. With an application that converts such terms to an agreed-upon identifier, for example its chemical formula, users could retrieve all references, by whatever name, that are also tagged with that identifier.

Q&A

The Q&A to close the seminar was also really useful. Here Mark compared Microformats to RDFa, something I was waiting for. He pointed out that Microformats each require their own parser and need references to a separate vocabulary for each microformat used, and commented on the standard being centrally maintained by a handful of people. RDF is decentralised: there is no governing body, and anyone can create a vocabulary (though of course that might not always be the best answer). It really made Microformats seem clunky and second-best to RDFa – an opinion I hadn’t held before.

Mark also pointed out that RDFa is arguably more extensible in that it can make statements about resources that are not the page itself. For example, on a page where a license is being specified, by using RDFa’s about attribute authors can refer to objects linked from their page – such as listing different licenses for a set of external images or videos – rather than only being able to describe the license for the page itself, the page that these images link from.
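A sketch of that licensing case, assuming two made-up image URLs and the Creative Commons cc:license term: each about attribute changes the subject, so every licence statement applies to the linked image rather than to the page that links to it.

```html
<div xmlns:cc="http://creativecommons.org/ns#">
  <!-- Image URLs are placeholders. -->
  <p about="http://example.org/images/sunset.jpg">
    <a href="http://example.org/images/sunset.jpg">Sunset</a>, licensed under
    <a rel="cc:license"
       href="http://creativecommons.org/licenses/by/3.0/">CC BY 3.0</a>.
  </p>
  <p about="http://example.org/images/harbour.jpg">
    <a href="http://example.org/images/harbour.jpg">Harbour</a>, licensed under
    <a rel="cc:license"
       href="http://creativecommons.org/licenses/by-nc/3.0/">CC BY-NC 3.0</a>.
  </p>
</div>
```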

Eventually there was some more Semantic Web talk, discussing technologies such as RDF, OWL and ontologies – but not as much as I’d hoped.

In fact, a lot of the content of the talk can be found in Mark’s excellent articles written for A List Apart, his Introduction to RDFa and Introduction to RDFa, Part II.

I highly recommend reading these. He explores the different methods of annotating your pages and how to apply attributes to various kinds of elements, and discusses vocabularies, rules and how to express relations – a great introduction.

Overall, a great talk. He’s also now uploaded his slides, with a video recording of the session pending.

Here they are:

The wise men were all fools, what to do?