Category Archives: Semanticweb

Inspired by Mark Birbeck’s talk on RDFa and the Semantic Web earlier this month, I decided to take some of my own advice and add RDFa to my site. I’ve now created a FOAF profile here on my blog.

Reading through Mark’s articles at webBackplane, I noticed he has a very simple tutorial on how to create a basic FOAF profile. It’s as straightforward as RDFa is meant to be, I wrote about FOAF in my dissertation almost three years ago, and I now have full control of my blog (it’s a WordPress.org installation rather than the free option) – so I’ve no excuse.

In my last post I discussed RDF vocabularies: sets of agreed-upon, unambiguous terms that let developers structure otherwise ‘mundane’, structureless content by inserting definitions and making references that machines and applications can follow to infer meaning and understand topics of any kind. There I gave an example that pointed to a specification used to structure book reviews.
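To recap, marking up a review with RDFa looks something like this – a minimal sketch only, using a hypothetical review vocabulary namespace rather than the exact mark-up from that post:

<div xmlns:rev="http://example.org/review-vocab#" typeof="rev:Review">
<p><span property="rev:title">An excellent book</span> –
rated <span property="rev:rating">5</span> out of 5 by
<span property="rev:reviewer">Marc Hibbins</span>.</p>
</div>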

FOAF is the Friend of a Friend vocabulary, an RDF and OWL ontology used in the same way, but specifically to describe people, their activities, their interests and their relationships to other people.

Created around mid-2000, it is now maintained by the FOAF Project, which is considered to be one of the earliest – if not the first – Semantic Web applications.

The idea is that, given that anyone can put RDF(a) on their Web site, equally anyone can describe themselves using FOAF. By doing so they create themselves a FOAF profile and join the linked network of people who have already done the same. These people can then begin to create links to each other’s FOAF profiles and start to build a social network of friends, without the need for a centralised database or website to do it for them.

Here’s where RDFa steps in: it allows developers to embed structured data straight into their HTML mark-up, wrapping existing or new content with an extended set of attributes, so they no longer have to host a separate RDF file and rely on applications indexing that file, or on links pointing to it from other pages.

Creating a FOAF profile

I already have an ‘About’ page on this blog – a bit of blurb about who I am and what I do. So it’s here that I’m implementing my FOAF information.

As said, there’s no need to link to a separate RDF file if you use RDFa, so really you can add the metadata anywhere, in your headers or footers for example, but that About page is the most relevant place for me and already contains the information and links I want to share anyway.

Firstly, I wrap the text in a div tag that defines the FOAF namespace and declares that the contents of this div describe a person – a ‘Person object’. This is done by referring to the foaf:Person type of the FOAF vocabulary:

<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me"
typeof="foaf:Person"
>
<p>Hello.</p>
<p>My name is Marc Hibbins.</p>

</div>

I also use the about attribute with the value #me, a useful convention that makes it easy for people to create links to me – more on this later.

The FOAF Person object has a lot of properties that describe you personally, the kinds of activities you are involved in, and terms that create connections to sites or documents relating to you.

Now that my object is created I can start annotating the text with some of these terms, for example my name:

<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me"
typeof="foaf:Person"
>
<p>Hello.</p>
<p>My name is <span property="foaf:name">Marc Hibbins</span>.</p>

</div>

And then some links – FOAF has terms to define your blog and homepage URLs:

<a rel="foaf:weblog" href="http://blog.marchibbins.com/">My blog</a>
<a rel="foaf:homepage" href="http://www.marchibbins.com">My site</a>

It’s also common to have an image, so likewise, if I had one I would attach the foaf:img term (a sketch is shown below). The full A–Z index of terms can be found in the specification: http://xmlns.com/foaf/spec/.
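For illustration, here’s a minimal sketch of how that might look – the image URL is hypothetical, and I’m using the rel and resource attributes so the term points at the picture itself rather than at literal text:

<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me" typeof="foaf:Person">
<span rel="foaf:img" resource="http://www.marchibbins.com/images/me.jpg">
<img src="http://www.marchibbins.com/images/me.jpg" alt="A photo of me" />
</span>
</div>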

FOAF allows you to connect to other online accounts that you own. Mark’s tutorial has the following example to attach his Twitter account to his person object:

<span rel="foaf:holdsAccount">
<span typeof="foaf:OnlineAccount">
<a rel="foaf:accountServiceHomepage"
href="http://twitter.com/">Twitter</a>
<span property="foaf:accountName">markbirbeck</span>
</span>
</span>

The foaf:holdsAccount definition creates a relationship between the typeof="foaf:Person" object and the typeof="foaf:OnlineAccount" object that follows (the above mark-up would be contained within said Person object). Note that the foaf:holdsAccount span allows for multiple foaf:OnlineAccount objects inside it. The foaf:accountServiceHomepage term defines the service homepage – Twitter’s home page in this case – and the foaf:accountName property declares Mark’s username.

As you’ll notice (as he does, too), although it’s machine-readable it isn’t particularly human-readable. Well, it is, but it’s not all that nice. So instead he uses this formatting:

His inane comments are available on his
<span rel="foaf:holdsAccount">
<span typeof="foaf:OnlineAccount">
<a rel="foaf:accountServiceHomepage"
href="http://twitter.com/">Twitter</a>
account. His ID is ‘
<span property="foaf:accountName">markbirbeck</span>
’.
</span>
</span>

This is better for human reading, though I still think it’s a little convoluted. All I want is a single link, just like I already have. As I’ve said, RDFa should have no effect on my content – its workings should be hidden from the reader.

So in my mark-up, rather than putting the rel attribute on a tags and using in-line values (the values found immediately between the a tags), I use the property and content attributes on spans:

<span rel="foaf:holdsAccount">
<span typeof="foaf:OnlineAccount"
property="foaf:accountServiceHomepage"
content="http://delicious.com/"
>
<span property="foaf:accountName" content="marchibbins">
<a href="http://delicious.com/marchibbins">View my Delicious
bookmarks</a> – most things are about the Semantic Web…
</span>
</span>
</span>

This allows me to keep my existing prose and is still machine-accessible.

I mentioned being able to connect to other people’s FOAF profiles. This is done by attaching the foaf:knows term to a link to someone else’s profile page:

<a rel="foaf:knows" href="http://www.w3.org/People/Ivan/#me">Ivan Herman</a>

Note here that Ivan Herman has employed the #me mechanism in his FOAF URI to connect directly to his profile information, rather than to the whole page which contains that information.

I’ve decided not to connect to friends or colleagues here in this way – again, it wasn’t in my original content, and I also use a similar technology instead, called XFN, in the footer of my blog pages. XFN deserves a blog post to itself (that hopefully I’ll get time for); have a look at the source and you’ll see similar rel attributes there for now – a minimal example follows below.
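As a taste of what XFN looks like, it’s just an ordinary link whose rel attribute carries values from the XFN vocabulary – a minimal sketch, with a hypothetical friend’s URL:

<a href="http://example.com/a-friend/" rel="friend met colleague">A friend’s site</a>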

My FOAF profile

So here it is, abridged but with all the RDFa shown:

<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me"
typeof="foaf:Person"
>

<p>Hello.</p>
<p>My name is <span property="foaf:name">Marc Hibbins</span>.</p>
<p>I’m an interactive and digital media developer, I build Web applications and
RIAs primarily with tools like Flash, Flex and AIR.</p>

<p>
<span rel="foaf:holdsAccount">
<span typeof="foaf:OnlineAccount"
property="foaf:accountServiceHomepage"
content="http://delicious.com/"
>
<span property="foaf:accountName"
content="marchibbins"
>
<a href="http://delicious.com/marchibbins">View my Delicious
bookmarks</a> – most things are about the Semantic Web,
gathered as dissertation research…
</span>
</span>
</span>
</p>

<p>
<span rel="foaf:holdsAccount">
<span typeof="foaf:OnlineAccount"
property="foaf:accountServiceHomepage"
content="http://friendfeed.com/"
>
<span property="foaf:accountName" content="hibbins">
I use <a href="http://friendfeed.com/hibbins">FriendFeed</a>
</span>
</span>,

<span typeof="foaf:OnlineAccount"
property="foaf:accountServiceHomepage"
content="http://twitter.com/"
>
<span property="foaf:accountName"
content="marchibbins"
>
<a href="http://twitter.com/marchibbins">Twitter</a>
</span>
</span> and

<span typeof="foaf:OnlineAccount"
property="foaf:accountServiceHomepage"
content="http://last.fm/"
>
<span property="foaf:accountName"
content="marchibbins"
>
<a href="http://www.last.fm/user/marchibbins">Last.fm</a>
</span>
</span> etc etc..
</span>
</p>

<p>
<a rel="foaf:homepage"
href="http://www.marchibbins.com">www.marchibbins.com</a>
</p>
</div>

Notice that I’m actually using two foaf:holdsAccount blocks – you can, of course, contain all the foaf:OnlineAccount objects within a single relation, but it seems that WordPress won’t allow me to do so. When I hit return to start a new paragraph it automatically closes the wrapping span and p and starts a new paragraph – so I’ve had to use two. Otherwise the p tags would sit inside the span (rather than the other way round), but again, the MCE editor doesn’t show the p tags for me to edit them in that way.

Similarly, WordPress will wipe clean all your span tags if you switch from HTML to Visual mode – so watch out for that. It also doesn’t output nice, clean indented HTML in the final page, which is a shame.

Find the full version here.

Validating RDFa

How do you know that any of your metadata is correct – that it is machine-readable?

I took Mark Birbeck’s recommendation and used the Ubiquity parser bookmarklet to validate my RDFa. Simply publish your mark-up and hit the button and you’ll see what an RDFa parser sees.

Hopefully it is in fact all correct. I wasn’t too sure whether multiple foaf:holdsAccount blocks would be acceptable, but the Ubiquity parser shows the same results nevertheless – likewise with my use of property and content spans over rel attributes. That said, if anyone has opinions otherwise – let me know!

In the latest Nodalities podcast, Paul Miller talks to Dame Wendy Hall, Professor of Computer Science at the University of Southampton and a founding Director of the Web Science Research Initiative.

The Web Science Research Initiative (WSRI) is a joint venture between MIT and the University of Southampton to teach the literal academic ‘science’ of the web.

The initiative was founded in 2006 by Dame Wendy alongside Sir Tim Berners-Lee, Professor Nigel Shadbolt and Daniel J. Weitzner, and she talks with Paul about some of the thinking she and Sir Tim shared that eventually resulted in the conception of the project.

They recognised that many determining factors outside of pure technology shape the evolution of the Web, and that as a human construct it demands new ways of thinking: we need to understand how humans affect its evolution as much as how the Web affects our society.

The Web is one of the most (if not the most) transformative applications in the history of computing and communications. It has changed how we teach, publish, govern and do business, and it is studied in anthropology, sociology, psychology, economics – needless to say a lengthy list – and Web Science aims to consider the Web across all these fields (and more), not only as a computer science.

It’s also intended to anticipate future developments, to foresee the good and bad consequences of the Web’s change.

They’ve been working with the Web for a long time – since the earliest days of hypertext and hypermedia – and with such experience have recognised the cyclical nature of Web trends: every five years or so sees great advances in the Web’s evolution. Think Web 2.0 for the latest phase – the next (apparently) being Web 3.0 (or the ‘Data Web’, or the ‘Web of Linked Data’), or the Semantic Web – whatever buzzword you want to ply it with. The WSRI, in part, stands to find out what’s likely to come, to inform us and our decisions.

Of course, it was also in part founded to evangelise the Semantic Web. The Semantic Web was and is still Berners-Lee’s original vision for the Web that he had back as early as WWW94 (though ‘unnamed’). These small phases add up to the larger realisation of this original dream – and with that, Dame Wendy discusses her thoughts on how this will continue in its future. She talks about the WSRI’s efforts to create a wide network of key Web Science labs across the globe and their work with curriculum developers and government agencies, also of their training of university teachers and educators to inject Web science into higher education as recognised academia.

Paul Miller also shares some thoughts on his ZDNet blog – at first he was sceptical, suggesting that we really don’t need yet another academic subject just to ‘permit’ us to study the Web, and that we’re perfectly well served by enough areas of study (those listed above) that already seek to understand both the Web and its impact upon all of us. But he, too, can’t deny that Web Science as a ‘label’ can benefit the Semantic cause, both in the evangelistic sense and by providing ‘institutional credibility’ to the area of research.

I collected a number of Web Science and WSRI-related bookmarks during my thesis research, for further reading:

http://delicious.com/marchibbins/wsri
http://delicious.com/marchibbins/webscience

Not to be outdone by Google’s efforts this week, Ask.com have also expanded their search technology to return specific ‘direct’ answers to searches, where possible, by means of semantic language processing.

Fortunately far more public than Google, Ask.com announced on their blog yesterday that they’ve been developing proprietary semantic analysis technology since October of last year in their effort to advance the next generation of search tools:

DADS(SM) (Direct Answers from Databases), DAFS(SM) (Direct Answers from Search), and AnswerFarm(SM) technologies, which are breaking new ground in the areas of semantic, web text, and answer farm search technologies. Specifically, the increasing availability of structured data in the form of databases and XML feeds has fueled advances in our proprietary DADS technology. With DADS, we no longer rely on text-matching simple keywords, but rather we parse users’ queries and then we form database queries which return answers from the structured data in real time. Front and center. Our aspiration is to instantly deliver the correct answer no matter how you phrased your query.

The results aren’t returned as prominently as Google’s, mainly due to the number of adverts above the page fold, but they work. Try searching for ‘Football on TV this weekend’ or ‘Movies on TV now’ and you’ll see the results in custom-formatted sections accordingly.

Unfortunately the results are still only returned in HTML, so again – the term ‘semantics’ here describes the kind of processing Ask.com are doing behind the scenes, rather than marking this as their first outright foray into the Semantic Web (capital S).

This, though, is proprietary technology and presumably it’ll stay that way. So I’m unsure whether to celebrate their realisation of the importance of semantics (in search at least), or – given their more ‘closed source’ ethos – to consider this almost contrary to the idea of the Semantic Web (portability, sharing, transparency), as they hold these advances close to their chest to gain an edge over their competitors, which will understandably cause others to do the same in future.

Quite out of the blue, and without any announcement of its launch as far as I’ve been able to find, Google seem to be exposing semantic data in their global search results.

Try searching for ‘What is the capital city of England?’ or ‘Who is Bill Clinton’s wife?’ and you’ll see sourced direct answers returned at the top of your search results.

It’s hard to tell if these direct results are actually semantic expressions or just presented to appear that way – in the expected semantic triple of subject–predicate–object. The listed sources definitely don’t structure their information as semantic expressions, so perhaps quite an amount of logic and natural language processing is being done on Google’s part to process non- or semi-structured data.

I’ve tried to find out before what Google have been up to concerning semantic technology, but found little. The coverage over at ReadWriteWeb reports that neither they nor their Semantic Web contacts had heard or seen anything about this before, but the community feedback suggests there have been variations of this for some time – including a three-year-old Google program called ‘Direct Answers’ – though none of the coverage of that program offers the kind of examples we’re seeing here.

Marshall Kirkpatrick points to a blog post by Matt Cutts, Google search algorithm engineer, but it seems to be a dead link now. Trawling through Google’s caches, though, it seems he is quoted as saying:

Many of the data points are being pulled in from the structured part of Wikipedia entries, which is interesting. Other sources are wide ranging, from a license plate website to Jason Calacanis’s Mahalo.

If Google are constructing semantic data from semi-structured or non-structured source data, then there’s undoubtedly some quite powerful semantic processing technology in place. I highly doubt this will be the final product of their development with such technologies – it’s simply the first we’ve noticed, which is most likely why it’s slipped under most people’s radar.

The inaccuracy is also an issue. Try searching ‘Who is Bob Dylan’s wife?’ – and you’ll see Sara Lownds (his ex-wife) returned. Seeing these direct answers reminds me of True Knowledge.

Even their example questions, though, are far more complex – for example, ‘Who owns Universal Studios?’, ‘Is the Chrysler building taller than the Eiffel Tower?’, ‘What is the half life of plutonium 239?’.

More importantly, if it doesn’t know the answer, it won’t ‘guess’ – it’ll tell you it doesn’t know and ask you to deconstruct your query in order to expand its knowledge base so it can find the answer later.

As Marshall says, this is all speculation based on limited observation – and low visibility of Google’s development. Hopefully there’ll be more soon!

For some time I’ve been meaning to write about Facebook Connect and Google Friend Connect, two potentially huge social web developments that have been gathering speed and popularity over the past few weeks.

Both services are very similar. Essentially, each functions to simplify the connection between social and non-social websites by offering the connectivity (and some functionality) of its proprietary central platform on third-party websites.

The idea is that a user can ‘Connect’ with whichever service the site has employed and find the users they’ve already connected with on the central service – rather than creating a new account and profile, and repeating the steps of entering information and re-finding the friends they’ve already added, over and over again, with every new social-enabled web app.

I first saw Facebook Connect in August with their demonstration service The Run Around. There, you could ‘Connect with Facebook’ to initially join the site and immediately see who else (of your Facebook friends) had joined too. This is all outside of the Facebook chrome, not on the Facebook domain. What’s more, as well as interacting with the linked data pulled from Facebook, the website could push data back in. The site itself was intended to track your running routes and times, so when you submitted a new ‘run’, it would publish to your live newsfeed on your Facebook profile.

The idea is simple, the effect could be game-changing. It’s been met with both cautious optimism and healthy skepticism.

If this becomes as massive as it could be, we could see a single sign-in that abolishes the need to register and re-register for every newly launched social app. As consumers we’re already experiencing social fatigue with that process, and as developers we’re having to build whole registration and authentication systems from scratch every time. Plugging into a platform like this – one that we assume to be secure and trusted – could offer a means to develop and deploy services much more easily and quickly.

But can we trust – or do we want to trust – a proprietary platform to do this for us? The idea of a single social graph isn’t new, but I don’t know if I want Facebook to offer it. I’d much prefer FOAF :) – but how many people outside of the development world have heard of it?

I feel I need to write another post entirely about OpenID, OpenSocial and OAuth – services that can’t go unmentioned here – but Marshall Kirkpatrick at ReadWriteWeb wrote a direct comparison of Facebook Connect and OpenID that asks some interesting questions, as well as offering a good introduction to the open source services anyway. Although he started by discussing which of the two website owners should use to authenticate and learn about their users, the community expanded his initial mindmap to cover pretty much every angle of the comparison – and it’s very detailed, see it here.

He also asks: even if it doesn’t become the dominant identifier online, will Facebook’s challenge breathe new life into the movement for open source, standards-based, federated user identity?

Then there’s Google Friend Connect – launched in public beta the same day as Facebook Connect went public for 3rd party sites. This does use a blend of the open source services, but although integrating the open standards might suggest a weightier development process, the first thing to notice is a far less developer-oriented implementation than Facebook Connect.

With Facebook Connect it’s down to the site creator to construct and integrate an interface to facilitate the connection – Google Friend Connect is widgety, with pretty much zero coding other than cutting and pasting the directed portions. Similarly with the functionality: Google offer widgets for simple commenting on pages, media sharing, or rating content. With Facebook Connect you have to write that yourself – although admittedly, you then have free rein over design and interaction.

There’s a demonstration video on the Google blog’s announcement of the beta launch.

It’s not as if this is just a two-horse race though, or that someone won’t work out a way to use both anyway. Google and Facebook are in direct competition, but attempting to open the Web in this way extends far beyond them.

What I find interesting is the interoperability. These technologies aren’t semantic, but they do push the exposure and interoperation of a user’s social graph with ideas akin to the Semantic Web – utilising data to extend single-site online identities and to network social connections.

They’re not Semantic Web efforts but they have similar aims. Friend Connect’s goal is an open social web; the Semantic Web is – quite simply ;) – a fully understood, completely open web, not only its social domain.
