Category Archives: Html

On Monday I attended Mark Birbeck’s seminar The Possibilities of RDFa and the Semantic Web, an ‘in-the-brain’ session discussing some of the exciting possibilities created by authors implementing structured data in Web pages via RDFa.

We looked at some existing implementations, how to enhance very simple Web pages to be machine-‘readable’, and at RDFa as part of the bigger picture – how it relates to RDF, Microformats and other technologies, and to the Semantic Web.

RDFa (Resource Description Framework in attributes) is a set of extensions to XHTML, a set of attributes intended to be implemented by Web page publishers to introduce extra ‘structure’ to the information in documents. These attributes are added at author-time and are completely invisible to the end-user. RDFa is essentially metadata, additional machine-readable indicators that describe the information they annotate without intruding on existing mark-up.

The W3C’s RDFa Primer is a good introduction for those unfamiliar with the technology and it conveys the current problem of the separation between the information on the Web readable by humans and that ‘readable’ by machine – the data Web.

Today’s Web is built predominantly for human consumption. Even as machine-readable data begins to appear on the Web it is typically distributed in a separate file, with a separate format, and very limited correspondence between the human and machine versions. As a result, Web browsers can provide only minimal assistance to humans in parsing and processing data: browsers only see presentation information.

On a typical Web page, an XHTML author might specify a headline, then a smaller sub-headline, a block of italicised text, a few paragraphs of average-size text, and, finally, a few single-word links. Web browsers will follow these presentation instructions faithfully. However, only the human mind understands that the headline is, in fact, the blog post title, the sub-headline indicates the author, the italicised text is the article’s publication date, and the single-word links are categorisation labels.

Presentation vs Semantics

The gap between what programs and humans understand is large. But when Web data meant for humans is augmented with machine-readable hints meant for the computer programs that look for them, these programs become significantly more helpful because they can begin to understand the data’s structure and meaning.

RDFa is getting a lot of attention lately, not least with the development of HTML5 and the consideration of its inclusion in that specification. As of October 2008, RDFa in XHTML is a W3C Recommendation, but it was Mark Birbeck who first proposed RDFa in a note to the W3C in February 2004 – so I was looking forward to this.

Publishing information and publishing data

Mark started with a modern use-case for RDFa, showing us how we can and why we should put structured data in our sites.

He pointed out that online publishing has never been easier. There are a huge number of long-established blogging services offering free platforms that are easy to set up and maintain, and now, with sites like Twitter offering the ability to update via SMS or from a mobile device, the Web is enabling a level of ubiquitous publishing that we’re completely familiar with.

This is us publishing information, human-readable content. Publishing data, however, he says, still remains hard.

By data, he refers to information ‘without prose’ – think listings on eBay or Yelp, where numbers and stats matter more than articles of text. He suggests that places like eBay and Yelp are the only places where publishing data is easy. But using these sites, especially those where the majority of content is user-generated, raises questions as to who actually owns the data.

As a result, people have begun to make their own sites, to maintain their own data. That sounds easy enough, but the problem is that the most accessible means of making websites nowadays is to use the blogging platforms I mentioned earlier – and those aren’t really meant for publishing data, they’re meant for publishing information.

Mark showed some examples of where this is happening: WordPress and Blogger type sites set up by individuals for reviewing films, or books, or restaurants, etc. – where each blog ‘post’ is an entry for an individual film or book, and details like scores and ratings (the valuable data) are lumped together in the article body, because that’s all these platforms can really do – they’re meant for prose.

Of course, that’s fine, for their own sake. They have their content on the Web, on a domain they control. But often this content is of real value; it’s useful for more than just the author, and other people should be able to see it.

Often, though, without being on sites such as eBay or Yelp, users will rarely find them. This is because the norm for finding content is by way of a search engine, i.e. using a machine to find human information. True, some independent sites do rank highly in search engines, but for any newly launched site it is near impossible.

So enter RDFa. By using RDFa, authors can package their existing content in additional mark-up, so machines can discriminate which parts of a long body of text refer specifically to valuable discrete details. Authors can target and isolate individual pieces of text and associate them with a predefined vocabulary of terms specific to the relevant context of the subject data.

An example

The following is Mark’s example blog post, some typical mark-up for a book review:

<div>
<span>
Title: <span>Chanda's Secrets</span>
</span>

Stars: <span>*****</span>

<span>
I reviewed Stratton's newest teen novel, Leslie's Journal in October.
I'd heard about Chanda's Secrets and wanted to give it a try…
</span>

</div>

This is very simple mark-up, perfectly understandable for human consumption, but it offers machine agents zero indication as to what the content refers to.

RDF is the Resource Description Framework; its purpose is to describe. To do so, it references vocabularies of terms – definitions that provide agreed-upon tags for developers to use to indicate unambiguous details. Machine applications finding this data can then interpret it as they choose, matching the unique identifying terms against the same vocabulary to see exactly what was intended – to infer meaning.

There is a vocabulary for book reviews, for example. The following code takes the mark-up above and adds RDFa, enriching the data for the machines ‘viewpoint’, while having no impact on its presentation – the human viewpoint:

<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review">
<span rel="v:itemreviewed">
Title: <span property="v:name">Chanda's Secrets</span>
</span>

Stars: <span property="v:rating" content="5">*****</span>

<span property="v:summary">
I reviewed Stratton's newest teen novel, Leslie's Journal in October.
I'd heard about Chanda's Secrets and wanted to give it a try…
</span>

</div>

A little run-through of the code: the first line adds a reference to the RDFa vocabulary to use, defining an XML namespace, v, and declares that everything within this div is a review (type v:Review). We add v:itemreviewed to the title and specify its v:name explicitly. The Review type has a rating property, so in the same way we add the v:rating property to the five stars. Here you can see another example of the gap between machine and human understanding – we can see five asterisks (they aren’t even five stars) and work out what they refer to, but a machine has no idea. So we ‘overwrite’ the inline text with the content attribute, declaring outright that our rating has a value of 5. The summary is then tagged in the same way.

Now it’s not like this means the post would jump straight to the top of Google’s search results. RDFa isn’t about solving that problem, but it helps. It allows machines to infer meaning and act accordingly, if the application is capable of doing so.

For example, any search engine could probably find this post by matching the search term ‘Chanda’s Secrets’ in a standard lookup query. But Google recently added RDFa support and Yahoo! have been doing it for over a year – both now have specific applications for parsing RDFa on top of their normal workings.

Google’s is called Rich Snippets, which returns this kind of result:

A Google 'Rich Snippet'

It’s an enhanced search result that can feature context-relevant details alongside the link, in this case a star-rating and price range indicator. It’s with this kind of application you can begin to see how adding structured data can make powerful improvements to your site.

Writing a review and having everything lumped together in the body of an article is fine for humans reading the content – they can make sense of the words, numbers and pictures themselves – but a machine can’t. By annotating your review – indicating, for example, a restaurant rating or the price of your food – you make your data machine-processable, and so able to be aggregated with data from other sources. This is called data distribution. In this way, you also still ‘own’ your content because it originates from your site, but it is as computable as that of the larger sites and databases.
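As a sketch of what that might look like, here is the book review mark-up adapted for a restaurant – the restaurant name is invented and I haven’t checked the pricerange property against the vocabulary, so treat the terms as illustrative:

<!-- Illustrative only: a restaurant review in the same style as the book example -->
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review">
<span rel="v:itemreviewed">
<span property="v:name">The Hypothetical Kitchen</span>
</span>
Rating: <span property="v:rating" content="4">****</span>
Price: <span property="v:pricerange">££</span>
</div>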

More possibilities

As well as being used as a means to structure data in a desired format, RDFa can also be used to standardise data, again for interoperability and processing.

Mark presented some job aggregation sites that demonstrate this. Each site was made with a different technology (ASP .NET, PHP, static pages – whatever) and each displayed data in a different way, not only in their presentation but also by using different terminology – a job ‘title’ or ‘name’ or ‘position’, for example. RDFa indicators standardise these ambiguities, so third-party sites, such as aggregators, can intelligently collate the data with an understanding of the differences.
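As a rough sketch of the idea – the vocabulary URL and term below are made up purely for illustration – two sites with different visible labels can annotate their text with the same vocabulary term:

<!-- Site A displays 'Title', Site B displays 'Position', but both map the text to the same (hypothetical) term -->
<div xmlns:job="http://example.org/job-vocab#">
Title: <span property="job:title">Web Developer</span>
</div>
<div xmlns:job="http://example.org/job-vocab#">
Position: <span property="job:title">Web Developer</span>
</div>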

He also spoke about joining the linked data cloud, showing sites that could use the unique identifiers of a vocabulary with reference to their own content as an opportunity to enrich their pages. He had a nifty example that populated a book review page by using the relevant class and retrieving the book’s cover from an external source that had tagged its data in the same way. That match ensured the discovered data would be relevant.

Finally he talked about vertical search, where RDFa, at least at its most rudimentary, can be used to work with ambiguous words, synonyms and such. A search for a chemical symbol, in traditional lookup search, would not take into account its chemical name or any other term that correctly refers to it, albeit by a different word. With an application that converts such terms to an agreed-upon identifier – its chemical formula, for example – users could retrieve all references, by whatever name, tagged with that identifier.

Q&A

The Q&A to close the seminar was also really useful. Here Mark talked about Microformats in comparison to RDFa – something I had been waiting for. He pointed out that Microformats each require their own parser and a separate vocabulary for each microformat used, and commented on the standard being centrally maintained by a handful of people. RDF, by contrast, is decentralised – there is no governing body, and anyone can create a vocabulary (though of course that might not always be the best answer). He really made Microformats seem clunky and second-best to RDFa – an opinion I hadn’t held before.

Mark also pointed out that RDFa is arguably more extensible in that it can relate to references not on the actual page of implementation. For example, on a page where a license is being specified, with RDFa and the about attribute, users can refer to objects linked from their page – such as listing different licenses for a set of external images or videos – rather than only being able to describe the license for the page itself, the page these images are linked from.
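A minimal sketch of that pattern, with a placeholder image path (rel="license" is one of RDFa’s predefined link relations):

<!-- The about attribute scopes the statement to the image, rather than to the page containing this mark-up -->
<p about="/images/photo1.jpg">
This photo is available under a <a rel="license" href="http://creativecommons.org/licenses/by/2.0/">Creative Commons Attribution license</a>.
</p>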

Eventually there was some more Semantic Web talk, discussing technologies such as RDF, OWL and ontologies – but not as much as I’d hoped.

In fact, a lot of the content of the talk can be found in Mark’s excellent articles written for A List Apart, his Introduction to RDFa and Introduction to RDFa, Part II.

I highly recommend reading these; he explores the different methods of annotating your pages, shows how to apply attributes to various kinds of elements, and discusses vocabularies, rules and how to express relations – a great introduction.

Overall, a great talk. He’s also now uploaded his slides, with a video recording of the session pending.

Here they are:

Today Adobe released BrowserLab, an online service and Dreamweaver plug-in that allows Web developers to test their websites on popular browsers and across multiple operating systems.

I’m loving this.

Basically, you put in a Web address, collect a browser ‘set’ from those supported (currently Firefox 2.0 & 3.0 on both XP and OS X, IE 6 & 7 on XP, and Safari 3.0 on OS X) and screenshots of actual browser renderings are generated in real time.

Adobe BrowserLab

Not only that, but there is a side-by-side ‘2-up’ comparison view to see overall differences – and even better, an onion-skin (and zoom!) view can be used to measure discrepancies to the pixel.

More info and an FAQ are on the Adobe Labs page.

Back in December at the Adobe MAX Sneak Peeks session, I saw a demo of ‘Meer Meer’, which has now fully evolved to become this.

I’m not sure about the Web version, but I think the Dreamweaver CS4 plug-in stores the popular browser rendering engines locally, rendering pages in real time like a highly enhanced version of the ‘design view’ we’ve always been familiar with. My download is halfway through now.

I’ve written posts about hacking your operating system to run multiple versions of Firefox and Internet Explorer, and recommended virtual machines for cross-platform testing – all that seems so over-complicated and completely redundant now.

Brilliant!

There’s also a lot of talk on Twitter about it, I think a lot of people share my feelings. :)

Last week I attended a YDN Tuesday, a developer talk hosted by the Yahoo! Developer Network led by Dirk Ginader, discussing Web Accessibility.

It looks as if these presentations have been running for a while now and they’ve got a good schedule lined up for the coming months. They cover a decent cross-section of Web development beyond the pure skills – JS, DOM, PHP, OAuth, Web services, Yahoo! technologies – and by the looks of things have AJAX, Flex and Ruby on Rails in the pipeline.

They’re also free, which is great when you’re sitting down to hear Yahoo! experts talk about what they do best!

Dirk Ginader is part of the Accessibility Task Force at Yahoo! and tackled developing fully accessible Web applications at every level – covering basic markup, best practices with CSS and accessible Javascript, finishing with a discussion on WAI-ARIA, offering some of his insight gained from working with the standard.

Most people are familiar with the common three-layer development of Web sites – building a core HTML structure, styling with CSS and enhancing functionality with Javascript. In his talk though, Dirk introduced a five-layer development method and spoke about it throughout the session.

Dirk Ginader's 5-layer Web development

Building on top of the common three-layer method, Dirk spoke of adding a level of ‘CSS for Javascript’ – extra CSS applied only if Javascript is available, enhancing the interface to take advantage of it – and a final layer of WAI-ARIA, the W3C standard for accessible rich Internet applications.

The core layers – HTML, CSS, JS

First though, Dirk went into the basics, giving a good exploration of the shared first three layers – reiterating the importance of good, clean HTML, appropriate and logical tab ordering and sensible form building, and that a page should, obviously, be usable without CSS and Javascript.

Again he reiterated the importance of dividing CSS and Javascript – simply, as it always should be: CSS is for styling and Javascript is for interaction. CSS can be used to achieve a lot of interactive functionality that would otherwise be controlled by Javascript, but such techniques are akin to hacks, says Dirk.

Another accessibility oversight is the assumption that all users have a mouse or pointing device – and, as such, designing everything for mouse control. If your mark-up is good and each ‘level’ of your development has been tested and is robust, your site should be completely navigable with just the Tab and Enter keys. Also, any CSS that uses mouse-only :hover effects should also use :focus, which covers keyboard tabbing.
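In CSS terms that’s as simple as doubling up the selector – a quick sketch with a made-up class name:

/* Keyboard users tabbing to the link get the same cue as mouse users */
a.menu-link:hover,
a.menu-link:focus {
background-color: #ffc;
text-decoration: underline;
}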

I always feel that approaching Web development with a view to adhering to strict standards and maintaining accessibility helps produce cleaner code and generally minimises errors and cross-browser inconsistencies in the long run anyway.

Dirk spoke about the usefulness of the Javascript focus() function in bringing users’ attention to alerts, changes, altered screen states and such – especially handy for users with screen readers or screen magnifiers.
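Something like the following, say – the element id is made up, and tabindex="-1" is what allows a non-form element to receive focus programmatically:

// Move keyboard and screen reader attention to a newly revealed alert:
// <div id="status-alert" tabindex="-1">Your changes were saved.</div>
document.getElementById("status-alert").focus();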

On the subject of screen readers, Dirk spoke about how they really work – how they see a Web page and handle the content – talking about loading systems and various reading modes. This was great because although I’ve designed for screen readers before, I’ve never seen one being used or had a go myself – and I’m sure I’m not the only one.

CSS for Javascript

The first extra level of Dirk’s five-layer development is in adding CSS when Javascript is available. This means your interface can be altered knowing that Javascript can be used.

You can use Javascript to append an additional class to page elements so that you can use CSS to target and style them. For example, the following line of code adds a class named ‘js’:

document.documentElement.className += " js";

You would then style with the following CSS, where the first declaration is global and the second applies only if Javascript has been found and appended said ‘js’ class:

.module {
/* Both layouts */
}
.js .module {
/* Javascript layout */
}

Enhancing a page in this way isn’t anything new, but it is very cool.

If you’ve heard of the term Progressive Enhancement, then you’ll know why. If you’ve not, you may have heard of Graceful Degradation. Both are methods for handling differences in browser rendering and capability, they’re similar but subtly different.

Graceful degradation, or ‘fault tolerance’, is the controlled handling of single or multiple errors – so that when components are at fault, content is not compromised. In developing for the Web, it means focusing on building for the most advanced and capable browsers first and dealing with older ones second.

Progressive enhancement turns this idea on its head, focusing on building a functioning core and enhancing the design, where possible, for more capable browsers.

There are a good few articles on A List Apart about this that I strongly recommend bookmarking:

 

In the last article, Scott Jehl discusses enhancement with both CSS and Javascript and has a similar trick of appending class names to page objects once Javascript has been detected. He talks about how to test browser capabilities and offers a testing script, testUserDevice.js, which runs a number of tests and returns a ‘score’ for your interpretation. As well as offering much more detail than the mere recognition of Javascript, it even stores the results in a cookie so the tests don’t have to be run on every page load.
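Not Scott’s actual script, but the general shape of the idea is something like this:

// A rough approximation only - see testUserDevice.js for the real thing.
// Run some capability checks, total a score, and cache it in a cookie
// so the tests don't re-run on every page load.
function runCapabilityTests() {
var score = 0;
if (document.getElementById) { score++; }
if (window.XMLHttpRequest) { score++; }
if (document.querySelector) { score++; }
return score;
}
var cached = document.cookie.match(/deviceScore=(\d+)/);
var score = cached ? parseInt(cached[1], 10) : runCapabilityTests();
document.cookie = "deviceScore=" + score + "; path=/";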

WAI-ARIA

The final layer of hotness is WAI-ARIA, the W3C’s recommendation for making today’s RIAs and heavily client-scripted Web sites accessible.

WAI-ARIA adds semantic metadata to HTML tags, allowing the developer to describe page elements and define their roles – declaring, for example, that an element is a ‘button’ or a ‘menuitem’. In Dirk’s words, it maps existing and well-known OS concepts to custom elements in the browser.

As well as page elements, page sections or ‘landmarks’ can be described too – declaring, for example, a page’s ‘navigation’, ‘banner’ or ‘search’ box. These look very similar to the new structural elements in HTML 5.
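Both kinds of role use the same attribute – a quick sketch:

<!-- A scripted element described as a widget; tabindex="0" keeps it reachable by keyboard -->
<span role="button" tabindex="0">Save</span>

<!-- Landmark roles describing sections of the page -->
<div role="banner">...</div>
<div role="navigation">...</div>
<div role="search">...</div>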

There’s a great introduction on the Opera Developers site.

WAI-ARIA is not something I’ve used before, but looking at the specification there seem to be a lot of definitions you can add into your code – it looks very extensive.

Although it is ready to use, it will apparently invalidate your code unless you specify a specific DTD. Dirk didn’t actually point to where you can find these, though I’ve seen that you can make them yourself – however, I don’t know if this is what he was suggesting.
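For what it’s worth, one candidate is the W3C’s XHTML+ARIA DTD – assuming that’s the kind of thing he meant, the doctype looks like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+ARIA 1.0//EN"
"http://www.w3.org/WAI/ARIA/schemata/xhtml-aria-1.dtd">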

Dirk has also uploaded his slides onto Slideshare, including the code I’ve used above – you can see them here:

Yesterday, I wrote a ‘how to’ on installing and running multiple versions and concurrent instances of Firefox on Windows XP.

But what about the other browser choices? After all, my original intention was to develop a versatile testing environment, specifically for sites intended to be cross-browser and cross-platform.

Surprisingly, running multiple versions of the other major browsers isn’t as complicated as the Firefox process.

Opera, for example, gives you the option of installing the set-up as an upgrade or separately, straight out of the box. They offer alternate releases of the current version on their site (9.62 at the time of writing) and have a publicly available archive going back to version 3.21 for any old releases you need to test.

If you want to run multiple versions of Internet Explorer, you can alter various system and user profile settings in a similar way to my method with Firefox, but it’s far easier to take advantage of the many ‘standalone’ versions you can find online. These are generally third-party, non-Microsoft developments.

TredoSoft have collated standalone versions of Internet Explorer from 3 up to 6, ready to install from a single set-up – it’s called Multiple IE.

It’s brilliant to see IE3. I decided I’d use it as my default browser for a day – I loved seeing the frantic alerts about some alien idea called a ‘cookie’ and whether I wanted to risk accepting it onto my computer.

NB: If you’re concerned about what’s being installed when you use Multiple IE, you can do it all yourself with the instructions on Manfred Staudinger’s Multiple IE page.

There are standalone applications for other browsers too. I only use Windows nowadays, but I’ve recently found Michel Fortin’s standalone versions of Safari – he’s even numbered the icons for your dock (via). That page also links to instructions on running multiple versions of Firefox for Mac.

As for testing Linux systems – and this goes beyond HTML and CSS debugging – I use VMware Player from VMware. Partly because, when developing server-side applications, I’ve not wanted to bother installing them on my home computer base – it can be tricky, time-consuming, potentially damaging if things go wrong, and I tend to use Linux-based systems for deployment anyway – but mostly because appliances are so damn handy.

Virtual appliances run within a virtual machine like VMware Player as self-contained, packaged software. They can be created and restored as system images, so if something goes wrong it’s easy to roll back, with no risk to whatever personal data you might have on your computer, as there would be if you installed the software as services on the base system.

More than that, they’re readily available. VMware has an Appliance Marketplace, with over 900 ready-to-go appliances and a simple, central repository to develop or distribute your own.

There are popular Linux distributions – various Red Hat, Ubuntu and Fedora versions – all pretty clean, basic installs, but also some interesting others.

I particularly like the Web Developer appliance, specifically designed to safely test and fine-tune web apps. Based on Ubuntu, it consciously includes some trendy applications that are gathering attention, like Ruby on Rails. On top of the expected Apache, PHP and MySQL, you get a handful of browsers, various database and debugging tools, and code and graphics editors, all as standard, all configured and running – a great way to get started.

I’ve recently completed building a new site for the BBC, this time a project pretty much entirely in HTML. As you’d probably expect, the BBC are pretty hot on maintaining a wide foundation of web standards and providing a high level of accessible content, two approaches I’d say I’m a keen practitioner of.

I truly believe that there is zero excuse for slapping ‘this page is best viewed with [browser name] in [screen dimension]’ on any website, unless it is specifically designed otherwise – and especially if it’s text and image content only – when it’s really so straightforward to adhere to some basic standards which, with an almost exponential effect, can improve the way your content is delivered, cross-browser and cross-platform. That kind of disclaimer is just a cop-out; plus Tim agrees – and he’s the man.

Part of that development process is to determine, amongst other things, which browsers on which devices or platforms are the priority – and, in turn, to get hold of those browsers readily for testing, ideally keeping them available throughout an agile development. Testing shouldn’t be left until a stage where any damage might be irreparable because of time constraints and deadlines, or viewed as an easily squeezable phase, the first to suffer when scope is altered.

The BBC have outlined such requirements; they have very in-depth guidelines publicly available on their site. The following table defines the ‘levels’ of browser support that all projects must comply with:

BBC Browser Support 

Abbreviated definitions:

Level 1:

  • All content and functionality must work, with minimal variations to presentation; pages should be fully styled to maximise the user experience.

Level 2:

  • Core content must be readable, usable and navigable; any degradation must be graceful, and no content must be obscured.

Level 3:

  • No support or testing necessary. 

 

Read the full support documentation.

What can be really tricky sometimes, though, is getting hold of all those browsers and platforms to test with. There are various ways of reconfiguring application and registry settings to install multiple versions of browsers, but the following method is how I installed and concurrently ran multiple instances and versions of Firefox on Windows XP.

Firstly, Firefox 3 (current version, 3.0.4) is readily available from Mozilla, as are previous releases of Firefox 2. Firefox 1.5, though, is slightly harder to find. Along with a strong recommendation not to use them, you can get older releases from the FTP archive, going back to v0.10!

With each set-up, select Custom Installation, giving each version a different installation folder, so something like:

C:\Program Files\Mozilla Firefox 1.5
C:\Program Files\Mozilla Firefox 2.0
C:\Program Files\Mozilla Firefox 3.0

Be sure to uncheck the option to run Firefox when you click Finish; this bypasses writing some default system settings.

Then you’ll need to create a separate Firefox profile for each version of the browser you’ll be running. Open the profile manager from Start > Run:

"C:\Program Files\Mozilla Firefox 2.0\firefox.exe" -ProfileManager

It doesn’t matter which version you run this from. Create three new profiles; I named mine ‘Firefox 1.5’, ‘Firefox 2.0’ and ‘Firefox 3.0’ to keep it obvious.

Then I created three .bat files in Notepad; these function as shortcuts to the different versions (setting MOZ_NO_REMOTE=1 is what allows a new instance to launch alongside one that’s already running):

Firefox 1.5:

set MOZ_NO_REMOTE=1
start "" "C:\Program Files\Mozilla Firefox 1.5\firefox.exe" -P "Firefox 1.5"
set MOZ_NO_REMOTE=0

Firefox 2.0:

set MOZ_NO_REMOTE=1
start "" "C:\Program Files\Mozilla Firefox 2.0\firefox.exe" -P "Firefox 2.0"
set MOZ_NO_REMOTE=0

Firefox 3.0:

set MOZ_NO_REMOTE=1
start "" "C:\Program Files\Mozilla Firefox 3.0\firefox.exe" -P "Firefox 3.0"
set MOZ_NO_REMOTE=0

Obviously, change the paths and profile names if you didn’t use the same as mine. Open those up, et voila!

Running multiple versions of Firefox

Click to enlarge [via].

Update (18.08.09): This also works with Firefox 3.5, just follow the same steps!

I ain’t here for business, I’m only here for fun.