Category Archives: Ydn

This month’s YDN Tuesday was presented by Steve Marshall, he spoke about code generation, ‘Writing the Code that Writes the Code’.

A bit of a disclaimer: Going into this talk I wasn’t sure how much I’d be able to take from it, or whether what Steve would be discussing would be relevant to my own kind of development. I figured he’d be talking about working with huge projects with (literally) tens of thousands of lines of code and I don’t think that the evening was part of the ‘RIA’ umbrella which is probably closest to home for me. Anyway although Steve’s talk did cover those kinds of projects, he also went through some platform-agnostic fundamentals – the kind of transferable ideas that could be applied to any coder – and it’s really those I’m looking at here.

Code Generation in Action

Steve opened with a bit of code generation evangelism, introducing a discussion as to why code generation is something that all developers should want and should do.

Each point here I think speaks for itself, but only really having gone over each benefit staring me in the face have I questioned, hold on, why don’t I do this already? It’s a convincing argument..

The most immediate is scalability – so often development projects have large amounts of repetition, say in creating views, or sets of view/mediator/component classes, or whatever – needless to say as projects get larger, generating this kind of code from templates and base classes reduces or even eliminates the room for error.

Neatly, this this was his next point – consistency. Automated code cannot be mistyped or have any part overlooked (as could when writing by hand), unless of course the source is incorrect. And even if that’s the case, a small change can be rolled out to all generated code once corrected. Which brings us to his third point – automatic code is of course far quicker to produce.

Steve also suggests that generated code is ‘predictable’, in a way. Whether looking at code that has been generated or that written for ‘expansion’ (more on this later), on the whole it’s consequently easier to digest. Likewise, source code that is to be used by a generator or to be ‘expanded upon’ is, by its very nature, a smaller volume – so therefore easier to approach.

It’s also a single point of knowledge. Only the core source code needs to be understood. A project of tens of thousands of lines of code would be near impossible to traverse or otherwise understand, knowing that code has been generated from a smaller source that you do understand offers a reassurance that the rest of the code is good.

Another advantage is the ease of which the code can be refactored. Obviously only the source code base requires changing, the larger volume is generated automatically. This kind of code is language independent.

Outside of working with code direcly, the kind of abstraction writing for code generation offers its own benefits. Although arguably a desirable trait of good programming anyway, Steve suggests this kind of code is far more decoupled with the design process. Thinking more abstractly about coding means it has less effect on design capability, and vice-versa. Design reviews or changes should have no bearing on the coding. By being naturally quicker to produce, Steve also says that generated code is a good methodology to pursue when prototyping applications.

Steve also discussed testing, offering that generated code has (or presumably can have) 100% test coverage. Similarly, full documentation is easier achieved by way of having written a smaller code base.

Overall, cleaner code that’s easier to work with and easier to understand is undeniably a more attractive prospect to work with. Developers are more enthusiastic about working with that kind of code and so that kind of code is of a higher quality – in the first place, as well as because it has been consistently deployed, tested and documented thereafter.

So how is this achieved?


Steve recognised six principle methods for generating code, each technique seemingly increasingly more ‘advanced’ than the previous and, as I said before, each progressively more suited to larger-scale projects.

He saw these as:

  1. Code munging
  2. Inline code expansion
  3. Mixed code generation
  4. Partial class generation
  5. Tier generation
  6. Full domain language


I’ll look at each briefly.

Code munging

Code Munging
Code munging essentially takes your code, parses, introspects and creates something new that’s not code, though still of benefit and in inherently useful.

Steve’s examples were Javadoc or PHPdoc, it’s the most basic form of generation you can introduce to your workflow.

Inline code expansion

Inline Code Expansion
This the ‘expansion’ mentioned above, where source code is annotated with keywords or variables where upon additional code is generated to replace those markers to extrapolate upon the original.

Templates are the most rudimentary example. This technique gets repetitive jobs done well, quickly and correctly.

Mixed code generation

Mixed Code Generation
This is the first ‘real’ form of generation, this expands upon Inline code expansion by allowing the output code to be used as input code once created. Instead of simply replacing special keywords for example, mixed code generation may use start and end markers, expanding code as before, but allowing the result to be worked upon once completed.

This may include code snippets, or smaller templates injected within classes.

Partial class generation

Partial Class Generation
I guess here’s where things start to get serious. Partial class generation begins to look at modeling languages like UML, creating abstract system models and generating base classes from a definition file.

The base classes are designed to do the majority of the low level work of the completed class, leaving the derived class free to override specific behaviors on a case-by-case basis.

Tier generation

Tier generation
This takes partial class generation a level further, a tier generator builds and maintains an entire tier within an application – not only the base classes. Again from some kind of abstract definition file, but the generator this time outputs files that constitute all of the functionality for an entire tier of the application – rather than classes to be extended.

Full domain language

Steve didn’t really talk about this much, he said it’s very tough – and even if you think this is what you want to do, it’s probably not (though a hand from the crowd wanted to reassure that it might not be quite that intimidating).

A domain language is a complete specification dedicated solely to a particular problem – with specific types, syntax and operations that map directly to the concepts of a particular domain.

I used the Code Generation Network’s Introduction for reference here, which looks like a good resource.

Also, all the diagrams belong to Steve. He’s put his presentation slides on Slideshare and uploaded PDF and Keynote versions on his website.

The presentation was also recorded as a podcast and is now on the Skills Matter site.

In retrospect

After all these, Steve took us through an example he’d worked on where code generation really proved it’s worth (a huge project for Yahoo!) – and offered tips and best practices to use once you’ve convinced your team to take on code generation (see the slides linked to above).

Though in my opinion here he seemed to seemed to almost contradict some of the things he’d talked about before. He pointed to some tools to use within your workflow, but spoke extensively about writing your own generator. Stressing that it can be written in any language, though should be written in a different language to that of the code being produced, that its easier than perhaps you think – though still with a fair few caveats.

But what happened to cutting down the work load? As the talk summary says, developer are fundamentally very lazy – we want to do less and less, surely this requires more effort?

One of his recommendations is that you document your generator once you’ve written it. But how would you create that documentation, with another generator? Is that meant to be a specially written generator also? And that needing documentation too?

I remember an episode of Sesame Street from my childhood, where Big Bird was painting a bench. Once he’d finished he made a ‘Wet Paint’ sign with the paint remaining, only to realise that the sign also was wet, so needed a ‘Wet Paint’ sign of it’s own. Which of course, was also wet and needed it’s own sign. Cut back a scene later and the whole street is full of signs. It was upsetting, to say the least.

I’m being pedantic, I know.

But I guess that’s where I recognise a line for me, or at least the kind of development I do. Very large-scale projects do often need this kind of approach, Steve is proof of that. For me only the first few techniques are appropriate. Though those, undeniably, are incredibly beneficial when you have the opportunity to put them into play.

On a similar note, this month’s LFPUG meeting had a presentation on UML for AS3 by Matthew Press, the recorded video for which is now online – have a look!

On Monday I attended Mark Birbeck‘s seminar The Possibilities of RDFa and the Semantic Web, an ‘in-the-brain’ session on RDFa and structured data discussing some of the exciting possibilities created by authors implementing structured data via RDFa into Web pages.

We looked at some existing implementations, how to enhance very simple Web pages to be machine-’readable’ and looked at RDFa as part of the bigger picture – in correspondence to RDF, Microformats and other technologies, and the Semantic Web.

RDFa (Resource Description Framework in attributes) is a set of extensions to XHTML, a set of attributes intended to be implemented by Web page publishers to introduce extra ‘structure’ to the information in documents. These attributes are added at author-time and are completely invisible to the end-user. RDFa is essentially metadata, additional machine-readable indicators that describe the information they annotate without intruding on existing mark-up.

The W3C’s RDFa Primer is a good introduction for those unfamiliar with the technology and it conveys the current problem of the separation between the information on the Web readable by humans and that ‘readable’ by machine – the data Web.

Today’s Web is built predominantly for human consumption. Even as machine-readable data begins to appear on the Web it is typically distributed in a separate file, with a separate format, and very limited correspondence between the human and machine versions. As a result, Web browsers can provide only minimal assistance to humans in parsing and processing data: browsers only see presentation information.

On a typical Web page, an XHTML author might specify a headline, then a smaller sub-headline, a block of italicised text, a few paragraphs of average-size text, and, finally, a few single-word links. Web browsers will follow these presentation instructions faithfully. However, only the human mind understands that the headline is, in fact, the blog post title, the sub-headline indicates the author, the italicised text is the article’s publication date, and the single-word links are categorisation labels.

Presentation vs Semantics

The gap between what programs and humans understand is large. But when Web data meant for humans is augmented with machine-readable hints meant for the computer programs that look for them, these programs become significantly more helpful because they can begin to understand the data’s structure and meaning.

RDFa is getting a lot of attention lately, not least with developments of HTML5 and the consideration of its implementation with the specification. As of October 2008, RDFa in XHTML is a W3C Recommendation, but it was Mark Birkbeck who was first to propose RDFa in a note to the W3C in February 2004 – so I was looking forward to this.

Publishing information and publishing data

Mark started with a modern use-case for RDFa, showing us how we can and why we should put structured data in our sites.

He pointed out that online publishing has never been easier. There are a huge number of blogging services offering free, easy to set-up and maintain platforms that are long established and now with sites like Twitter offering the ability to update via SMS or from a mobile device, the Web is enabling a level of ubiquitous publishing that we’re now completely familiar with.

This is us publishing information, human-readable content. But he says that publishing data, however, still remains hard.

By data, he refers to information ‘without prose’ – think listings on eBay or Yelp, where numbers and stats are what matter over articles of text. He suggests that places like eBay and Yelp are the only places where publishing data is easy. But by using these sites, especially those where the majority of content is user-generated, raises questions as to who actually owns the data.

As a result, people have begun to make their own sites, for maintain their own data. That sounds easy enough, but the problem is that the most accessible means to make websites nowadays is to use the blogging platforms that I mentioned earlier – those that aren’t really meant for publishing data, they’re meant for publishing information.

Mark showed some examples where this is happening, WordPress and Blogger type sites set up by individuals for reviewing films, or books, or restaurants etc – where each blog ‘post’ was an entry for an individual film or book, and details like scores and ratings (the valuable data) is lumped together in the article body – because that’s all it can really do – it’s meant for prose.

Of course, that’s fine, for their own sake. They have their content on the Web, on a domain they control. But often this content is of real value, its useful for more than just the author and other people should be see it.

Often though without being on sites such as eBay or Yelp, users will infrequently find them. This is because the norm for finding content is by way of a search engine, i.e. using a machine to find human information. True, some independent sites do rank highly in search engines, but for any newly launched site it is near impossible.

So enter RDFa. By using RDFa, authors can package their existing content in additional mark-up so machines can discriminate between what parts of a long body of text refers specifically to valuable discreet details. Authors can target and isolate individual pieces of text and associate these with a predefined vocabulary of terms specifically for the relevant context of the subject data.

An example

The following is Mark’s example blog post, some typical mark-up for a book review:

Title: <span>Chanda’s Secrets</span>

Stars: <span>*****</span>

I reviewed Stratton’s newest teen novel, Leslie’s Journal in October.
I’d heard about Chanda’s Secrets and wanted to give it a try…


This is very simple mark-up, perfectly understandable for human consumption, but offers machines agents zero indication as to what the content refers to.

RDF is Resource Description Framework, it’s purpose is to describe. To do so, it references vocabularies of terms – definitions that provide agreed-upon tags for developers to use that indicate unambiguous details. Machine applications finding this data can then interpret these as they choose, matching the unique identifying terms to the same vocabulary in their process as reference to exactly what was intended, to infer meaning.

There is a vocabulary for book reviews, for example. The following code takes the mark-up above and adds RDFa, enriching the data for the machines ‘viewpoint’, while having no impact on its presentation – the human viewpoint:

<div xmlns:v=”” typeof=”v:Review”>
<span rel=”v:itemreviewed”>
Title: <span property=”v:name”>Chanda’s Secrets</span>

Stars: <span property=”v:rating” content=”5″>*****</span>

<span property=”v:summary”>
I reviewed Stratton’s newest teen novel, Leslie’s Journal in October.
I’d heard about Chanda’s Secrets and wanted to give it a try…


A little run through of the code – the first line adds a reference to the RDFa vocabulary to use, defining a XML namespace, v, and declares that everything within this div is a review (type v:Review). We add v:itemreviewed to the title and specify its v:name explicitly. This review class has a rating property, so in the same way we add the v:rating property to the five stars. Here you can see another example of a lack of machine/human understanding – we can see five asterisks (they aren’t even five stars) and work out what they refer to – but a machine has no idea. So we ‘overwrite’ the inline text with the content attribute, declaring outright that our rating has a value of 5. The summary is tagged in the same way then.

Now it’s not like this means the post would jump straight to the top of Google’s search results. RDFa isn’t about solving that problem, but it helps. It allows machines to infer meaning and act accordingly, if the application is capable of doing so.

For example, any search engine could probably find this post as by matching the search term ‘Chanda’s Secrets’ in a standard lookup query. But recently Google added RDFa support and Yahoo! have been doing it for over a year, and both sites now have specific applications for parsing RDFa on top of their normal workings.

Google’s is called Rich Snippets, which returns this kind of result:

A Google 'Rich Snippet'

It’s an enhanced search result that can feature context-relevant details alongside the link, in this case a star-rating and price range indicator. It’s with this kind of application you can begin to see how adding structured data can make powerful improvements to your site.

Writing a review and having everything lumped together in the body of an article is fine for humans reading the content, they can make sense of the words, numbers and pictures themselves, but a machine can’t. By annotating your review – indicating, for example, a restaurant rating or the price of your food – you make your data machine-processable, so also able to be aggregated with others. This is called data distribution. In this way, you also still ‘own’ your content because it originates from your site, but it is as computable as those larger sites and databases.

More possibilities

As well as being used as a means to structure data in a desired format, RDFa can also be used to standardise data, again for interoperability and processing.

Mark presented some job aggregation sites that demonstrate this. Each site was made with a different technology (ASP .NET, PHP, static pages – whatever) and each displayed data in a different way, not only in their presentation but also by using different terminology – a job ‘title’ or ‘name’ or ‘position’, for example. RDFa indicators standardise these ambiguities, so third-party sites, such as aggregators, can intelligently collate the data with an understanding of the differences.

He also spoke about joining the linked data cloud, showing sites that could use the unique identifiers of a vocabulary with reference to their own content for an opportunity to enrich their pages. He had a nifty example that populated a book review page by using the relevant class and retrieving the book’s cover from a external source that had tagged their data in the same way. That match ensured the discovered data would be relevant.

Finally he talked about vertical search. Where RDFa, at least at it’s most rudimentary, can be used to work with ambiguous words, synonyms and such. A search for a chemical symbol, in traditional lookup search, would not take into account it’s chemical name or any other term that refers correctly to it, albeit by a different word. With an application that converts said terms to an agreed upon identifier, for example it’s chemical formula, users could retrieve all references, by whatever name, tagged also with that identifier.


The Q&A to close the seminar was also really useful. Here Mark talked about Microformats, in comparison to RDFa, something I was waiting for. He pointed out that Microformats each require their own parser, need references to a separate vocabulary for each microformat used and commented on the standard being centrally maintained by a handful of people. RDF is decentralised, there is no governing body – that anyone can create a vocabulary (though of course that might not always be the best answer), and really made Microformats seem clunky and second-best to RDFa – an opinion I hadn’t held before.

Mark also pointed out that RDFa is arguably more extensible in that can relate to references not on the actual page of implementation. For example on a page where a license is being specified, with RDFa and by using the about tag, users can refer to objects linking from their page – such as listing different licenses for a set of external images or videos, rather than only being able to describe the license for the page itself, the page that these images link from.

Eventually there was some more Semantic Web talk, discussing technologies such as RDF, OWL and ontologies – but not as much as I’d hoped.

In fact, a lot of the content of the talk can be found in Mark’s excellent articles written for A List Apart, his Introduction to RDFa and Introduction to RDFa, Part II.

I highly recommend reading these, he explores the different methods of annotating your pages, how to apply attributes to various kinds of elements, discusses vocabularies, rules and how to express relations – a great introduction.

Overall, a great talk. He’s also now uploaded his slides, with a video recording of the session pending.

Here they are:

Last week I attended a YDN Tuesday, a developer talk hosted by the Yahoo! Developer Network led by Dirk Ginader, discussing Web Accessibility.

It looks as if these presentations have been running for a while now and they’ve got a good schedule lined up for the coming months. They discuss a decent section of Web development beyond the pure skills – JS, DOM, PHP, OAuth, Web services, Yahoo! technologies and by the looks of things have AJAX, Flex and Ruby on Rails in the pipeline.

They’re also free, which is great when you’re sitting down to hear Yahoo! experts talk about what they do best!

Dirk Ginader is part of the Accessibility Task Force at Yahoo! and tackled developing fully accessible Web applications at every level – covering basic markup, best practices with CSS and accessible Javascript, finishing with a discussion on WAI-ARIA, offering some of his insight gained from working with the standard.

Most people are familiar with the common three-layer development of Web sites, building a core HTML structure, styling with CSS and enhancing functionality with Javascript. In his talk though, Dirk introduced a five-layer development method and spoke about this throughout the sessions.

Dirk Ginader's 5-layer Web development

Building on top of the common three-layer method, Dirk spoke of adding levels of ‘CSS for Javascript’, i.e. adding extra CSS if Javascript is available and enhancing the interface to promote this – and a final layer of WAI-ARIA, the W3C standard for accessible rich Internet applications.

The core layers – HTML, CSS, JS

First though Dirk went into the basics, giving a good exploration of the first shared three layers – reiterating the importance of good, clean HTML, appropriate and logical tab ordering, form building and that it should, obviously, be usable without CSS and Javascript.

Again he reiterated the importance of dividing CSS and Javascript, simply, as it always should be, that CSS is for styling and Javascript is for interaction. CSS can be used achieve a lot of interactivity functionality that would otherwise be controlled by Javascript, but these are akin to hacks, says Dirk.

Another accessibility oversight is the assumption that all users have a mouse or pointing device – and as such, all design is appropriated for mouse control. If your mark-up is good and each ‘level’ of your development has been tested and is robust, your site should be completely navigable with just the Tab and Enter keys. Also, any CSS that uses the mouse-only :hover effects, should also use :focus, which includes active tabbing.

I always feel that approaching Web development in view to adhere to strict standards and to maintain accessibility always helps produce cleaner code and generally minimise errors and cross-browser inconsistencies in the long run anyway.

Dirk spoke about the usefulness of the focus() Javascript function, in bringing users’ attention to alerts, changes, altered screen states and such – especially handy for users with screen readers or screen magnifiers.

On the subject of screen readers, Dirk spoke about how they really work, how they see a Web page and handle the content – talking about loading systems and various reading modes. This was great becausue although I’ve designed for screen readers before, I’ve never seen one being used or had a go myself – and I’m sure I’m not the only one.

CSS for Javascript

The first extra level of Dirk’s five-layer development is in adding CSS when Javascript is available. This means your interface can be altered knowing that Javascript can be used.

You can use Javascript to append an additional class to page elements so that you can use CSS to target and style them. For example, the following line of code adds a class named ‘js’:

document.documentElement.className += ” js”;

You would then style with the follow CSS, where the first declaration is global and the second applied only if Javascript has been found and appended said ‘js’ class to an element:

.module {
/* Both layouts */
.js .module {
/* Javascript layout */

Enhancing a page in this way isn’t anything new, but it is very cool.

If you’ve heard of the term Progressive Enhancement, then you’ll know why. If you’ve not, you may have heard of Graceful Degradation. Both are methods for handling differences in browser rendering and capability, they’re similar but subtly different.

Graceful degradation, or ‘fault tolerance’, is the controlled handling of single or multiple errors – that when components are at fault, content is not compromised. In developing for the Web, it means focusing on building for the most advanced and capable browsers and dealing with the older, second.

Progressive enhancement turns this idea on it’s head, focusing on building a functioning core and enhancing the design, where possible, when capable.

There are a good few articles on A List Apart about that I strongly recommend bookmarking:


The last article, Scott Jehl discusses enhancement with both CSS and Javascript and has a similar trick of appending class names to page objects once Javascript has been detected. He talks about how to test those capabilities and offers a testing script, testUserDevice.js, which runs a number of tests and returns a ‘score’ for your interpretation. As well as offering an amount of detail on top of the alone recognition of Javascript, it even stores the results in a Javascript cookie so the tests don’t have to be run on every page load.


The final layer of hotness is WAI-ARIA, the W3C standard, the recommendation for making today’s RIA and heavily client-scripted Web sites accessible.

WAI-ARIA adds semantic metadata to HTML tags, allowing the developer to add descriptions to page elements to define their roles. For example, declaring that an element is a ‘button’ or a ‘menuitem’. In Dirk’s words, it maps existing and well known OS concepts to custom elements in the browser.

As well as page elements, page sections or ‘landmarks’ can be described too – declaring, for example, a page’s ‘navigation’, ‘banner’ or ‘search’ box – these look very similar to HTML 5.

There’s a great introduction on the Opera Developers site.

WAI-ARIA is not something I’ve used before, but looking at the specification there seems to be a lot of definitions you can add into you code – it looks very extensive.

Although it is ready to use, it will apparently invalidate your code unless you specify a specific DTD. Dirk didn’t actually point to where you can find these, though I’ve seen that you can make them yourself, however I don’t know if this is what he was suggesting.

Dirk has also uploaded his slides onto Slideshare, including the code I’ve used above, you can see them here:

Jump a little lighter.