This month’s YDN Tuesday was presented by Steve Marshall, he spoke about code generation, ‘Writing the Code that Writes the Code’.

A bit of a disclaimer: Going into this talk I wasn’t sure how much I’d be able to take from it, or whether what Steve would be discussing would be relevant to my own kind of development. I figured he’d be talking about working with huge projects with (literally) tens of thousands of lines of code and I don’t think that the evening was part of the ‘RIA’ umbrella which is probably closest to home for me. Anyway although Steve’s talk did cover those kinds of projects, he also went through some platform-agnostic fundamentals – the kind of transferable ideas that could be applied to any coder – and it’s really those I’m looking at here.

Code Generation in Action

Steve opened with a bit of code generation evangelism, introducing a discussion as to why code generation is something that all developers should want and should do.

Each point here I think speaks for itself, but only really having gone over each benefit staring me in the face have I questioned, hold on, why don’t I do this already? It’s a convincing argument..

The most immediate is scalability – so often development projects have large amounts of repetition, say in creating views, or sets of view/mediator/component classes, or whatever – needless to say as projects get larger, generating this kind of code from templates and base classes reduces or even eliminates the room for error.

Neatly, this this was his next point – consistency. Automated code cannot be mistyped or have any part overlooked (as could when writing by hand), unless of course the source is incorrect. And even if that’s the case, a small change can be rolled out to all generated code once corrected. Which brings us to his third point – automatic code is of course far quicker to produce.

Steve also suggests that generated code is ‘predictable’, in a way. Whether looking at code that has been generated or that written for ‘expansion’ (more on this later), on the whole it’s consequently easier to digest. Likewise, source code that is to be used by a generator or to be ‘expanded upon’ is, by its very nature, a smaller volume – so therefore easier to approach.

It’s also a single point of knowledge. Only the core source code needs to be understood. A project of tens of thousands of lines of code would be near impossible to traverse or otherwise understand, knowing that code has been generated from a smaller source that you do understand offers a reassurance that the rest of the code is good.

Another advantage is the ease of which the code can be refactored. Obviously only the source code base requires changing, the larger volume is generated automatically. This kind of code is language independent.

Outside of working with code direcly, the kind of abstraction writing for code generation offers its own benefits. Although arguably a desirable trait of good programming anyway, Steve suggests this kind of code is far more decoupled with the design process. Thinking more abstractly about coding means it has less effect on design capability, and vice-versa. Design reviews or changes should have no bearing on the coding. By being naturally quicker to produce, Steve also says that generated code is a good methodology to pursue when prototyping applications.

Steve also discussed testing, offering that generated code has (or presumably can have) 100% test coverage. Similarly, full documentation is easier achieved by way of having written a smaller code base.

Overall, cleaner code that’s easier to work with and easier to understand is undeniably a more attractive prospect to work with. Developers are more enthusiastic about working with that kind of code and so that kind of code is of a higher quality – in the first place, as well as because it has been consistently deployed, tested and documented thereafter.

So how is this achieved?


Steve recognised six principle methods for generating code, each technique seemingly increasingly more ‘advanced’ than the previous and, as I said before, each progressively more suited to larger-scale projects.

He saw these as:

  1. Code munging
  2. Inline code expansion
  3. Mixed code generation
  4. Partial class generation
  5. Tier generation
  6. Full domain language


I’ll look at each briefly.

Code munging

Code Munging
Code munging essentially takes your code, parses, introspects and creates something new that’s not code, though still of benefit and in inherently useful.

Steve’s examples were Javadoc or PHPdoc, it’s the most basic form of generation you can introduce to your workflow.

Inline code expansion

Inline Code Expansion
This the ‘expansion’ mentioned above, where source code is annotated with keywords or variables where upon additional code is generated to replace those markers to extrapolate upon the original.

Templates are the most rudimentary example. This technique gets repetitive jobs done well, quickly and correctly.

Mixed code generation

Mixed Code Generation
This is the first ‘real’ form of generation, this expands upon Inline code expansion by allowing the output code to be used as input code once created. Instead of simply replacing special keywords for example, mixed code generation may use start and end markers, expanding code as before, but allowing the result to be worked upon once completed.

This may include code snippets, or smaller templates injected within classes.

Partial class generation

Partial Class Generation
I guess here’s where things start to get serious. Partial class generation begins to look at modeling languages like UML, creating abstract system models and generating base classes from a definition file.

The base classes are designed to do the majority of the low level work of the completed class, leaving the derived class free to override specific behaviors on a case-by-case basis.

Tier generation

Tier generation
This takes partial class generation a level further, a tier generator builds and maintains an entire tier within an application – not only the base classes. Again from some kind of abstract definition file, but the generator this time outputs files that constitute all of the functionality for an entire tier of the application – rather than classes to be extended.

Full domain language

Steve didn’t really talk about this much, he said it’s very tough – and even if you think this is what you want to do, it’s probably not (though a hand from the crowd wanted to reassure that it might not be quite that intimidating).

A domain language is a complete specification dedicated solely to a particular problem – with specific types, syntax and operations that map directly to the concepts of a particular domain.

I used the Code Generation Network’s Introduction for reference here, which looks like a good resource.

Also, all the diagrams belong to Steve. He’s put his presentation slides on Slideshare and uploaded PDF and Keynote versions on his website.

The presentation was also recorded as a podcast and is now on the Skills Matter site.

In retrospect

After all these, Steve took us through an example he’d worked on where code generation really proved it’s worth (a huge project for Yahoo!) – and offered tips and best practices to use once you’ve convinced your team to take on code generation (see the slides linked to above).

Though in my opinion here he seemed to seemed to almost contradict some of the things he’d talked about before. He pointed to some tools to use within your workflow, but spoke extensively about writing your own generator. Stressing that it can be written in any language, though should be written in a different language to that of the code being produced, that its easier than perhaps you think – though still with a fair few caveats.

But what happened to cutting down the work load? As the talk summary says, developer are fundamentally very lazy – we want to do less and less, surely this requires more effort?

One of his recommendations is that you document your generator once you’ve written it. But how would you create that documentation, with another generator? Is that meant to be a specially written generator also? And that needing documentation too?

I remember an episode of Sesame Street from my childhood, where Big Bird was painting a bench. Once he’d finished he made a ‘Wet Paint’ sign with the paint remaining, only to realise that the sign also was wet, so needed a ‘Wet Paint’ sign of it’s own. Which of course, was also wet and needed it’s own sign. Cut back a scene later and the whole street is full of signs. It was upsetting, to say the least.

I’m being pedantic, I know.

But I guess that’s where I recognise a line for me, or at least the kind of development I do. Very large-scale projects do often need this kind of approach, Steve is proof of that. For me only the first few techniques are appropriate. Though those, undeniably, are incredibly beneficial when you have the opportunity to put them into play.

On a similar note, this month’s LFPUG meeting had a presentation on UML for AS3 by Matthew Press, the recorded video for which is now online – have a look!

Maybe we ain’t that young anymore.