Just enough structure?

Structured content is simply text that has been marked up so that the semantics of a piece of text (“this is a title” and “these paragraphs form a chapter”) are explicit in the mark-up. In XML and HTML we do this with tagging.
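As a minimal sketch of what "explicit in the mark-up" buys you, the Python below reads a small XML fragment with the standard library and picks out the title and paragraphs purely by tag name. The element names (chapter, title, para) are invented for illustration and are not taken from any particular schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical mark-up: the element names are made up for this example.
SAMPLE = """<chapter>
  <title>Just enough structure?</title>
  <para>Structured content is text that has been marked up.</para>
  <para>In XML and HTML we do this with tagging.</para>
</chapter>"""

def describe(xml_text):
    """Return the chapter title and its paragraphs, found by tag name alone."""
    chapter = ET.fromstring(xml_text)
    title = chapter.findtext("title")
    paragraphs = [p.text for p in chapter.findall("para")]
    return title, paragraphs

title, paragraphs = describe(SAMPLE)
print(title)
print(len(paragraphs), "paragraphs")
```

Nothing in the code has to guess which line is the title: the tagging says so.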

Structured content can be organised and distributed on a number of platforms (mobile, web, print, app, ebook, database...). You can mix it up, repurpose it, reuse it and monetise it.

So, clearly, structured content is better, right?

We all agree on that, don't we?

No, we don't.

Well, first, in many cases it's hard to say whether structuring text brings definite benefits. I tend to the view that structuring is valuable if you are going to process the text and you can identify repeatable logical chunks. This implies that the interesting unit of structure in a novel might be the chapter. In a legal text, it may well be the paragraph, but it would also be the sub-section, the section and the chapter.

The problem is that, when it comes to writing, structuring too often runs counter to narrative. It can even block the writer.

Structuring interferes with writing

From my point of view as a consumer and processor of text (and someone who writes software to consume and process text), structured content is definitely better.

However, for most authors, structure interferes with the process of writing.

I started writing this blog entry when I was writing one about tools for converting Word content to structured XML. I realised that I wanted to get my thoughts on structure written down. Of course, this goes to show how structured my writing process isn't.

There is an important difference between the way authors write text and the way that we use software to process that text. Processing requires some sort of structure. For many types of text the required structure is minimal (think of a short story or even a novel). For other types of text, such as the legal publications I've been working with recently, structure is essential (as is a heavy semantic layer).

Sequential structuring

When I write something (this blog entry for example), I write a title and then I write some text. I might follow that with a subtitle and some more text, then an image. Now, the important part of that statement is the sequencing: (a) followed by (b) followed by (c). The structure is there but it's sequential.
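To make the point about sequencing concrete, here is a rough sketch in Python. It assumes the author's output can be captured as a flat list of (kind, text) pairs, and the kind labels used here (title, text, subtitle, image) are my own invention for the example. The hierarchy is never written down by the author; it is inferred afterwards by grouping each block under the nearest preceding subtitle.

```python
def group_into_sections(blocks):
    """Group a flat sequence of (kind, text) blocks under the nearest subtitle.

    Anything that appears before the first subtitle (including the title)
    goes into a leading section whose heading is None.
    """
    sections = [{"heading": None, "content": []}]
    for kind, text in blocks:
        if kind == "subtitle":
            # A subtitle opens a new section.
            sections.append({"heading": text, "content": []})
        else:
            # Everything else belongs to the section currently open.
            sections[-1]["content"].append((kind, text))
    # Drop the leading untitled section if nothing landed in it.
    if not sections[0]["content"]:
        sections.pop(0)
    return sections

# What the author actually produces: (a) followed by (b) followed by (c).
post = [
    ("title", "Just enough structure?"),
    ("text", "Some opening text."),
    ("subtitle", "Sequential structuring"),
    ("text", "Some more text."),
    ("image", "diagram.png"),
]

for section in group_into_sections(post):
    print(section["heading"], len(section["content"]))
```

The author only ever supplied the order. The nesting is something the processing side adds.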

Those of us in the XML world operate on the premise that structure brings benefits. There are clear benefits to structured text but none of those benefits are for the author of the text. We think linearly when we write text. That's the basic problem with structured authoring.

Structured authoring fail

That means that, for an XML person like me, there is a problem. I want structured text, but authors don't want to write structured text (and why should they?). Every single structured authoring tool I have seen gets in the way of the author. Tools that I love and recommend on a daily basis, like oXygen, offer structured authoring, but I have never come across an author who will voluntarily use it.

In order to get structured content that accurately represents the intent of the author, we must use tools to convert from authored text to structured text. There currently isn't a sensible alternative to that approach - structured authoring fails in the general case.

Just enough structure?

There is a market for tools that support structured authoring, but that market never seems to do very well. No tool dominates.

Perhaps it is time to accept that the authoring process and/or tools need to provide just enough structure. Then conversion to HTML (pretty trivial) and conversion to XML (not trivial but not particularly difficult) can happen. “Just enough structure” can probably be defined as “apply Word styles”. I think it's time for the XML world to give up on structured authoring because all it does is annoy the authors.
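For what it's worth, here is a hedged sketch of why I call the HTML step pretty trivial. It assumes the paragraphs have already been pulled out of the Word file as (style name, text) pairs, which is a separate job not shown here, and the mapping from style names to HTML elements is my own assumption, not a standard.

```python
from html import escape

# Assumed mapping from Word style names to HTML elements.
STYLE_TO_TAG = {
    "Title": "h1",
    "Heading 1": "h2",
    "Heading 2": "h3",
    "Normal": "p",
}

def styled_to_html(paragraphs):
    """Convert (style_name, text) pairs to HTML, one element per paragraph."""
    lines = []
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style, "p")  # unknown styles fall back to <p>
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(lines)

print(styled_to_html([
    ("Title", "Just enough structure?"),
    ("Heading 1", "Structuring interferes with writing"),
    ("Normal", "Authors & tools rarely agree."),
]))
```

If the author has applied styles consistently, a one-to-one lookup like this is all the HTML conversion needs. The XML conversion takes more work because the flat run of paragraphs has to be nested into sections first.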