This is the first in what will be an occasional series of DITA application discussions around the subject of applying the DITA standard to Publishing business problems (that is, using DITA-based XML markup as the source for documents produced by Publishers, as opposed to documents produced by product companies). Because DITA comes out of the technical documentation domain (it was developed by IBM as their next generation XML application for technical documentation, especially documentation delivered on the Web or as online help), most of the practical information about DITA reflects technical documentation applications.
But DITA is equally applicable to many Publishing applications, including traditional narrative documents that don't seem, at first look, like candidates for ditification.
DITA's unit of content organization is the "topic". A topic consists of a required title followed by an optional body (which contains any paragraph-level content of the topic) and, optionally, nested topics. This generic model is very simple and very flexible, e.g.:
Topic Title
  Topic body content (one or more paragraphs, figures, etc.).
  Subtopic 1 Title
    Subtopic content
  Subtopic 2 Title
    Subtopic content
Within a topic body you cannot have any subsections or subdivisions. At most you can have a single level of <section> elements but <section> elements cannot nest.
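As a minimal sketch of this constraint (the id and the titles are invented for illustration), a body can hold sibling <section> elements, but a <section> cannot contain another <section>:

```xml
<topic id="example-topic">
  <title>Topic Title</title>
  <body>
    <p>Introductory paragraph.</p>
    <section>
      <title>First Section</title>
      <p>Section content.</p>
    </section>
    <section>
      <title>Second Section</title>
      <p>Section content.</p>
      <!-- Not allowed here: another <section> nested inside this one. -->
    </section>
  </body>
</topic>
```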
This aspect of topic bodies often leads to the conclusion that DITA cannot be used for content that is naturally a potentially deep hierarchy of sections, such as textbooks, trade books, standards documents, reports, and so on. This seems to stem from the normal focus on topics as the unit of authoring, which most people take to mean you should always author topics in isolation, with no nested topics, as well as confusion of the element name <section> with the generic, everyday concept of "section", which sometimes leads to the (incorrect) conclusion that "DITA cannot have nested sections" (in the generic sense).
Nothing could be further from the truth: DITA can have nested sections (in the generic sense). It just cannot use its <section> element to do it.
The key to a correct understanding is to forget about the DITA <section> element and think about topic bodies as only containing paragraph-level things (figures, tables, lists, etc.).
Without section elements to distract us it becomes clear that nested sections must be represented by nested topics, because that's the only thing available. If we further realize that we can nest topics without restriction, it should become clear that, for the purpose of narrative documents, there's no difference between DITA topics and any other XML division-type element.
If you take the base DITA topic element (the most generic topic), you can immediately use it to represent pretty much any document you might have lying about simply by creating a topic with nested topics. The base DITA paragraph-level model is sufficiently complete and generic that it should work for pretty much anything (at least for the purpose of simply capturing the basic structure of your content; providing more appropriate tags for your special stuff can come later).
For example, if you have a book that is a sequence of chapters with subsections to any level, you can represent each chapter as a generic topic with nested topics for each subsection (and their subsubsections, and so on). The only aspect that might be a little different from other XML markup approaches (for example, DocBook) is that there is an explicit wrapper element around any paragraph content that occurs between the title of a section and any subsections, e.g.:
<topic>
  <title>Chapter 1: Loomings</title>
  <body>
    <p>Call me Ishmael. Some years ago--never mind how
    long precisely --having little or no money in my purse...</p>
  </body>
  <topic>
    <title>The Waterfront</title>
    <body>
      <p>Right and left, the streets take you waterward. Its extreme down-town is
      the battery...</p>
    </body>
  </topic>
  <!-- ... (more subsection topics [if Melville had actually used subsections]) ... -->
</topic>
The key bit here is the <body> tag, which is required by DITA in order to contain the topic's main (or initial) content.
Otherwise, the markup is very straightforward and is structurally very similar to what you would do in DocBook or a custom XML application.
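For comparison, here is a rough DocBook rendering of the same chapter. It is an illustrative sketch, not part of the original example. The paragraphs follow the chapter title directly, with no wrapper equivalent to DITA's <body>:

```xml
<chapter>
  <title>Chapter 1: Loomings</title>
  <!-- No <body> wrapper: paragraphs sit directly after the title. -->
  <para>Call me Ishmael. Some years ago--never mind how
  long precisely --having little or no money in my purse...</para>
  <section>
    <title>The Waterfront</title>
    <para>Right and left, the streets take you waterward. Its extreme down-town is
    the battery...</para>
  </section>
</chapter>
```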
To compose your new chapter topics into a complete book you would use DITA's map mechanism to create a document that links to each chapter in the appropriate order, e.g.:
<map>
  <title>MOBY DICK; OR THE WHALE</title>
  <topicmeta>
    <author>Herman Melville</author>
  </topicmeta>
  <topicref href="moby-chapter-01.xml"/>
  <topicref href="moby-chapter-02.xml"/>
  ...
  <topicref href="moby-epilogue.xml"/>
</map>
Obviously, there's lots more that you would need for a complete application, but the point of this exercise is to show that not only can you use DITA to usefully mark up narrative text with nested sections, but that it's easy and inherent in DITA's basic model.
Everything you would do beyond this would be refinement of the application to meet your specific business requirements (e.g., specialized markup to reflect your specific content or editorial rules or business processes or processing needs).
Of course, just because you can do something doesn't mean you should. Having gotten this far, you should be asking questions like "why would I choose DITA over something like DocBook or a purpose-built XML document type?".
The answer, I think, is that DITA offers at least two compelling advantages over any other candidate XML application:
- The initial cost of ownership is low, approaching zero, and the ongoing cost of ownership is low.
- It offers a number of sophisticated features in terms of modularity, extensibility, and linking that either are not provided by other applications or would cost a prohibitively large amount to build from scratch.
That is, the cost of applying DITA is almost always going to be significantly lower than the cost of any alternative (and at worst will be no more expensive than any other alternative). The detailed reasons for this will be explained in future posts; the short answer is simply that you get a lot out of the box with DITA that you would otherwise have to pay for.
If you accept my "DITA is inexpensive" assertion then it follows that if DITA is applicable to a particular XML application use case it is at least a compelling candidate if not the obvious choice.
The challenge so far has been clearly determining when DITA is and is not applicable. By demonstrating that DITA can be used productively and easily for traditional narrative documents we open up a very large domain of potential uses for DITA that might otherwise have been overlooked.
And if you've gotten this far, you might be asking the question "OK, this all makes sense but why do you, Eliot Kimber, care so much?"
The answer is that I care because DITA, because of its unique features such as specialization and maps, and because of a large and growing base of free and low-cost supporting infrastructure, is finally putting the sort of sophisticated XML applications that everyone needs within the reach of even the smallest enterprises. Before DITA, the cost of doing what DITA does out of the box was prohibitive for all but the biggest enterprises. Now anybody can have it.
That means that I, as a solution provider, can focus on the business problems to be solved without having to worry too much about the cost of the infrastructure needed. In the past the conversation was always "we can do X, but we'll have to spend a lot of time building the system that will do X". Now the conversation can be "we can do X, we can do it relatively quickly, and having done X, we'll enable Y and Z you might not have thought of". That's the kind of conversation I prefer to have.
And it is DITA that enables it.
I have some reaction to your post here.
Posted by: Sarah O'Keefe | January 03, 2008 at 09:48 PM
Great piece. I've been looking for exactly this information for a while. What about reusing those nested topics in other book chapters? Are nested subtopic references allowed? We have a large body of educational music material that we would like to convert to DITA and I'd be interested in learning about what services and products you have to assist us.
Posted by: Bay | January 14, 2008 at 06:59 PM
Using maps you can combine topics in any way you like regardless of whether or not a given topic is physically contained in another topic or managed as a separate document.
How you organize topics into files for storage is primarily about authoring convenience and should not affect your ability to re-use any topic in another context, at least from a mechanics standpoint. Writing topics so that they are rhetorically re-usable is the job of the author, of course.
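As a hedged sketch of the mechanics: a topicref can address a nested topic by appending that topic's id to the file name, so a subsection stored inside one chapter file can be pulled into a different map. The map title, the second file name, and the id "waterfront" are all hypothetical, and the sketch assumes the nested topic in the chapter file carries id="waterfront".

```xml
<map>
  <title>Scenes of Old Manhattan</title>
  <!-- Hypothetical: reuses only the nested topic with id="waterfront"
       from the chapter file, not the whole chapter. -->
  <topicref href="moby-chapter-01.xml#waterfront"/>
  <!-- Hypothetical stand-alone topic stored in its own file. -->
  <topicref href="another-topic.xml"/>
</map>
```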
We provide a full range of services around the application of DITA to specific business problems, including analyzing your requirements to determine how to best apply DITA (or whether or not DITA is even an appropriate technology choice), implementing specializations, and building DITA-based publishing pipelines to meet specific publishing needs.
We are in the process of adding DITA support to our RSuite CMS content management system. You could use it with DITA-based documents today but it wouldn't take full advantage of their DITAness. Soon it will. We have done some technology demonstrations of what this DITA support might look like and will do more over the coming months.
Posted by: Eliot Kimber | January 14, 2008 at 07:30 PM
There are some differences between topic and section. In terms of narrative purpose, you may find that nested topics do the job for you. However, that is true only if you assume your nested structure must have a title at every level. Unfortunately, in the real world that is not necessarily the case. In addition, a section can be a sibling of a p element, which lets you organise the subsets of information within your topic in a more flexible way.
Posted by: Brian | April 29, 2008 at 03:54 AM
I agree that the requirement for topics to have titles can once in a while pose an issue, although even there you can finesse it by defining a topic type with a specialized title that has empty content (the empty title element can either represent an invariant generated title or be truly empty).
In DITA 1.2 we are adding a set of generic grouping elements that can nest, allowing you to create nested (but untitled) containers within a topic body. One variant of this container allows section, so you can have semantic groupings of sections.
Posted by: | April 29, 2008 at 10:16 AM