"Journalism is the act. Newspapers are the artifact."

The past week was flush with conversation (i.e., tweets and blog comments) about Clay Shirky's rousing blog post Newspapers and Thinking the Unthinkable.

Read it. Read it in its entirety.

One comment on the post from Mark Bertils that won't leave my head is "Journalism is the act. Newspapers are the artifact."

I make a habit of looking up known definitions in the dictionary (and I do mean the digital one, not the heavy tome I have here at my desk) because I find additional thought-provoking but commonly known information. While reading the definitions of newspaper and newsprint I was reminded that the essence of the newspaper business is timeliness. Why were evening editions so popular in the 1950s and early 60s? For the same reasons that Twitter and RSS newsfeeds are so popular now. Timeliness.

I do lament the demise of print and I don't like seeing my Google news alerts on this topic competing with my spam folder in terms of volume. But I am heartened by the fact that journalism, reporting, writing, editing, authoring, these beautiful professions whose fundamental job is the exchange of thoughts from one mind to another, are not going away.

Achieving automation: InDesign/InCopy to XML

InDesign and InCopy are built for desktop publishing - giving great power to design and editorial.  This is all great news.  However, it makes exporting XML rather tricky - particularly the development of fully automated XML exports.  Sure, you can capture XML coming out of these applications, but can you really push that XML into your CMS without having text processing look at it?

We've looked at this over many projects and the key issue is, of course, the discipline required by each group in the process.  If they don't follow the rules, then their content might not match what your CMS is looking for.  A deck must be labeled as a deck somehow.  Likewise, a B-Head or run-in head must be labeled appropriately. There are also customer or genre specific structures and metadata that must be maintained - with paragraph or character styles (or one of several other techniques).
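To make the labeling point concrete, here is a minimal sketch of the idea in Python. The style names and element names are hypothetical, chosen only for illustration; your own templates will differ, and this is not how InDesign or InCopy exports XML internally. It simply shows that a paragraph can only become a deck if someone labeled it as one.

```python
# Hypothetical paragraph-style names and the XML elements they should become.
STYLE_TO_ELEMENT = {
    "Headline": "headline",
    "Deck": "deck",
    "B-Head": "b-head",
    "Body Text": "p",
}


def label_paragraphs(paragraphs):
    """Map (style, text) pairs to (element, text); collect unknown styles."""
    labeled, unknown = [], []
    for style, text in paragraphs:
        element = STYLE_TO_ELEMENT.get(style)
        if element is None:
            # Nobody told us what this paragraph is, so we cannot guess.
            unknown.append(style)
        else:
            labeled.append((element, text))
    return labeled, unknown


story = [
    ("Headline", "Change is coming"),
    ("Deck", "What it means for production"),
    ("Normal", "A paragraph someone forgot to style."),
]
labeled, unknown = label_paragraphs(story)
print(labeled)
print(unknown)
```

The paragraph left in a default style ends up in the unknown list, which is exactly the kind of slip that discipline (or automated checking) has to catch.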

The point is that you can't look over everyone's shoulder.  Styling and other structure related errors are bound to creep into your content on occasion.   If you only want to accept well structured XML, then you need the capability to automatically identify errors and only ingest acceptable documents.

While you can create scripts to QC the content during production, this poses a scripting update problem every time you want to change your format structure (every time you do a redesign, perhaps).  And while scripting is extremely powerful in CS2 & CS3, it is pretty low level stuff and time consuming to produce anything complicated.  It is also problematic if you don't have a specialist on staff.  Better to write scripts once and move QC somewhere else.

So what to do?  One solution is a Schema (or DTD) validation technique that allows this QC operation to proceed during an automated export.  The Schema will be more restrictive than just looking at Adobe structures - it will overlay structures specific to your content.  And while updating a schema requires some technical know-how, it is more straightforward and much faster than updating scripting of any kind.  The reason, of course, is that this is what Schemas are meant to do well.

Using a Schema to validate InDesign/InCopy content can detect a surprising number of human errors with styling and other structuring techniques.  Not all errors, but it can do a solid job if your content is moderately complex.  Content flows into an interim format and is validated before being transformed into its final form in your CMS.  This means that valid content can be fully automated from InDesign to the CMS.  Invalid documents can be automatically siphoned off for review and correction by production.  Users can then be retrained if necessary.
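The validate-then-route flow described above can be sketched roughly as follows. This is a simplified stand-in, not real Schema or DTD validation: the content model is a hypothetical Python dictionary, the element names are invented, and a production pipeline would hand the interim XML to an actual schema validator instead. It uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model for the interim format: which child elements
# each parent may contain. A real schema or DTD would express this instead.
ALLOWED_CHILDREN = {
    "article": {"headline", "deck", "body"},
    "body": {"b-head", "p"},
}
# Hypothetical rule: every article needs exactly one headline.
REQUIRED_ONCE = {"article": ["headline"]}


def find_errors(xml_text):
    """Return a list of human-readable structure errors (empty if valid)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    errors = []
    if root.tag != "article":
        errors.append(f"unexpected root element <{root.tag}>")
    for parent in root.iter():
        # Elements missing from the model may not contain child elements.
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        children = [child.tag for child in parent]
        for tag in children:
            if tag not in allowed:
                errors.append(f"<{tag}> not allowed inside <{parent.tag}>")
        for tag in REQUIRED_ONCE.get(parent.tag, []):
            if children.count(tag) != 1:
                errors.append(f"<{parent.tag}> needs exactly one <{tag}>")
    return errors


def route(documents):
    """Split {name: xml_text} into names to ingest and names to review."""
    ingest, review = [], {}
    for name, xml_text in documents.items():
        errors = find_errors(xml_text)
        if errors:
            review[name] = errors  # siphon off for production to correct
        else:
            ingest.append(name)  # safe to send on to the CMS
    return ingest, review
```

The point of the design is that the rules live in data rather than in export scripts: after a redesign you would edit the content model (or, in a real system, the schema) and leave the export scripting alone.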

Beats checking every exported document ad nauseam, doesn't it?  Especially at 2am.

Call for Participation: DITA 2 InDesign Plug-In

Really Strategies is supporting the creation of an open-source, community-developed DITA-to-InDesign plug-in for the DITA Open Toolkit. We are donating a small amount of existing code (some early XML-to-InDesign transform experiments) and development effort over the weeks and months to come, as well as our existing expertise and experience with both DITA processing and getting arbitrary XML into InDesign.

The project is managed on SourceForge as the DITA2InDesign project.

There's nothing much there now: we're just getting started with development and are actively soliciting contributions from others in the community. See the project's Web site for details on the project and how you can help us move it forward.

Our goal with this project is to help make it easier for Publishers, in particular, to take immediate advantage of DITA, or at least experiment with it with a minimum of up-front effort, by fostering the creation of a print production tool chain that uses tools both familiar to Publishers and capable of meeting Publishers' typographic and composition requirements.

With DITA today you can create printed output using the XSL-FO-based plug-in. That plug-in is adequate for technical documents and, with a little effort, you can customize and extend it to reflect corporate branding and specific page layouts.

However, the inherent limitations in the XSL-FO standard and its available free and commercial implementations make it incapable of producing the more sophisticated layouts required by most commercial publications and more heavily-designed technical documents. Thus the need for something like the DITA2InDesign plug-in.

The goal is for the DITA2InDesign plug-in to help bridge the gap and make it as easy as possible to use InDesign with DITA-based content.

NOTE: While the plug-in will go a long way toward automating the layout of DITA-based content with InDesign, it won't be able to do everything. There will always be a class of documents that require more automated layout sophistication than the plug-in could hope to provide. For those documents, the Typefi product offers a very attractive solution. Typefi provides very sophisticated automation features for rendering XML content into InDesign layouts. While one doesn't exist today, it should be fairly easy to create a generic DITA-to-Typefi "CXML" process that would allow you to use existing Typefi-based InDesign layouts with any DITA-based content.

Live DITA Application: FASB U.S. GAAP Codification

The work of all accountants doing commercial accounting in the U.S. is governed by the Generally Accepted Accounting Principles (GAAP), created and maintained by the Financial Accounting Standards Board, a member-supported organization mandated by the U.S. Congress.

Historically the GAAP has been created as a mishmash of different documents and supporting interpretation and commentary. There was no single organizing schema or source. In short, it was essentially impossible to determine whether or not you had found everything relevant to a given accounting issue.

To address this problem, the FASB decided to create a new all-encompassing classification taxonomy for the GAAP and codify all existing GAAP standards under this taxonomy. This project has been going on for over four years and has resulted in the Accounting Standards Codification, or ASC. The ASC content is currently undergoing an extended period of public review and is available through the FASB ASC Web site: http://asc.fasb.org/home.

While the ASC taxonomy itself was a major achievement, the codification activity was a daunting editorial process in which all the existing standards content had to be re-authored in a new form that directly reflects the taxonomy. To support this activity the FASB decided to use an XML-based system, which should come as no surprise.

But beyond that, the FASB realized several important things:

  • The GAAP content is highly modular
  • The GAAP content can be organized in many different useful ways depending on how it is being used:
    • By subject
    • By industry
    • By business process
    • By what's of immediate interest to a particular person researching a problem or set of problems.
  • The GAAP content requires rich metadata to enable accurate search and retrieval as well as binding to the new ASC taxonomy
  • Licensees of the content will want the XML source and will want to be able to use it with as little effort and expense as possible
  • The FASB does not have huge budgets for XML application development and implementation yet needs non-trivial systems for authoring and managing the GAAP content through its editorial processes as well as for delivery through the authoritative FASB Web site.

Given the foregoing, the FASB realized that a more traditional XML application, while possible, would not necessarily be optimal and would likely be prohibitively expensive and would not meet the requirements of licensees for ease-of-use of the XML content.

However, a DITA-based application would satisfy all these requirements. David Prather at FASB realized that the GAAP content could be modeled quite handily using DITA with some GAAP-specific specializations.

David worked out a clever way to use DITA maps to manage the organization and packaging of the codified GAAP content and hired me to design and implement the necessary GAAP-specific specializations (as well as do the data conversion from an initial XML format they had used for the initial codification editorial work). The FASB selected Ovitas to implement a new editorial support CMS system as well as the dynamic delivery system used to serve the ASC content through the FASB Web site.
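For readers who have not worked with DITA maps, the following sketch shows the general idea of one set of modular topics being packaged in more than one way. It is an illustration only: the topic file names and map titles are made up, and it says nothing about how the FASB maps or specializations are actually structured. The `map`, `title`, and `topicref` elements and the `href` attribute are standard DITA.

```python
import xml.etree.ElementTree as ET


def build_map(title, topic_files):
    """Build a DITA map that references topic files in the given order."""
    root = ET.Element("map")
    ET.SubElement(root, "title").text = title
    for href in topic_files:
        ET.SubElement(root, "topicref", href=href)
    return root


# Hypothetical topic files; each holds one self-contained module of content.
by_subject = build_map(
    "Revenue", ["revenue-overview.dita", "revenue-software.dita"]
)
by_industry = build_map(
    "Software industry", ["revenue-software.dita", "costs-software.dita"]
)

print(ET.tostring(by_subject, encoding="unicode"))
print(ET.tostring(by_industry, encoding="unicode"))
```

Note that the same topic file appears in both maps. The content is written once, and each map is just another view of it.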

The project went remarkably quickly--we had working DITA specializations defined and in place in a matter of weeks and the models required only minor refinement as the system implementation progressed, mostly stemming from new understandings of the underlying content as the codification editorial process approached completion. The CMS and Web site implementation went equally smoothly (remarkably so in my experience building such systems).

Because we could use the free DITA Open Toolkit to generate HTML sufficient for internal review of the codified content we didn't need to invest any time or money in acquiring or building rendering support just to support internal Q/A of the DITA content, a significant savings. Essentially, it allowed one part-time consultant, me, to do what would in the past have required a team of three or four consultants months of work to implement. By the same token, we were able to use the off-the-shelf DITA support in XML editors like Arbortext Editor and OxygenXML, removing the need to invest in document-type specific editor configurations and customizations, again saving weeks or months of consultant time. I think I spent about two days coming up to speed on how to configure Arbortext Editor to work with specialized DITA document types and about 1/2 day creating the necessary configurations (it's essentially a copy and modify process that I can now do in minutes).

Likewise, the Toolkit means that licensees can do *something* with the ASC content immediately, as well as giving them a solid base from which to develop whatever internal processes they need. Large publishers with existing XML infrastructure can of course apply that, but smaller publishers with little or no XML infrastructure can still take immediate advantage of the ASC XML source.

The content on the FASB ASC Web site is served dynamically from a slightly sanitized version of the DITA source--it is not static HTML pages generated from the DITA source.

The FASB ASC application is a working example of how the unique features of DITA XML applications significantly lower the cost of building this type of system while enabling significant value for the DITA-based content itself.

One interesting side effect of this system is that most, if not all, of the FASB's licensees, which include all the big name publishers and many smaller ones, will end up with both DITA-supporting internal systems as well as internal DITA expertise that can then be quickly and easily applied to any other DITA-based content, regardless of its markup details or subject domain. That seems pretty interesting to me....

Complex layout, XML and a little...Obama?

Complex layouts.  Full automation.  That is what many of us are working toward.  You know what I'm talking about - Layout to useful XML (and back again) with speed and very little increase in manual effort.  It's achievable.  But it means changes to production practices no matter what technology you use.  Change is coming.  And change is possible - if you don't believe me, go ask Barack Obama.  So if you're on the Obama bandwagon - and after Iowa, any candidate's bandwagon, red or blue - you must believe you can change production practices, even in the most intense and fragmented development operation.  Think about how hard it is to change the country - what's change in the publishing trenches compared to that?  I'm asking you to believe... :)

The architect and the interior designer

I heard a story the other day about a store owner who hired an interior designer to design her new retail store. With great enthusiasm they set about designing a great space. It was going to look great - and it did look great in the sketches. So she submitted their plan for approval.

What she didn't know and what she found out very quickly is that she was required by law to hire a registered architect to draw up the submission plans. So the designers sent their plans and drawings to an architect to 'just draw them up'.

Unfortunately, the architect found that there was no HVAC system in their design - oops. The bathroom design was not to code, nor were several other things. Well, everyone scrambled, and after much brouhaha between the groups, as you can imagine, the design was revised and I suppose everything turned out fine (if more expensive). But it was touch and go for a while.

The lesson: work with an architect from the beginning, so you can create with the required constraints in mind.

Does this story sound like anything related to print-to-XML/web workflows and print content production in general? If so, then who is the interior designer? Who is the architect? And what is the HVAC system?

The big content system integration II

I've modified the 'big' diagram from the first post on this topic to show a circular content flow - now called editorial flow. 

Please find it here: Download the_system_ii.pdf

The diagram is still more conceptual than technical.  Of course at some point this thinking needs to be specialized for the particular publishing vertical, product needs, and company needs.

A few thoughts:

1. Shows content editing flowing in a circle.  Enter at any point and proceed downstream. That is, start developing a print article or publication and complete it, then proceed to develop it into a web article or publication.  Or vice versa.

2. Prior to entering a print or web editorial workflow there is a content adding, packaging, editing phase, where it is assumed that a web interface will allow review of content sources and collection into the initial manuscript for the subsequent print or web editorial workflow.  This might, for example, allow enhancement of an article - with a new sidebar, for example, as it proceeds downstream. 

3. Implies content reuse if the circle keeps flowing.  The circle can also stop at any point if needs are met.  Some publishers might stop at having a print and web output (in any order), some might stop with either a print or web output, some might keep the cycle going indefinitely, building a large content repository over time (e.g. educational publishers).  The diagram also implies content maintained as XML rather than being imported and exported from editorial/workflow tools.   

4. It has a central repository built of two fundamental parts - XML and binary content (images, etc.).  Work done in page layout tools/editorial tools/workflow tools is transitory (though might be archived).  The purpose of the repository would be to accurately manage 'content' of published products and to also provide a starting point for initial manuscript creation for the next stage in the cycle.

5. Upon completion of the web or print cycle, a number of XML enabled exports are possible along with the main article/publication produced.   This is a requirement of some publishers, and certainly there for the taking, if content is accurately managed as XML.

Well, readers, what do you think?  Does it match your thinking?  Should we keep going with this?

What I want from Adobe - x-ray file formats

Consider The big content system integration diagram (draft 1).  What are the bottlenecks for the content stream?  Well if what I'm doing is any indicator, importing and exporting from InCopy or InDesign still has a little friction to it.  This is being addressed by Adobe progressively - see CS3 (yay!) for example over CS2.  Also being addressed by Softcare and other companies, again progressively.

But is there a fundamental change that can happen here?  Can we get XML that can pass through Adobe formats frictionlessly like Superman can see through walls with his x-ray vision?  (Digression here: There is a surprising amount of controversy on the Web about Superman's powers - many self proclaimed pundits seem to say that his X-Ray vision is unrealistic!  Not science that it is caused by low gravity on his home planet?  Baloney!)

Maybe the problem isn't that content can't yet be imported and exported seamlessly, maybe it is that content shouldn't be imported and exported at all.  If Adobe considered InDesign/InCopy not to be a holder of data, but an aggregator of data for print layout, then we might start getting somewhere.  Already, images are externally linked, why not, we might then ask, have text objects be externally linked?  I'm not talking about the page geometry, etc. that might live inside of an InCopy document.  I'm talking about just the text - and as XML if you please.

If Adobe therefore allowed XML content to remain external as files, and it allowed all external content, XML or images, to be linkable via HTTP protocol or otherwise, then we might have a situation where media and XML management systems could maintain content continually - without having to messily import during print production and export after print production.
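Here is a toy model of what I mean, in Python. To be clear, this is a wish sketched in code, not a description of anything InDesign or InCopy does: the `Frame` class, the `resolve` function, and the example URL are all hypothetical, and a plain dictionary stands in for a content repository that might answer over HTTP.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """A page frame: geometry lives in the layout, content lives elsewhere."""
    x: float
    y: float
    width: float
    height: float
    href: str  # link to externally managed XML text or an image


def resolve(frames, fetch):
    """Fetch the current content for each frame at layout time."""
    return {frame.href: fetch(frame.href) for frame in frames}


# Stand-in for a content repository; a real one might answer over HTTP.
BODY_URL = "http://cms.example/articles/42/body.xml"
repository = {BODY_URL: "<body><p>First draft.</p></body>"}
layout = [Frame(36, 72, 300, 500, BODY_URL)]

print(resolve(layout, repository.__getitem__))

# The repository updates the text; the layout itself is untouched.
repository[BODY_URL] = "<body><p>Edited for web.</p></body>"
print(resolve(layout, repository.__getitem__))
```

The layout holds only geometry and links, so there is nothing to import before print production and nothing to export afterward.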

Think of all the advantages!  Tremendous.  Metadata could be added externally, preparation for web could happen simultaneously. And Adobe page apps would still manage page layout and editing within page geometries so no skin off their nose.   Of course, this is not an easy thing to accomplish - but it is the logical future state - each format to its own system, with specialized apps consolidating, editing, and arranging the last mile of content.

Print and web: one ecosystem?

Ah, content, what are you doing to our beloved work culture?  We were happy when the world was simple!

Print production, so well known, so comfortable in its methods of achieving uncomfortable deadlines.  It was so simple when 'this is how we do things here' easily fell from our lips. 

Web production, so comfortable in its freshness and technical knowledge.  Yet it still had a world all to its own.

But now, maybe the idea of 'content' is just starting to bring each world into the other's focus.  The definition of content is not really tied to any medium - it lives outside of the reality of 'deployment', of paper & ink, or web browsers. 

Clearly the 'star' theory of content (circa 1999) is long dead, where content is produced centrally and sent seamlessly to different mediums.  Content cannot live completely outside of its context.  It must be edited to each medium.  It can start, end, and be stored centrally, in something like an RSuite (XML CMS), but it changes as it moves to print, or moves to web.

With today's tools, many publishers are living in a world where content flows from manuscript-to-print-to-web.  And where the most efficient content development ecosystem will only be achieved if upstream (print) production and downstream (web) production work to accommodate each other.  It's one ecosystem after all, where the same 'content' travels through both of these environments.

A content hand-off at the dividing line between these worlds was the first step in fitting these two systems together.  Loosening and blurring that border seems to be the next step as we forge ahead.   A single content ecosystem demands it.

A bit like making sausages

How do you make XML?  Well, as the old saying goes, it's a bit like making sausages.  You don't want to see them made, but they sure taste good.

By the way, a quick search on 'a bit like making sausages' reveals that almost everything is a bit like making sausages.

I was reminded of that saying when 'opening up the hood' on a pre-export InCopy file variant for a client the other day - that is, right after the final preparation script was run and right before triggering the XML-Exporter.  I think those in the room actually took a step back, with the look of someone seeing a meat grinder in action for the first time.

There was tagging all over the place - and in many different forms.  All of it was either automatically generated or semi-automated, with scripted support.  It looked awful!  Precisely.  That is because the final form was a machine readable document no longer built for editors or designers.

Well, we checked in the document to the XML export status in K4 and Presto! we had our lovely, tasty XML files - soon to be ready for consumption by a Web CMS.

Lovely.

Funny, but right now I cannot stop thinking about those wonderful sausages you buy in the country markets in France.  Truly you haven't lived until you have tried them.  (Apologies to other national sausage makers - I have not done the sausage travel circuit as much as I would like to.)

About this Blog

This blog is produced by the consultants and analysts from Really Strategies, a content solutions and services provider.
