Four and a half minutes that are worth it!
Posted by annvmichael on July 29, 2008 at 03:55 PM in Content and Data Modeling, Content Delivery, Content Reuse, Education
InDesign and InCopy are built for desktop publishing - giving great power to design and editorial. This is all great news. However, it makes exporting XML rather tricky - particularly when it comes to fully automated XML exports. Sure, you can capture XML coming out of these applications, but can you really push that XML into your CMS without running it through some text processing first?
We've looked at this over many projects, and the key issue is, of course, the discipline required of each group in the process. If they don't follow the rules, then their content might not match what your CMS is looking for. A deck must be labeled as a deck somehow. Likewise, a B-Head or run-in head must be labeled appropriately. There are also customer- or genre-specific structures and metadata that must be maintained - with paragraph or character styles (or one of several other techniques).
The point is that you can't look over everyone's shoulder. Styling and other structure-related errors are bound to creep into your content on occasion. If you only want to accept well-structured XML, then you need the capability to automatically identify errors and only ingest acceptable documents.
While you can create scripts to QC the content during production, this poses a scripting update problem every time you want to change your format structure (every time you do a redesign, perhaps). And while scripting is extremely powerful in CS2 and CS3, it is pretty low-level stuff, and producing anything complicated is time-consuming. It is also problematic if you don't have a specialist on staff. Better to write scripts once and move QC somewhere else.
So what to do? One solution is a Schema (or DTD) validation technique that allows this QC operation to proceed during an automated export. The Schema will be more restrictive than just looking at Adobe structures - it will overlay structures specific to your content. And while updating a schema requires some technical know-how, it is more straightforward and much faster than updating scripting of any kind. The reason, of course, is that this is what Schemas are meant to do well.
Using a Schema to validate InDesign/InCopy content can detect a surprising number of human errors with styling and other structuring techniques. Not all errors, but it can do a solid job if your content is moderately complex. Content flows into an interim format and is validated before being transformed into its final form in your CMS. This means that valid content can be fully automated from InDesign to the CMS. Invalid documents can be automatically siphoned off for review and correction by production. Users can then be retrained if necessary.
Beats checking every exported document ad nauseam, doesn't it? Especially at 2am.
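The validate-then-route step described above might look roughly like this in Python. This is a minimal sketch, not XSD validation: the hypothetical `check()` function hard-codes a few rules (exactly one headline and one deck, a whitelist of child elements) as a stand-in for what a real schema would express, and the element and file names are invented for illustration. In practice you would hand each export to a schema-aware validator (for example lxml's `XMLSchema`) and keep only the routing logic.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical rules standing in for a real schema: an <article> must have
# exactly one <headline> and one <deck>, and only these child elements.
REQUIRED_ONCE = ("headline", "deck")
ALLOWED_CHILDREN = {"headline", "deck", "bhead", "para"}


@dataclass
class Batch:
    accepted: list = field(default_factory=list)      # names bound for the CMS
    needs_review: dict = field(default_factory=dict)  # name -> list of errors


def check(xml_text):
    """Return a list of rule violations; an empty list means the document passes."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    errors = []
    if root.tag != "article":
        errors.append(f"unexpected root <{root.tag}>")
    for name in REQUIRED_ONCE:
        count = len(root.findall(name))
        if count != 1:
            errors.append(f"expected one <{name}>, found {count}")
    for child in root:
        if child.tag not in ALLOWED_CHILDREN:
            errors.append(f"unexpected element <{child.tag}>")
    return errors


def triage(exports):
    """Send passing documents on to the CMS and siphon failures off for review."""
    batch = Batch()
    for name, xml_text in exports.items():
        errors = check(xml_text)
        if errors:
            batch.needs_review[name] = errors
        else:
            batch.accepted.append(name)
    return batch


exports = {
    "story1.xml": "<article><headline>H</headline><deck>D</deck><para>P</para></article>",
    # The deck was styled as body text, so it exported as a <para>.
    "story2.xml": "<article><headline>H</headline><para>D</para><para>P</para></article>",
}
result = triage(exports)
print(result.accepted)      # ['story1.xml']
print(result.needs_review)  # {'story2.xml': ['expected one <deck>, found 0']}
```

The point of keeping the rules separate from `triage()` is the one the post makes: when the design changes, you update the rules (ideally a schema), not the export plumbing.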
Posted by Michael Edson on May 06, 2008 at 04:50 PM in Authoring Tools, Book Publishing, Composition, Content and Data Modeling, Education, Media, Workflows, XML
The work of all accountants doing commercial accounting in the U.S. is governed by the Generally Accepted Accounting Principles (GAAP), created and maintained by the Financial Accounting Standards Board, a member-supported organization mandated by the U.S. Congress.
Historically, the GAAP has been a mishmash of different documents with supporting interpretation and commentary. There was no single organizing schema or source. In short, it was essentially impossible to determine whether or not you had found everything relevant to a given accounting issue.
To address this problem, the FASB decided to create a new all-encompassing classification taxonomy for the GAAP and codify all existing GAAP standards under this taxonomy. This project has been going on for over four years and has resulted in the Accounting Standards Codification, or ASC. The ASC content is currently undergoing an extended period of public review and is available through the FASB ASC Web site: http://asc.fasb.org/home.
While the ASC taxonomy itself was a major achievement, the codification activity was a daunting editorial process in which all the existing standards content had to be re-authored in a new form that directly reflects the taxonomy. To support this activity the FASB decided to use an XML-based system, which should come as no surprise.
But beyond that, the FASB realized several important things:
Given the foregoing, the FASB realized that a more traditional XML application, while possible, would not necessarily be optimal: it would likely be prohibitively expensive, and it would not meet licensees' requirements for ease of use of the XML content.
However, a DITA-based application would satisfy all these requirements. David Prather at FASB realized that the GAAP content could be modeled quite handily using DITA with some GAAP-specific specializations.
David worked out a clever way to use DITA maps to manage the organization and packaging of the codified GAAP content and hired me to design and implement the necessary GAAP-specific specializations (as well as to do the data conversion from the XML format they had used for the initial codification editorial work). The FASB selected Ovitas to implement a new editorial-support CMS as well as the dynamic delivery system used to serve the ASC content through the FASB Web site.
The project went remarkably quickly--we had working DITA specializations defined and in place in a matter of weeks and the models required only minor refinement as the system implementation progressed, mostly stemming from new understandings of the underlying content as the codification editorial process approached completion. The CMS and Web site implementation went equally smoothly (remarkably so in my experience building such systems).
Because we could use the free DITA Open Toolkit to generate HTML sufficient for internal review of the codified content, we didn't need to invest any time or money in acquiring or building rendering support just for internal QA of the DITA content, a significant savings. Essentially, it allowed one part-time consultant, me, to do what in the past would have taken a team of three or four consultants months to implement. By the same token, we were able to use the off-the-shelf DITA support in XML editors like Arbortext Editor and OxygenXML, removing the need to invest in document-type-specific editor configurations and customizations, again saving weeks or months of consultant time. I think I spent about two days coming up to speed on how to configure Arbortext Editor to work with specialized DITA document types and about half a day creating the necessary configurations (it's essentially a copy-and-modify process that I can now do in minutes).
Likewise, the Toolkit means that licensees can do *something* with the ASC content immediately, as well as giving them a solid base from which to develop whatever internal processes they need. Large publishers with existing XML infrastructure can of course apply that, but smaller publishers with little or no XML infrastructure can still take immediate advantage of the ASC XML source.
The ASC content on the site is served dynamically from a slightly sanitized version of the DITA source; it is not static HTML pages generated from the DITA source.
The FASB ASC application is a working example of how the unique features of DITA XML applications significantly lower the cost of building this type of system while adding significant value to the DITA-based content itself.
One interesting side effect of this system is that most, if not all, of the FASB's licensees, which include all the big-name publishers and many smaller ones, will end up with both DITA-supporting internal systems and internal DITA expertise that can then be quickly and easily applied to any other DITA-based content, regardless of its markup details or subject domain. That seems pretty interesting to me....
Posted by Eliot Kimber on January 23, 2008 at 11:39 AM in Authoring Tools, Composition, Content and Data Modeling, Content Delivery, Content Management Systems, Content Reuse, DITA, Legal, Regulatory, & Legislative, Standards, Taxonomies, Web Publishing, XML
Eric Armstrong at Sun Microsystems has written an interesting blog post about the power of domain-specific languages (by which he means programming languages and data modeling languages), citing Ruby and DITA as two current examples: http://blogs.sun.com/coolstuff/entry/domain_specific_languages.
One of his main points is that part of the power of domain-specific languages is that they tend to foster a supporting infrastructure that makes using them significantly less expensive than more general solutions.
This is certainly true for DITA. If you've ever had the experience of designing and building a non-trivial XML application from scratch, you know how much easier it is to build on DITA, partly because of its inherent technical design, but also because of the amount of available support infrastructure that, more or less, just works.
Posted by Eliot Kimber on January 21, 2008 at 02:11 PM in Content and Data Modeling, DITA, Rich Data, Standards, XML
Consider The big content system integration diagram (draft 1). What are the bottlenecks for the content stream? Well, if what I'm doing is any indicator, importing and exporting from InCopy or InDesign still has a little friction to it. Adobe is addressing this progressively - see CS3's improvements over CS2 (yay!), for example. Softcare and other companies are also addressing it, again progressively.
But is there a fundamental change that can happen here? Can we get XML to pass through Adobe formats as frictionlessly as Superman sees through walls with his X-ray vision? (Digression here: There is a surprising amount of controversy on the Web about Superman's powers - many self-proclaimed pundits seem to say that his X-ray vision is unrealistic! Unscientific, when it is caused by the low gravity on his home planet? Baloney!)
Maybe the problem isn't that content can't yet be imported and exported seamlessly; maybe it's that content shouldn't be imported and exported at all. If Adobe considered InDesign/InCopy not a holder of data but an aggregator of data for print layout, then we might start getting somewhere. Images are already externally linked, so why not, we might then ask, link text objects externally as well? I'm not talking about the page geometry, etc., that might live inside an InCopy document. I'm talking about just the text - and as XML, if you please.
If Adobe therefore allowed XML content to remain external as files, and it allowed all external content, XML or images, to be linkable via HTTP protocol or otherwise, then we might have a situation where media and XML management systems could maintain content continually - without having to messily import during print production and export after print production.
Think of all the advantages! Tremendous. Metadata could be added externally, preparation for web could happen simultaneously. And Adobe page apps would still manage page layout and editing within page geometries so no skin off their nose. Of course, this is not an easy thing to accomplish - but it is the logical future state - each format to its own system, with specialized apps consolidating, editing, and arranging the last mile of content.
Posted by Michael Edson on June 25, 2007 at 10:31 AM in Authoring Tools, Composition, Content and Data Modeling, Content Delivery, Content Management Systems, Magazines, Metadata, Newspapers, Production, Strategy, Web Publishing, XML
How do you make XML? Well, as the old saying goes, it's a bit like making sausages. You don't want to see them made, but they sure taste good.
By the way, a quick search on 'a bit like making sausages' reveals that almost everything is a bit like making sausages.
I was reminded of that saying when 'opening up the hood' on a pre-export InCopy file variant for a client the other day - that is, right after the final preparation script was run and right before triggering the XML-Exporter. I think those in the room actually took a step back, with the look of someone seeing a meat grinder in action for the first time.
There was tagging all over the place - and in many different forms. All of it was either automatically generated or semi-automated, with scripted support. It looked awful! Precisely. That is because the final form was a machine readable document no longer built for editors or designers.
Well, we checked in the document to the XML export status in K4 and Presto! we had our lovely, tasty XML files - soon to be ready for consumption by a Web CMS.
Lovely.
Funny, but right now I cannot stop thinking about those wonderful sausages you buy in the country markets in France. Truly you haven't lived until you have tried them. (Apologies to other national sausage makers - I have not done the sausage travel circuit as much as I would like to.)
Posted by Michael Edson on June 15, 2007 at 11:16 AM in Composition, Content and Data Modeling, Content Reuse, Education, Magazines, Newspapers, Production, Web Publishing, XML
Another summary I wrote for other purposes because I couldn't find it on the Web. This stuff is hard to write about. Feel free to correct me if you see something off (in fact, please do).
***
INTRODUCTION
Namespaces allow you to combine elements from multiple domains (i.e., namespaces) in the same XML instance. Even if the instances you encounter don't combine multiple namespaces, you still have to understand namespaces to work with XML that uses W3C XML Schema.
First, some terminology.
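<MYNS:document xmlns:MYNS="http://www.mydomain.com">
Here MYNS is the namespace prefix, the xmlns:MYNS attribute is the namespace declaration, and http://www.mydomain.com is the namespace domain (a URI) that the prefix stands for.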
All XML elements belong to a namespace domain. Often you can guess what this is by looking at the element's prefix. Elements without a prefix are part of the default namespace for the instance. You can't guess what the default namespace for an instance is - you have to look for the default namespace declaration. If there is no default namespace declaration, then there is a namespace - it's just null. (Null is not the same thing as not existing.)
When you set the default namespace, it applies to the current element and all its descendants. However, you can switch the default namespace on a descendant node by resetting the xmlns attribute to a new value. I've never seen this done in the real world.
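One way to see the default namespace and its scope at work is to parse a few instances with Python's standard `xml.etree.ElementTree`, which reports each element's expanded name as `{namespace-URI}local-name` (so-called Clark notation) and uses a bare local name for the null namespace. This is a minimal sketch; the helper name `expanded_name` is hypothetical.

```python
import xml.etree.ElementTree as ET


def expanded_name(xml_text):
    """Return the root element's name in ElementTree's Clark notation:
    {namespace-URI}local-name, or just local-name in the null namespace."""
    return ET.fromstring(xml_text).tag


# The same unprefixed <document> lands in different namespaces depending
# only on whether a default namespace declaration is in scope.
print(expanded_name('<document xmlns="http://www.mydomain.com"/>'))
# {http://www.mydomain.com}document
print(expanded_name('<document/>'))
# document

# A descendant can switch the default namespace by redeclaring xmlns;
# the new default applies to that element and its own descendants.
root = ET.fromstring(
    '<document xmlns="http://www.mydomain.com">'
    '<para xmlns="http://www.yourdomain.com"><note/></para>'
    '</document>'
)
print([el.tag for el in root.iter()])
# ['{http://www.mydomain.com}document', '{http://www.yourdomain.com}para',
#  '{http://www.yourdomain.com}note']
```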
If you aren't aware that the namespace can be defaulted, writing namespace-aware software (all schema-aware software is namespace-aware) can lead to confusing problems. For example, if an instance has the default namespace "http://www.mydomain.com", then you need to make sure your code's references to its elements also specify that namespace. In XSLT, this means assigning a namespace prefix to the domain and referencing the element with the prefix included. So, your code might reference <MYNS:document> even though the instance contains <document>.
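The same trap shows up outside XSLT. Here is a minimal sketch in Python's standard `xml.etree.ElementTree`: a path without a prefix looks for elements in the null namespace, so it misses default-namespaced elements, while a prefix that the code binds itself (the name `MYNS` is arbitrary and need not appear in the instance) finds them.

```python
import xml.etree.ElementTree as ET

# The instance uses a default namespace, so its elements carry no prefix.
doc = ET.fromstring(
    '<document xmlns="http://www.mydomain.com">'
    '<para>Some text</para>'
    '</document>'
)

# A bare path finds nothing: "para" means a <para> in the null namespace.
print(doc.findall("para"))  # []

# Bind a prefix of our own choosing to the instance's default namespace and
# use it in the path, much as an XSLT stylesheet would.
ns = {"MYNS": "http://www.mydomain.com"}
print([p.text for p in doc.findall("MYNS:para", ns)])  # ['Some text']
```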
The same prefix could be used in two different instances but for two different domains. Again, this can cause issues for software processing if you're not careful. If two different instances contain <MYNS:document> elements, but in one instance MYNS is mapped to http://www.mydomain.com and in the other to http://www.yourdomain.com, then those elements are NOT the same element, and software must be written accordingly. In XSLT, for example, you would declare both the namespaces, but use two different prefixes to reference the domains used in the two instances. So, your software might reference <YOURNS:document> in order to manipulate elements that look like <MYNS:document> in one of the instances.
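A short Python sketch (standard `xml.etree.ElementTree`) makes the point concrete: two elements written identically as `<MYNS:document>` but bound to different URIs parse to different expanded names, so processing code should compare namespace URIs rather than the prefixes used in the instance. The `is_your_document` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Both instances spell the element <MYNS:document>, but bind MYNS differently.
mine = ET.fromstring('<MYNS:document xmlns:MYNS="http://www.mydomain.com"/>')
yours = ET.fromstring('<MYNS:document xmlns:MYNS="http://www.yourdomain.com"/>')

print(mine.tag)               # {http://www.mydomain.com}document
print(yours.tag)              # {http://www.yourdomain.com}document
print(mine.tag == yours.tag)  # False

# The processing code uses its own prefixes, mapped to the URIs that matter.
ns = {"MYNS": "http://www.mydomain.com", "YOURNS": "http://www.yourdomain.com"}


def is_your_document(el):
    """True only for <document> elements in the yourdomain namespace."""
    return el.tag == "{%s}document" % ns["YOURNS"]


print(is_your_document(mine), is_your_document(yours))  # False True
```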
DECLARING NAMESPACES
Namespaces are declared through attributes with the xmlns prefix. (Note that W3C XML Schemas and XSLT scripts are XML instances, so this applies to them as well.)
Most of the time, you will encounter namespace declarations like this:
<MYNS:document xmlns:MYNS="http://www.mydomain.com">
<MYNS:para>Some text</MYNS:para>
</MYNS:document>
You declare the prefix you want to use with an attribute named xmlns:PREFIX whose value is the namespace domain. The namespace declaration goes in the instance either on the current element or on one of its ancestors.
To set the default namespace, you do the same thing, but the prefix you are assigning is the null one (xmlns is a prefix to nothing!):
<document xmlns="http://www.mydomain.com">
<para>Some text</para>
</document>
In this instance, any element without a prefix is part of the http://www.mydomain.com namespace.
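To a namespace-aware parser, the prefixed and defaulted forms of this example are the same document. A minimal check with Python's standard `xml.etree.ElementTree` (the `names` helper is hypothetical) compares the expanded element names produced by each:

```python
import xml.etree.ElementTree as ET

prefixed = """<MYNS:document xmlns:MYNS="http://www.mydomain.com">
<MYNS:para>Some text</MYNS:para>
</MYNS:document>"""

defaulted = """<document xmlns="http://www.mydomain.com">
<para>Some text</para>
</document>"""


def names(xml_text):
    """List every element's expanded {URI}local name, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


# Both spellings bind the same namespace, so a namespace-aware parser sees
# identical element names; the prefix is only a notational shortcut.
print(names(prefixed) == names(defaulted))  # True
print(names(defaulted))
# ['{http://www.mydomain.com}document', '{http://www.mydomain.com}para']
```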
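A namespace prefix only needs to be declared once in an instance, as long as the declaration is on an element that is, or is an ancestor of, every element that uses the prefix (the root element is the usual place). An instance is not namespace-well-formed, and namespace-aware parsers will reject it, if a namespace prefix is used with no declaration, or if the declaration is not on the current or an ancestral element.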
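These rules are easy to try against a namespace-aware parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which raises `ParseError` for a prefix that has no declaration in scope; the `parses` helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """True if ElementTree's namespace-aware parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Declared once on the root, the prefix works on every descendant.
print(parses('<MYNS:document xmlns:MYNS="http://www.mydomain.com">'
             '<MYNS:para/><MYNS:para/></MYNS:document>'))  # True

# The prefix is used with no declaration at all.
print(parses('<MYNS:document/>'))  # False

# The declaration is on a sibling, not on the element or an ancestor.
print(parses('<root><a xmlns:MYNS="http://www.mydomain.com"/>'
             '<MYNS:para/></root>'))  # False
```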
ATTRIBUTES AND NAMESPACES
Attributes can have a namespace prefix just like elements can. But attributes don’t follow the same rules as elements regarding namespace declarations: a default namespace declaration does not apply to unprefixed attributes. To put an attribute in a namespace, you must declare a prefix for that namespace and use it on the attribute, even if the same namespace is already the default. For example, in this:
<myElement xmlns="http://www.mydomain.com" myAttribute="1234" />
myElement is part of the mydomain namespace, but myAttribute is in the null namespace. To make myAttribute part of mydomain, you would need to do this instead:
<myElement xmlns="http://www.mydomain.com"
xmlns:MYNS="http://www.mydomain.com"
MYNS:myAttribute="1234" />
This makes sense if you think it through (couldn't really work any other way).
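Parsing both versions of this example with Python's standard `xml.etree.ElementTree` shows the difference directly: the element is namespaced either way, but only the prefixed attribute gets an expanded `{URI}name` key in the element's attribute dictionary. This is a minimal sketch for illustration.

```python
import xml.etree.ElementTree as ET

unprefixed = ET.fromstring(
    '<myElement xmlns="http://www.mydomain.com" myAttribute="1234" />')
prefixed = ET.fromstring(
    '<myElement xmlns="http://www.mydomain.com"'
    ' xmlns:MYNS="http://www.mydomain.com"'
    ' MYNS:myAttribute="1234" />')

# The element is in mydomain in both cases...
print(unprefixed.tag)     # {http://www.mydomain.com}myElement
# ...but an unprefixed attribute stays in the null namespace,
print(unprefixed.attrib)  # {'myAttribute': '1234'}
# while a prefixed one is in the namespace its prefix is bound to.
print(prefixed.attrib)    # {'{http://www.mydomain.com}myAttribute': '1234'}
```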
REFERENCING SCHEMAS
There are two ways for an instance to reference a schema. Both use an attribute to point to the schema location. The attributes belong to a reserved W3C namespace domain, http://www.w3.org/2001/XMLSchema-instance, which is almost always referenced with the "xsi" prefix.
1. THE xsi:schemaLocation ATTRIBUTE
Schemas can (optionally) include what is called a target namespace. It is declared using the targetNamespace attribute on the root <schema> element. This is the namespace for elements defined in the schema. If an XML instance references a schema with a declared target namespace, then its elements must belong to the same namespace.
When a target namespace is declared in the schema, instances use the xsi:schemaLocation attribute to reference the schema, and their elements must be in the schema target namespace, which is usually done by setting the instance's default namespace to it.
<document xmlns='http://www.mydomain.com'
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
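xsi:schemaLocation='http://www.mydomain.com http://www.domain.com/schema.xsd'> ...
</document>
The default namespace declaration (the xmlns attribute) is set to the schema target namespace.
The schemaLocation attribute holds pairs of values: a namespace (here, the target namespace) followed by the location of the schema for that namespace.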
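Because xsi:schemaLocation is a whitespace-separated list of namespace/location pairs, software that reads it has to split it up. A minimal sketch in Python's standard `xml.etree.ElementTree`; the function name `schema_location_pairs` is hypothetical, and it only reads the hint, it does not validate anything.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_location_pairs(root):
    """Split an element's xsi:schemaLocation value into
    (namespace URI, schema location) pairs."""
    value = root.get("{%s}schemaLocation" % XSI)
    if value is None:
        return []
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError("xsi:schemaLocation must hold namespace/location pairs")
    return list(zip(tokens[0::2], tokens[1::2]))


doc = ET.fromstring(
    "<document xmlns='http://www.mydomain.com'"
    " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
    " xsi:schemaLocation='http://www.mydomain.com"
    " http://www.domain.com/schema.xsd'/>"
)
print(schema_location_pairs(doc))
# [('http://www.mydomain.com', 'http://www.domain.com/schema.xsd')]
```

A value holding only a schema URL has an odd number of tokens, so the sketch rejects it with a `ValueError`.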
2. THE noNamespaceSchemaLocation ATTRIBUTE
If a schema doesn't include a declared target namespace, then its elements belong to the null namespace. Instances using the schema must also keep their elements in the null namespace (that is, declare no default namespace) and should use the noNamespaceSchemaLocation attribute to point to the schema.
<document xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:noNamespaceSchemaLocation='http://www.domain.com/schema.xsd'> ...
</document>
No default namespace is declared (and so it is null), and the noNamespaceSchemaLocation attribute points to the schema location.
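The two cases can be combined into a single lookup: a namespaced root takes its schema hint from the matching xsi:schemaLocation pair, and a root in the null namespace takes it from xsi:noNamespaceSchemaLocation. A minimal sketch in Python's standard `xml.etree.ElementTree`; `schema_hint` is a hypothetical helper that only reads hints and does not fetch or apply the schema.

```python
import xml.etree.ElementTree as ET

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"


def schema_hint(xml_text):
    """Return the schema location an instance gives for its root element, or
    None. A namespaced root is looked up in the xsi:schemaLocation pairs; a
    root in the null namespace uses xsi:noNamespaceSchemaLocation."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
        tokens = root.get(XSI + "schemaLocation", "").split()
        return dict(zip(tokens[0::2], tokens[1::2])).get(namespace)
    return root.get(XSI + "noNamespaceSchemaLocation")


no_ns = ("<document xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
         " xsi:noNamespaceSchemaLocation='http://www.domain.com/schema.xsd'/>")
print(schema_hint(no_ns))  # http://www.domain.com/schema.xsd

# A null-namespace root carrying only xsi:schemaLocation gets no hint.
mismatch = ("<document xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
            " xsi:schemaLocation='http://www.mydomain.com"
            " http://www.domain.com/schema.xsd'/>")
print(schema_hint(mismatch))  # None
```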
Posted by Lisa W. Bos on March 01, 2007 at 10:18 PM in Content and Data Modeling, Standards, XML
The question is not new, and it sometimes surprises me how often it comes up, but we've recently had discussions with a handful of publishers about deciding "how much markup is right."
The conversation reminded me of a recent issue a publisher was facing. This publisher has a weekly publication that contains bibliographic information, both for the sources used to create the report and as pointers to other relevant material. Back in 2000, when the DTD was developed, it was decided to capture all the basic bibliographic detail in the markup: last name, publisher, title, date, etc. There was no immediate use besides presenting the content on web pages, but it seemed to make good sense. For the last 6 years, the workflow has been based on Quark styles, and the in-house compositor has had to style each little piece of the bibliography. That might be fine if this level of granularity were actually used, but it never has been. It sounded good back in 2000, but the fact is, there just has not been a business need to make use of this granularity. The publisher is now rethinking this level of markup and whether it makes sense to continue tagging it that way.
Of course, for some the inverse is true. Another publisher we know has XML content from a partner who has converted their content over the years (they didn't keep it themselves). But they are seeing that there may not be enough detail in the markup to do the things they want to do with it. They are exploring strategies for enhancing the content.
So, how do you find the sweet spot? How much markup is right? As with many things, the answer is the common but much-despised "it depends." But here are some questions and issues to consider:
These are a few issues that can help in deciding how much markup is right.
Posted by Ed Stevenson on January 18, 2007 at 02:37 PM in Content and Data Modeling
Jon Udell has started an excellent podcast series, with a new podcast every Friday. Last week, he talked with Bob Glushko, co-author of Document Engineering, whom I interviewed last March for this blog (parts 1, 2, and 3).
I've found Jon Udell's podcast series to be very informative and easy to listen to, and it occasionally covers topics relevant to the publishing industry (and on the other occasions, just good topics for those involved with technology). Besides last Friday's segment with Bob Glushko, I'd especially recommend listening to the conversations with CJ Rayhill on Safari U, Chris Gemignani about data analytics, and Lou Rosenfeld on information architecture.
Posted by Ed Stevenson on July 24, 2006 at 04:21 PM in Content and Data Modeling
This is the final segment of my interview with Bob Glushko. Read parts 1 and 2.
Ed: At what point in the modeling exercises do you think it is appropriate to start thinking about the technical implementation of the content?
Bob: As much as possible you should focus on conceptual analysis and modeling and postpone implementation considerations. That’s becoming easier to do for publishing because of the improvements in the standards and technology for separating content from presentation. For purely transactional document types designed to be produced and processed only by computers, it is also becoming straightforward to create implementation-independent models and use them to generate code or configure an application.
As before, the messy stuff is where there is a mixture of people and processing, and we often have to be sensitive to implementation constraints in user interfaces when we create transactional models that people also interact with, as in forms and workflow applications. There is research underway to use document and process models in UI design patterns and model-based applications where implementation considerations are deferred until the very end, but it isn’t quite ready for prime time.
Ed: In your book you say that document engineering is a balance between anthropology and archaeology. Can you explain that a little?
Bob: You have to balance the complementary perspectives of the archaeologist and the anthropologist as you discover and interpret information sources. The archaeologist struggles to interpret document artifacts, legacy data sources or forms and their associated business processes that were created by organizations or people who are no longer there to help. The anthropologist studies people and phenomena in their natural surroundings with an open, nonjudgmental mind. The anthropologist’s perspective frees us from assuming that the methods or strategies that people use for organizing and storing documents are entirely rational, because they certainly aren’t (try analyzing how you manage the documents in your office).
I was in the archaeologist’s mode when I noticed lots of coffee cup rings and grease marks on some repair manuals on the workbench of the auto mechanic working on my car. I thought the artifacts were telling me that the mechanics followed the factory-designed repair procedures. But when I mentioned my inference to the mechanic, he laughed and said “we make them look used like that so that when they check up on us they think we follow their procedures. Most of the stuff in the books is wrong, or we’ve figured out a better way.”
So we can use interviews or questionnaires to find out how people think they use documents and information, but sometimes they tell us the conventional wisdom, their organizational policy, or whatever they think we want to hear. But even when they think they are telling us the truth, they may be wrong. That's when the archaeologist's perspective takes over and lets the information artifacts speak for themselves.
Posted by Ed Stevenson on March 31, 2006 at 06:00 AM in Content and Data Modeling