Here are my notes from XML2007 in Boston, Day 1. Since I'm responsible for the publishing track scheduling, I'm hanging out in here all conference.
******
Opening Plenary - Does XML Have a Future on the Web? The big takeaway from this panel was that real-world developers keep running into the same fact: a lot of data can be modeled very effectively relationally or as objects, and using XML for such data imposes unwelcome complexity, especially in mapping the data to the structures available in programming languages. JSON gives those developers an easier way to work with such content. On the other hand, some content doesn't fit well into this model, and for that content XML's complexity provides value well worth the cost. It was fun to see Michael Sperberg-McQueen and Douglas Crockford "discuss" this divide, but most audience members found value in both perspectives and didn't seem to take seriously Crockford's notion that XML has been outright dangerous, in part because it has been a distraction from the evolution of other web standards like HTML and JavaScript. The third panelist, Michael Day, offered some practical perspective from the viewpoint of a software provider that needs to wrestle with all the ways in which information might be published on the web. As he said, he saw no reason to privilege one format (HTML, XML, JSON) over another. He also commented that CSS could potentially be used for sophisticated print formatting, not only for the web. I'd like to hear more about that.
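To make the divide concrete for myself, here is a minimal sketch (mine, not something shown on the panel). The sample record and paragraph are made up. A flat record round-trips through JSON as a plain dict, while a paragraph with inline markup only makes sense as an ordered mix of text runs and elements, which is where name/value pairs start to hurt.

```python
import json
import xml.etree.ElementTree as ET

# Record-like data maps directly onto a dict, so JSON is a natural fit.
record = {"title": "As You Like It", "acts": 5}
round_tripped = json.loads(json.dumps(record))

# Mixed content: text and inline elements interleave, and order matters.
para = ET.fromstring("<p>Use <b>conref</b> sparingly; see <i>maps</i> instead.</p>")


def flatten(elem):
    """Return the ordered text runs and (tag, text) pairs directly inside elem."""
    parts = []
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        parts.append((child.tag, child.text))
        if child.tail:
            parts.append(child.tail)
    return parts


parts = flatten(para)
print(round_tripped)
print(parts)
```

The second structure can of course be serialized as JSON too, but only as a list of fragments, not as the tidy object a developer would want to work with.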
******
Eric Severson from Flatirons spoke about practical DITA lessons.
(1) It's harder to model and actually develop content for re-use than you might think. It shouldn't be an all-or-nothing approach: re-use where there is a lot of benefit, and don't force-fit re-use in other cases. Reconciliation also doesn't all need to happen at the time of DITA adoption.
(2) How do you deal with the approval process when you're no longer working with publications but with topics? Create a map that is specifically designed to facilitate review; it includes enough context for review but is probably not the same as the publication.
(3) Use specialization only when absolutely necessary; tools don't yet have much support for it. The DITA committee is trying to address some inflexibility in the generic task model that means people end up specializing when they might not want to. If you need to specialize, specialize from the standard types as much as possible (task is the exception because of that issue). Domain specialization is also an area where specialization is very often justified (for keyword typing, for example).
(4) Use conref (which enables re-use of content objects inside a topic) only in cases where you want to create an index or list of things. For true re-use it can become very limiting to authors, because it's hard to write text for so many contexts. Avoid nested topics for similar reasons. Maps and nested maps provide a better way to do this and impose less overhead on the topics themselves.
(5) Dynamic content delivery: an important benefit of DITA is the metadata on topics/maps that indicates audience and other information. Rather than build a static publication on a topic, allow users to leverage that metadata in search.
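A rough sketch of what point (5) could look like in practice (my own illustration, not Eric's code). It assumes topics carry DITA's space-separated audience attribute; the topics wrapper element, the ids, and the titles are invented just to keep the sample in one string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: three topics inside an invented <topics> wrapper.
TOPICS = """
<topics>
  <task id="install" audience="admin"><title>Installing the server</title></task>
  <concept id="overview" audience="admin novice"><title>Product overview</title></concept>
  <task id="first-login" audience="novice"><title>Logging in for the first time</title></task>
</topics>
"""


def topics_for(audience, xml_text=TOPICS):
    """Return ids of topics whose audience attribute lists the given audience."""
    root = ET.fromstring(xml_text)
    return [
        topic.get("id")
        for topic in root
        if audience in (topic.get("audience") or "").split()
    ]


print(topics_for("novice"))
print(topics_for("admin"))
```

A real delivery system would presumably index this metadata in a search engine rather than re-parse the XML per query; the point is only that the selection happens at request time instead of at publication build time.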
******
Matt Turner from Mark Logic talked about Office Open XML (the XML underneath Office 2007). Quote: Office Open XML is cool because it's XML and you can mess with it. The new Office writes XML natively, with no other format living between the applications and the XML. This is not the previous XML formats; this is new. The spec is huge, complicated, and hard to read because of the need for backwards compatibility and for performance (one-letter elements). The spec defines a zip package of a document's data (XML) and other items (images etc). The individual XML items in the zip are easy to interpret. The Office apps are now OOXML editors, and other (non-XML) applications could be OOXML editors also. You can bind a control in a document (like for a form) to an XML instance inside the document or dynamically retrieved. They have successfully generated OOXML from other data sources (showed a demo with Shakespeare's plays). Demonstrated structural editing inside Word. The Office ribbon (replacement for toolbars/menus) can be configured (with XML) and customized to provide the kind of editing tools that are desired, including interaction with other server applications. I can't possibly describe Matt's mashup of tech docs and As You Like It, but it was both informative and amusing. Thanks, Matt.
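To show the "zip package you can mess with" idea, here is a minimal sketch (mine, not from Matt's demo). It builds a stripped-down stand-in for a .docx in memory and reads the paragraph text back out. The assumptions: the main document part lives at word/document.xml, as it typically does in Word files, and the sample text is made up. A package Word would actually open also needs content-type and relationship parts, which I've left out.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace; note the short element names (w:p, w:r, w:t).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

DOCUMENT_XML = (
    f'<w:document xmlns:w="{W}"><w:body>'
    "<w:p><w:r><w:t>All the world's a stage</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def build_package():
    """Build an in-memory zip holding only the main document part (not a valid .docx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", DOCUMENT_XML)
    return buf.getvalue()


def paragraph_texts(package_bytes):
    """Return the text of each w:p paragraph in the package's main document part."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        for p in root.iter(f"{{{W}}}p")
    ]


texts = paragraph_texts(build_package())
print(texts)
```

The same paragraph_texts function should work on the bytes of a real .docx file, since it only touches the one part.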
******
XML Authoring Tools panel with Justsystems (XMetaL), Adobe, Xopus, moderated by our own Mark Jacobson. XMetaL: Sweet spot is direct creation of XML technical documentation. Vision of the company is to be able to use another Justsystems product (xfy) to enable environments in which content from multiple schemas can be mashed together and where documents can include application logic. Adobe: Concept is to enable tools that address cross-media workflows, whether that's simple XML docs using the XML tagging features in Creative Suite or focusing on layout first and automating the generation of the XML later. Also want to get to re-purposing and to support for authoring based on rules or scenarios. For the future, thinking about schema language support, including RELAX NG. Xopus: Focused purely on XML editing by non-technical users in a web browser. Discussion: Mark pointed out two contexts for discussion: the ubiquity of Word and the expectations that brings, and the fact that people don't like to edit inside a structure.
******
Bob DuCharme - XHTML 2 for Publishers: New opportunities for storing interoperable content and metadata. 1.0 was about separation of design and content. 1.1 was about modularization rather than features. 2.0 goals: encode more semantics, more device independence, better forms (XForms), less scripting (XML Events). (Note, not a W3C recommendation yet.) The first two of these drive Bob's contention that XHTML 2 can now be used for publisher content - probably not as the primary source, but for more than just a browser format. Obvious example: interchange among organizations.
Why consider it: DTDs can be really complicated - overwhelming and intimidating. HTML is familiar and simpler. How does this work?
STRUCTURE: XHTML 2 has a <section> element for grouping. The <h> element represents a heading regardless of level. So, you have structure and can change levels without re-tagging.
BETTER SEMANTICS/STRUCTURE: separator rather than hr. pre can be embedded inside p, which means you can do things like present a single paragraph across multiple lines. Lists can be embedded inside paragraphs so relationships are clearer. p as img - show the image if available (or depending on device), otherwise show the paragraph text. Use of a role attribute that points to a namespace rather than the class attribute (similar to DocBook). (Use of class for semantics can interfere with use of class for stylesheets, plus class is supposed to be nmtoken.)
METADATA: Can use RDFa to embed your own metadata. The predicate and value go on elements that represent the subject (or that are contained in the subject object). So, maybe this:
<section><span property="dc:subject" content="recipe"/>...</section>
Can also do this:
<meta about="http://mynamespace" property="dc:subject" content="recipe"/>
Or even put an id on a content object (like a section) and point to it from meta tags.
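For what it's worth, pulling that kind of embedded metadata back out is not much work. This is a minimal sketch of my own (not from Bob's slides) that collects property/content pairs from a sample section. The recipe content and the id are made up, and this is nowhere near a real RDFa processor: it doesn't work out subjects or expand the dc: prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML 2-style fragment using the span pattern from the notes above.
DOC = """
<section xmlns:dc="http://purl.org/dc/elements/1.1/" id="soup">
  <span property="dc:subject" content="recipe"/>
  <h>Tomato soup</h>
  <p>Simmer for twenty minutes.</p>
</section>
"""


def embedded_metadata(xml_text):
    """Return (property, content) pairs for every element carrying both attributes."""
    root = ET.fromstring(xml_text)
    return [
        (elem.get("property"), elem.get("content"))
        for elem in root.iter()
        if elem.get("property") and elem.get("content") is not None
    ]


print(embedded_metadata(DOC))
```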
But, as one attendee pointed out, HTML 5 is on a separate path from XHTML 2, and it isn't at all clear that XHTML 2 will get much support from browser vendors. Regardless, it appears to be a simple, familiar, but reasonably powerful way of sharing documents even if there is never any expectation of viewing them directly in web browsers.
******
Eric Clark of Time and Lee Vetten of McGraw Hill reviewed what's new in PRISM 2.0:
- Addition of elements to reflect more complicated workflows (sometimes web-first, sometimes print-first) – original platform, web channel, killdate, postdate.
- Support schema as well as DTD
- Profiles: XML only profile, rdf/XML profile, also XMP profile now (especially for PDF archives)
- Updated controlled vocabularies
- Added aggregation type, genre, and presentation rather than just the previous “category”
- Added roles for creator and contributor
- Added a bunch of other elements, including some inline elements
- Eliminated PRISM 1.0 elements that were redundant to Dublin Core elements
Note: 2.0 is not backwards compatible.
Future work: Subcommittee around rights management (tracking and handling of digital assets). Creating a cookbook document that will help implementors understand how to support some standard use cases. Will roll out via webinar in January.
2.0 docs available now but not 100% complete. Final posting expected in December.
******
Jens Erlandsen spoke about a Swedish Dictionary project for the Swedish Academy (the ones who award the Nobel Prize in Literature). Dictionary is modeled after the OED - massive scale. They've been working on the first edition since 1898 and expect to complete it in a few more years. 200 million characters, XML = ~600MB. Happy with manual workflow - working with slips of paper that they can sort and look at more usefully than if digital. Dictionaries are special - average number of characters per element is about 7 for this dictionary. Dense, highly marked up, high quality content. Lots of element types - hundreds. The editorial rules are complex and unstable. And some rules can't be reflected in XML - homographs, sorting rules, etc. So how to build a schema? What to leave out? Jens' main theses/questions: (1) A schema can't be developed outside the context of how it will be used and what tools will be used. (2) Can one schema support all needs? No - different parts of the process need different schemas. (3) What is needed beyond schemas to capture all rules? Something - what? Jens covered the approach in detail. This was a great illustration of one of the points from the opening session: dictionary content cannot be represented with name/value pairs. Jens also drove home that an authoring schema can't be designed without lots of experimentation to see what it's like to actually use it as an author/editor - e.g. to allow users to author flat content and add structure later, to re-organize entries, and so on. He mentioned that they used the iLEX tools for their project, which seem to be pretty cool.
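The characters-per-element figure is easy to measure for your own content. Here is a minimal sketch of how I'd compute it (my own, not Jens' method). The entry below is invented, with made-up element names, and is not taken from the actual dictionary.

```python
import xml.etree.ElementTree as ET

# Invented dictionary-style entry; element names are hypothetical.
ENTRY = (
    "<entry><form><orth>katt</orth><pos>s.</pos></form>"
    "<sense n='1'><def>tamdjur</def><cit>en svart katt</cit></sense></entry>"
)


def chars_per_element(xml_text):
    """Return the total text characters divided by the number of elements."""
    root = ET.fromstring(xml_text)
    elements = list(root.iter())
    # Count both text inside an element and text trailing after it (mixed content).
    chars = sum(len(e.text or "") + len(e.tail or "") for e in elements)
    return chars / len(elements)


print(round(chars_per_element(ENTRY), 2))
```

Ordinary prose documents would normally score far higher than a dictionary on this measure, which is one quick way to see how dense the markup is.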
******
Great day.