The big content system integration II

I've modified the 'big' diagram from the first post on this topic to show a circular content flow - now called editorial flow. 

Please find it here: Download the_system_ii.pdf

The diagram is still more conceptual than technical.  Of course at some point this thinking needs to be specialized for the particular publishing vertical, product needs, and company needs.

A few thoughts:

1. Shows content editing flowing in a circle. Enter at any point and proceed downstream. That is, start developing a print article or publication and complete it, then proceed to develop it into a web article or publication. Or vice versa.

2. Prior to entering a print or web editorial workflow there is a phase for adding, packaging, and editing content, where it is assumed that a web interface will allow review of content sources and their collection into the initial manuscript for the subsequent print or web editorial workflow. This might allow enhancement of an article as it proceeds downstream, for example with a new sidebar.

3. Implies content reuse if the circle keeps flowing.  The circle can also stop at any point if needs are met.  Some publishers might stop at having a print and web output (in any order), some might stop with either a print or web output, some might keep the cycle going indefinitely, building a large content repository over time (e.g. educational publishers).  The diagram also implies content maintained as XML rather than being imported and exported from editorial/workflow tools.   

4. It has a central repository built of two fundamental parts - XML and binary content (images, etc.).  Work done in page layout tools/editorial tools/workflow tools is transitory (though might be archived).  The purpose of the repository would be to accurately manage 'content' of published products and to also provide a starting point for initial manuscript creation for the next stage in the cycle.

5. Upon completion of the web or print cycle, a number of XML-enabled exports are possible along with the main article/publication produced. This is a requirement of some publishers, and it is there for the taking if content is accurately managed as XML.

Well, readers, what do you think?  Does it match your thinking?  Should we keep going with this?

O metadata, where art thou?

We've blogged and written about the capabilities of Adobe's InDesign. And as my colleagues have said before, we are very impressed with these tools and their increasing capabilities with XML, but still see some room for improvement. One area that really needs it is in metadata handling.

You can store metadata as XMP  (a.k.a. XML) in InDesign and InCopy, but for some reason the data is not included when exporting XML from either product. You do get it when you create a PDF or other binary object (it's still XMP), but you don't get it when you export plain ole XML. I understand that XMP is meant to hold metadata for binary objects, but let's not be too strict with that, please. XMP is the container Adobe gives you to store metadata (or properties) about the document you are working on in InDesign or InCopy. Really, that metadata should be available on export, whether you render the document as a PDF or export out as XML. I would like to think about it more in this way: the InDesign/InCopy document allows you to store metadata about it and that metadata should then be available as XMP in your binary exports and within the XML of your XML exports. That makes more sense to me. Metadata has value in all contexts, regardless of whether the file carrying it is XML or binary.
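Until the tools do this for you, one workaround is to stitch the two together after export. Here is a minimal sketch in Python, assuming you already have the exported XML and the XMP as separate strings (for example, the XMP pulled from a PDF rendered from the same document). The function name `attach_xmp` and the `<metadata>` wrapper element are my own inventions for illustration; neither InDesign nor InCopy defines them.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def attach_xmp(exported_xml: str, xmp_xml: str) -> str:
    """Return the exported XML with the XMP's rdf:RDF block embedded in it.

    The <metadata> wrapper is a hypothetical element chosen for this sketch.
    """
    doc = ET.fromstring(exported_xml)
    xmp = ET.fromstring(xmp_xml)

    # The RDF block is usually wrapped in <x:xmpmeta>, but may be the root.
    rdf_tag = f"{{{RDF_NS}}}RDF"
    rdf = xmp if xmp.tag == rdf_tag else xmp.find(rdf_tag)
    if rdf is None:
        raise ValueError("no rdf:RDF element found in the XMP")

    wrapper = ET.Element("metadata")  # hypothetical wrapper element
    wrapper.append(rdf)
    doc.insert(0, wrapper)  # place metadata ahead of the content
    return ET.tostring(doc, encoding="unicode")
```

The RDF keeps its own namespaces, so downstream tools that already understand XMP properties should still be able to find them inside the exported XML.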

We've also worked with K4 from Softcare, a publishing management system that sits on top of InDesign and InCopy. In K4 we can also store metadata about the object, and K4 can be used to export the InCopy/InDesign XML along with the K4 metadata (also XML) but there is no real connection with the metadata (via XMP) stored with InDesign.  I guess you can pick one or store the metadata in both, because if you need it in your PDF, it needs to be in the XMP in InDesign and if you need it in your XML it needs to be in K4. Got that?  Additionally, we've found K4 to offer rather weak support for multi-level taxonomy classification. (But maybe that is ok if your writers write and your categorizers categorize.)

This is not a slam against Adobe's products or K4. Adobe has made great strides and we are seeing a lot of publishers leave Quark for Adobe products. But we do hope Adobe and Softcare (and other similar vendors working with Adobe products, like Woodwing) listen to users and people who need better synchronization and control of their metadata. Until then, we work around it.

XMP Open day in NYC

Had a really fun day yesterday talking about XMP with about 40 other people in NYC. Reminded me of the early days of SGML and then XML - some confusion, lots of excitement. Much better software though. If you have any need to capture metadata for non-XML assets, check out XMP Open or my article from 11/05 as a starting point. XMP is a very smart technology (briefly: RDF/XML metadata embedded inside assets - particularly binary assets - so that you can capture it at any point in a workflow, not only when working in a DAMS or CMS), and is also a very practical technology.
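For anyone who has not looked inside one, here is roughly what that RDF/XML looks like and how little code it takes to read the simple properties. This is only a sketch: the sample values are made up, the helper name `simple_properties` is mine, and it ignores structured values (bags, sequences, language alternatives) that real XMP often uses.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# A made-up XMP block with two simple, text-only properties.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <dc:format>image/jpeg</dc:format>
      <xmp:CreatorTool>Example Tool 1.0</xmp:CreatorTool>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def simple_properties(xmp_xml: str) -> dict:
    """Return {namespaced_tag: text} for text-only properties."""
    root = ET.fromstring(xmp_xml)
    props = {}
    for desc in root.iter(RDF + "Description"):
        for child in desc:
            # Skip structured values; keep elements that hold only text.
            if len(child) == 0 and child.text and child.text.strip():
                props[child.tag] = child.text.strip()
    return props


if __name__ == "__main__":
    for tag, value in simple_properties(SAMPLE_XMP).items():
        print(tag, "=", value)
```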

The day:

Andrew Salop - the guy who "invented" XMP at Adobe - and Dianne Kennedy of IDEAlliance split the hosting activities of the day. Andrew has his own consultancy now (http://www.metaseed.net/) and continues to be an XMP evangelist.

John Dougherty of Hachette Filipacchi gave the keynote. He has a terrific vision of how XMP can transform workflows and facilitate looser coupling of systems. "Connectivity by confluence not by brute force."

Chris Griffin of  Pound Hill gave an XMP tutorial. Pound Hill has bet their business on XMP. Check out their site for interesting software.

Gunar Penikis of Adobe gave an overview of XMP in Adobe products. He also ended up in the hot seat throughout the day, with the audience pushing for a clearer statement of why Adobe doesn't want to cede control of the core XMP standard. Was a friendly discussion - everyone is grateful to Adobe for the effort and investment they've already made - but general consensus seems to be that the community would be better off with the spec in the hands of a standards group, provided that there was an appropriate balance between keeping things moving and being careful in the evolution of the core spec.

A group of XMP users (publishers) spoke about what they want to happen next. Focus was on better software tools. Was a savvy group, but when discussing a disruptive technology like XMP, even smart people find it difficult to articulate exactly what they need; it's not just new software that's needed, but new ideas about workflow.

Those of us on the last panel talked about the spec itself - how to proceed? The end result was that IDEAlliance enthusiastically signed on to continue the XMP Open work of refining and promoting XMP (with continued control of the core spec by Adobe). There's a lot of interesting work to be done in this area in 2006. Possibilities identified include: choosing a schema language for XMP metadata sets, a suite of conformance use cases and content, continued development of more metadata sets for particular domains (for example, the PRISM group has made PRISM metadata XMP compatible), promotion of XMP through discussion of how metadata capture should be supported through typical workflows, and some refinements to the spec itself. I'll post additional information as we move forward. If you're interested in participating, contact Dianne Kennedy at IDEAlliance.

And if you're a software engineer with some time on your hands, consider developing software to help people read/write (especially write) XMP metadata without needing to code in C++. There's a new version (4.0) of the Adobe SDK on its way that adds support for more file types, but it's still C++ only. There is at least one other implementation out there that writes (Image::ExifTool), but it's limited in the file types supported and in the namespaces you can use. (Bob DuCharme mentioned this to me; Bob, tell me if I misquoted you.) Additional open source tools would do more to promote adoption than anything XMP Open can do...

XMP Open and XMP Open Day

IDEAlliance, an industry organization with a focus on standards and best practices for publishing and information technology, recently launched a new initiative, XMP Open, and is hosting a one-day seminar, XMP Open Day, in March in New York City.

They state the mission of XMP Open as:

The mission of XMP Open is to advance XMP as an open industry specification.  XMP promises to be the technology that will enable the seamless management of assets throughout an end-to-end digital supply chain.  Adobe Systems has given the industry a good starting point by developing the XMP specification far enough to support standardized metadata handling for its own Creative Suite products.  But to extend this killer concept across the broader digital asset supply chain, industry education and outreach along with critical new development must be supported.  The role of XMP Open will be to move the XMP specification forward to meet the requirements of the broader industry. 

We've mentioned the potential we see in the use of XMP for publishers in this blog and in our newsletter.   If you agree, this initiative and seminar are worth checking out.

Proprietary standards

Working on our newsletter gives me an excuse to think more broadly about topics than my day-to-day work might require, which is one of many reasons I'm happy to contribute. For example, this month's interview with Gary Cosimini of Adobe reminded me that standards and software features are only important if their purpose is broadly understood and they're actually used.

Gary mentioned that they've had customers express a wish for XMP metadata in file types - say a word processing file, or an XML file - other than those originally conceived for XMP. I think this is a great idea in many situations, but it was already possible before, obviously so with XML but also for  everyone's favorite word processor, Word (look at Word's File Properties/Custom tab if you don't know what I mean). So why bother adding a new XML vocabulary (which XMP is) to XML files, or a way to capture metadata to Word (which already had one)? Answer: Because the more similar the metadata approach taken by all types of content in all types of environments, the more likely it will be used and used meaningfully. For these file types, XMP doesn't provide all that much that's new, but because people "get it", it might actually be used. And to quote Martha, that's a good thing.

This also reminds me that the term "standard" is used to mean lots of things. XMP is not, in fact, a standard in a formal sense. It was developed by a single company for (they hope) wide usage, but it is definitely under Adobe's control. Is this bad? Well, maybe in some ways. But, for the most part I think it's helpful, because Adobe put the energy (and time and money) behind their ideas, and the result is that more people are now interested in metadata and in sharing it across all sorts of boundaries - file type, organization, system, you name it. Purists argue about the technical merits of standards, but I generally am more concerned with the pragmatic questions - does this encourage usage and tool development? Yes? I'm all for it! (Ok, I like the purist discussions too, but I wouldn't subject a customer to a "pure" system that was hard to use or required custom development that could be avoided by using a less pure approach.)

BTW, XMP really is pretty cool. See my article this month for more information, and take a look at Mike Edson's for information about the XML features in InDesign and InCopy.

XMP notes from DAM summit

I attended the PRISM/DIM2 meeting and DAM Summit yesterday in New York City.  There was much good discussion between publishers and DAM vendors, and one of the more interesting topics was the growing use of XMP.

Bob Schaffel, Senior Product Manager for XMP at Adobe, offered two definitions.  The first is his technical description, “XMP is a labeling technology that allows you to embed data about a file, known as metadata, into the file itself. XML for metadata.”  The second is his human definition, borrowing from Marshall McLuhan, “The message is in the medium.”

One of the benefits of XMP is that the metadata is wrapped in the object itself.  Without it, if you transfer an object (say a jpg) from one system to another, you also need to determine a way to transfer metadata (implying this metadata also needs to be stored somewhere, such as a relational database or a separate XML file).  With XMP, the metadata is stored within the object itself and is transferred with the object from application to application (assuming the application supports XMP).  So if the systems that are transmitting the object can truly work with XMP, the metadata can travel with the object.  There is no need for additional querying for other data. 
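To make "the metadata travels with the object" concrete: because the XMP sits inside the file as text, even a naive scan of the raw bytes can usually find it. The sketch below assumes the XMP is stored uncompressed as UTF-8 and uses the conventional `x:xmpmeta` wrapper; real files can break those assumptions, so treat this as an illustration and not a substitute for a format-aware toolkit. The function name is mine.

```python
from typing import Optional

START = b"<x:xmpmeta"
END = b"</x:xmpmeta>"


def extract_xmp_packet(data: bytes) -> Optional[str]:
    """Return the first embedded XMP block as text, or None if absent.

    Naive byte scan: assumes uncompressed UTF-8 XMP with the usual
    x:xmpmeta wrapper element.
    """
    start = data.find(START)
    if start == -1:
        return None
    end = data.find(END, start)
    if end == -1:
        return None
    return data[start:end + len(END)].decode("utf-8")
```

Pointing this at the bytes of a file before and after it passes through an application is also a quick way to check whether that application kept the metadata.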

For example, if you create an image in Photoshop (including assigning some metadata), import it into InDesign and then create a PDF, the metadata travels with the image all the way through. Theoretically, when you load the image into your DAM or other repository and then license it out, republish it, or deliver it somewhere, the XMP metadata can be delivered as well, as part of the object itself.

However, buyer beware!  Not all applications really work with XMP.  Some of the problems noted from the discussion at the meeting included:

  • Some applications will export the image but ditch the metadata.  For example, you edit your image in an application that can’t read or write XMP.  Some of these applications will output the image by recreating it, leaving out the XMP metadata.   So your output file “looks” the same (with the edits) but the metadata is lost. This reminds me of opening HTML files in Microsoft Word or FrontPage, which inserts additional tags into the file.  It's transparent when looking at the output, but underneath the covers, things are bad.


  • The file formats for images (and other objects) have a certain area for the binary data to create the image and then some other space for the XMP.  Some applications can read and write to the XMP but don’t understand the native file format enough to accommodate a growth in the area reserved for XMP.  That is, you could add so much additional metadata to the XMP area that it outgrows the space originally assigned to it.  The application needs to understand the file format enough to stretch the file to accommodate this.  There was some debate on how real this issue is (chances of adding so much that the area needs to be expanded) but it is a limitation to be aware of when evaluating applications that claim they handle XMP. 

  • Even if a current version of software supports XMP fully, older versions may not.  So if files are transmitted back and forth, say between staff editors and freelancers, be careful that someone in that loop doesn’t use an older version, which will most likely not return the XMP metadata to the file.
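The "reserved space" problem in the second bullet is easier to see in code. An XMP packet sits between an `<?xpacket begin=...?>` header and an `<?xpacket end="w"?>` trailer, typically with whitespace padding before the trailer. A tool that does not understand the surrounding file format can only rewrite the packet safely if the new metadata fits in that same space. Here is a minimal sketch of that check; the function name is mine, and it assumes a writable packet found by a plain byte scan.

```python
HEADER = b"<?xpacket begin="
TRAILER = b'<?xpacket end="w"?>'


def replace_in_place(data: bytes, new_body: bytes) -> bytes:
    """Swap the XMP body without changing the file's length.

    Raises ValueError when the new body does not fit in the space between
    the packet header and trailer; growing the packet would need a
    format-aware rewrite of the whole file.
    """
    head = data.find(HEADER)
    if head == -1:
        raise ValueError("no XMP packet header found")
    head_end = data.find(b"?>", head)
    if head_end == -1:
        raise ValueError("unterminated XMP packet header")
    body_start = head_end + 2
    body_end = data.find(TRAILER, body_start)
    if body_end == -1:
        raise ValueError("no writable XMP packet trailer found")

    room = body_end - body_start
    if len(new_body) > room:
        raise ValueError(
            f"new XMP is {len(new_body)} bytes but only {room} are reserved"
        )
    # Pad with spaces so every byte after the packet stays where it was.
    padded = new_body + b" " * (room - len(new_body))
    return data[:body_start] + padded + data[body_end:]
```

An application that raises (or silently truncates) at that size check is the kind the bullet warns about; one that can restructure the file around a larger packet is not limited this way.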

XMP has been around for a few years, but it is gaining more traction within the publishing community and vendors are moving to offer better support. At least the smart ones are.

About this Blog

This blog is produced by the consultants and analysts from Really Strategies, a content solutions and services provider.