
Real metadata

I'm adapting a detailed presentation on metadata for a webinar next week, and thought it might be helpful to post some excerpts. Most CMS products are weak at managing "real" metadata, and that weakness limits a publisher's product development. Until you understand metadata, it's hard to see why this matters so much.

A typical CMS product stores metadata as name/value pairs in a relational database. Most metadata doesn't want to live like that. If you've invested in XML for your documents, you should also invest in it for your metadata. Here are a few reasons:

1. Most metadata is more naturally modeled as XML than as simple tables.
2. Some metadata lives most meaningfully in a content context (inside the document).
3. Metadata markup should conform to document markup.

This illustrates one of the greatest selling points of a CMS that uses XML as its native format: your metadata can be any XML you want it to be. (It's probably no surprise that RSuite is one of these!)

Read on for more detail.

1. Most metadata is more naturally modeled as XML than as simple tables

Metadata can have:
- Internal markup
- Hierarchy
- The dreaded mixed content

Simple tables either can't represent these structures or make it really hard.

For example, journal article contributors often:
- Come in groups
- Are typed (author, editor, …)
- Are ordered
- Have internal structure (first name, last name, etc.)
- Include spacing and punctuation around internal elements
- Are related to/contain affiliation information
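To make the contributor case concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The markup is loosely modeled on JATS-style `contrib-group` elements; the element names, the sample people, and the `contributors()` helper are hypothetical illustrations, not RSuite's or any CMS's API. It recovers order, type, name parts, the punctuation between them, and affiliations, all of which a single name/value row such as `("author", "Smith, Jane")` would flatten away.

```python
import xml.etree.ElementTree as ET

# Hypothetical JATS-style contributor markup; element names are illustrative.
CONTRIBS = """<contrib-group>
  <contrib contrib-type="author">
    <name><surname>Smith</surname>, <given-names>Jane</given-names></name>
    <xref ref-type="aff" rid="aff1"/>
  </contrib>
  <contrib contrib-type="author">
    <name><surname>Lee</surname>, <given-names>Min</given-names></name>
    <xref ref-type="aff" rid="aff2"/>
  </contrib>
  <contrib contrib-type="editor">
    <name><surname>Garcia</surname>, <given-names>Ana</given-names></name>
  </contrib>
  <aff id="aff1">University A</aff>
  <aff id="aff2">University B</aff>
</contrib-group>"""


def contributors(xml_text):
    """Return contributors in document order with type, name parts, and affiliations."""
    root = ET.fromstring(xml_text)
    # Affiliations are shared elements that contributors point to by id.
    affs = {aff.get("id"): aff.text for aff in root.findall("aff")}
    people = []
    for order, contrib in enumerate(root.findall("contrib"), start=1):
        name = contrib.find("name")
        people.append({
            "order": order,
            "type": contrib.get("contrib-type"),
            "surname": name.findtext("surname"),
            "given": name.findtext("given-names"),
            # itertext() keeps the punctuation and spacing between name parts.
            "display": "".join(name.itertext()),
            "affiliations": [affs[x.get("rid")]
                             for x in contrib.findall("xref[@ref-type='aff']")],
        })
    return people


for person in contributors(CONTRIBS):
    print(person["order"], person["type"], person["display"], person["affiliations"])
```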

2. Some metadata lives most meaningfully in a content context

Example: headings
- Are naturally authored while authoring the document itself (news headlines can be an exception)
- Can be for document sections
- Can contain footnote references to document locations
- Are presented as part of the document for editorial review or to readers
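A rough sketch of treating headings as metadata without pulling them out of the document: the Python below walks nested `sec` elements in place and reports each heading's level, section id, text, and footnote references. The markup and the `headings()` and `heading_text()` helpers are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup: headings sit inside their sections and can
# carry inline footnote references.
ARTICLE = """<article>
  <body>
    <sec id="s1">
      <title>Background<xref ref-type="fn" rid="fn1">1</xref></title>
      <p>...</p>
      <sec id="s1.1">
        <title>Prior work</title>
        <p>...</p>
      </sec>
    </sec>
    <sec id="s2">
      <title>Methods</title>
      <p>...</p>
    </sec>
  </body>
  <back>
    <fn-group><fn id="fn1"><p>Adapted from an earlier report.</p></fn></fn-group>
  </back>
</article>"""


def heading_text(title):
    """Heading text with direct-child footnote markers dropped (their tails are kept)."""
    parts = [title.text or ""]
    for child in title:
        if child.tag != "xref":
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts).strip()


def headings(parent, level=1):
    """Yield (level, section id, heading text, footnote ids) for nested <sec> elements."""
    for sec in parent.findall("sec"):
        title = sec.find("title")
        footnotes = [x.get("rid") for x in title.findall("xref[@ref-type='fn']")]
        yield level, sec.get("id"), heading_text(title), footnotes
        # Recurse so subsection headings keep their place in the hierarchy.
        yield from headings(sec, level + 1)


body = ET.fromstring(ARTICLE).find("body")
outline = list(headings(body))
for entry in outline:
    print(entry)
```

The headings never leave the document: the outline is just a query over it, so editors and readers still see each heading where it belongs.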

3. Metadata markup should conform to document markup

Example: Journal article abstracts contain formatting and other inline elements, paragraphs, and even lists. It makes sense to use the same tags in the abstract as in the rest of the article. Depending on the relational model, this can range from awkward to impossible.
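As an illustration, assuming the abstract and body share one tag set, a single rendering function can serve both. This Python sketch maps a few hypothetical tags (`p`, `italic`, `list`, `list-item`) to HTML; the mapping and the `render()` helper are assumptions, not a standard or a product API.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical article: the abstract reuses the body's tags.
ARTICLE = """<article>
  <front>
    <abstract>
      <p>We compare <italic>in vitro</italic> growth under two conditions:</p>
      <list><list-item><p>low light</p></list-item><list-item><p>high light</p></list-item></list>
    </abstract>
  </front>
  <body>
    <p>Growth was measured <italic>in vitro</italic> each day.</p>
  </body>
</article>"""

# One tag mapping shared by abstract and body (an assumption, not a standard).
HTML_TAGS = {"p": "p", "italic": "em", "list": "ul", "list-item": "li"}


def render(elem):
    """Recursively convert an element and its mixed content to an HTML string."""
    inner = escape(elem.text or "")
    for child in elem:
        inner += render(child) + escape(child.tail or "")
    tag = HTML_TAGS.get(elem.tag)
    # Unmapped wrapper elements (abstract, body) contribute only their content.
    return f"<{tag}>{inner}</{tag}>" if tag else inner


root = ET.fromstring(ARTICLE)
abstract_html = render(root.find("front/abstract"))
body_html = render(root.find("body"))
print(abstract_html.strip())
print(body_html.strip())
```

Note that `render()` has no abstract-specific branch: because the abstract uses the article's own tags, it gets the same treatment for free.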

These are three modeling reasons to use an XML-aware CMS for metadata. There are other kinds of reasons as well. For example, you might not know that a field qualifies as metadata until well after your CMS is deployed, when you are trying to create a new product. Finding this out "late" will cost you time and money in most environments. You'll be in a much better position if your CMS allows you to treat any content in your XML document as metadata. The same example (not knowing how a field will be used until after the CMS is deployed) also shows why even a custom relational CMS, one that goes beyond name/value pairs, still limits publishers too much.
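One way to picture "any content can be metadata" is to define metadata fields as named paths into the XML rather than as database columns. In this minimal Python sketch (hypothetical field names, paths, and sample content; ElementTree supports only a subset of XPath), adding a late-discovered field is a one-line configuration change rather than a schema migration. It's a sketch of the idea, not how RSuite or any particular CMS implements it.

```python
import xml.etree.ElementTree as ET

# Hypothetical article; nobody planned to treat <funding-source> as metadata.
ARTICLE = """<article>
  <front>
    <article-title>A sample article</article-title>
    <funding-group>
      <funding-source>Example Research Fund</funding-source>
    </funding-group>
  </front>
  <body><p>...</p></body>
</article>"""

# Hypothetical configuration: metadata fields are named paths into the XML.
METADATA_FIELDS = {
    "title": "front/article-title",
}


def extract(root, fields):
    """Return {field name: [text of each matching element]} for the configured paths."""
    return {
        name: ["".join(e.itertext()).strip() for e in root.findall(path)]
        for name, path in fields.items()
    }


root = ET.fromstring(ARTICLE)
before = extract(root, METADATA_FIELDS)

# Later, a new product needs funder data: add one path; the stored XML is unchanged.
METADATA_FIELDS["funder"] = ".//funding-source"
after = extract(root, METADATA_FIELDS)
print(before)
print(after)
```

A production system would also need indexing and validation for these paths; the sketch leaves that out.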


About this Blog

This blog is produced by the consultants and analysts from Really Strategies, a content solutions and services provider.
