For each post/day/month (pick your granularity) create a corresponding RSS feed of weblog entries. These feeds are then referenced from an OPML file that defines the overall structure of the archived weblog.
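To make the proposal concrete, here is a minimal Python sketch of the OPML side: one `outline` per archive period, each pointing at that period's RSS feed. The function name `build_archive_opml`, the flat month-by-month layout and the example.org URLs are all hypothetical. Matt's post doesn't specify a layout. The `type="rss"` plus `xmlUrl` attributes follow the usual OPML subscription-list convention.

```python
import xml.etree.ElementTree as ET

def build_archive_opml(title, feeds):
    """Build an OPML document listing one RSS feed per archive period.

    feeds: list of (label, feed_url) pairs, e.g. one pair per month.
    (Hypothetical helper; the flat layout is just one possible choice.)
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for label, url in feeds:
        # keyword args become attributes: type="rss", text=label, xmlUrl=url
        ET.SubElement(body, "outline", type="rss", text=label, xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")

if __name__ == "__main__":
    months = [
        ("2005-10", "http://example.org/archive/2005/10/rss.xml"),
        ("2005-11", "http://example.org/archive/2005/11/rss.xml"),
    ]
    print(build_archive_opml("Weblog archive", months))
```

Nesting the outlines by year, or going down to per-day feeds, would just be a matter of how the `outline` elements are grouped.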
This is exactly the kind of thing I had in mind when I was talking about what I call the model layer being a kind of emergent property of online publishing. You wind up with all this data but no systematic way of managing it, so you have to find one.
I'm sure RSS+OPML would be doable, but given how loosely those specs are defined it seems like building on quicksand. The RSS part could usefully be switched to Atom. I'm not sure whether anything has been specified yet for the role of the OPML file, though the collections in the Atom Publishing Protocol should come pretty close.
Matt talks about extra metadata and the tools that understand it, saying namespacing is a way of including it. The problem is that XML offers no overall interpretation model beyond mustIgnore: a tool simply skips any element it doesn't recognize. Only tools that fully understand, say, RSS, OPML and ENT would be able to do anything useful with all three.
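A small Python illustration of what mustIgnore means in practice: a reader that only knows core RSS 2.0 item elements drops a namespaced extension element without complaint. The extension namespace URI and the `ext:topic` element are made up for the example. They stand in for something like ENT and are not its real vocabulary. `read_core_item` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace, standing in for something like ENT.
EXT_NS = "http://example.org/ns/ext#"

ITEM = f"""<item xmlns:ext="{EXT_NS}">
  <title>Hello</title>
  <link>http://example.org/2005/11/hello</link>
  <ext:topic>semweb</ext:topic>
</item>"""

# The subset of RSS 2.0 item elements this toy reader understands.
CORE_RSS_ELEMENTS = {"title", "link", "description", "pubDate", "guid"}

def read_core_item(xml_text):
    """Read an RSS <item> the way a core-only tool would.

    Unrecognized (here, namespaced) children are skipped per mustIgnore,
    so their data silently disappears from the result.
    """
    item = ET.fromstring(xml_text)
    result = {}
    for child in item:
        if child.tag in CORE_RSS_ELEMENTS:
            result[child.tag] = (child.text or "").strip()
        # anything else, e.g. "{http://example.org/ns/ext#}topic", is ignored
    return result

if __name__ == "__main__":
    print(read_core_item(ITEM))
```

The topic data is still in the document, but nothing in the format tells a core-only tool what it means or that it mattered.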
Another alternative would be to use HTML. Done systematically, it has the potential to make a neat export/archive system. Microformats already provide conventions for the core of this kind of thing: hAtom for entry- and feed-level data, XOXO for broader, site-level structure, and so on.
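As a rough sketch of how a tool might pull entry data back out of such an HTML archive, here is a Python parser that collects the text of `entry-title` elements nested inside an `hentry`, using only the standard library's `html.parser`. It is deliberately minimal: it covers one hAtom property, assumes reasonably well-nested markup, and the class name `HAtomTitles` is hypothetical. A real hAtom parser would also handle `published`, `author`, `entry-content`, bookmark links and more.

```python
from html.parser import HTMLParser

# Elements that never get an end tag; they must not be pushed on the stack.
VOID = {"br", "img", "hr", "meta", "link", "input"}

class HAtomTitles(HTMLParser):
    """Collect the text of entry-title elements that sit inside an hentry."""

    def __init__(self):
        super().__init__()
        self.stack = []          # class lists of the currently open elements
        self.titles = []         # extracted entry titles, in document order
        self._buf = None         # text chunks for the title being collected
        self._title_depth = None

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(classes)
        in_entry = any("hentry" in c for c in self.stack)
        if "entry-title" in classes and in_entry and self._buf is None:
            self._buf = []
            self._title_depth = len(self.stack)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        # Closing the element that opened the title: emit the collected text.
        if self._buf is not None and len(self.stack) == self._title_depth:
            self.titles.append("".join(self._buf).strip())
            self._buf = None
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

if __name__ == "__main__":
    page = """<div class="hfeed">
      <div class="hentry"><h2 class="entry-title">First post</h2></div>
      <div class="hentry"><h2 class="entry-title">Another <em>post</em></h2></div>
    </div>"""
    parser = HAtomTitles()
    parser.feed(page)
    parser.close()
    print(parser.titles)
```

The same page still renders as an ordinary list of posts in any browser. The class names only add machine-readable structure on top.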
I can think of three advantages of microformats over RSS+OPML right away. The clearest is that the format would be built on proper, well-defined specs, with none of that silent data loss nonsense. The second is flexibility: the representations can be freely mixed, because it's all HTML. You can't mix RSS and OPML in the same document. The third is the big pragmatic gain that a level of partial understanding comes for free. The stuff is HTML, so a browser can render it usefully out of the box.
RSS+OPML, Atom and microformats all offer ways of representing the structured content, but all are relatively limited in what you can do with the data and in how far they extend to other kinds of data.
The "what it is" in the title of this post refers to the fact that what Matt's describing is resource description. Although it's only being applied at the content-model layer, it's starting to look like a reinvention of the Resource Description Framework. What's already available in that context is wide-open extensibility, significant partial understanding and loads of tools for working with the data, with SPARQL querying across the archive being a nice bit of low-hanging fruit (Lee's got another post up about the SPARQL Calendar Demo, by the way).
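To show the shape of the idea without pulling in an RDF toolkit, here is a toy Python sketch: archive entries as (subject, predicate, object) triples using the Dublin Core element vocabulary, plus a tiny matcher for patterns whose `?`-prefixed terms act as variables, roughly like a SPARQL basic graph pattern. The entry URIs, titles and dates are invented sample data, and `match` is a hypothetical illustration, not a SPARQL engine. A real setup would hand the triples to an RDF store and query it with SPARQL proper.

```python
DC = "http://purl.org/dc/elements/1.1/"

# Invented sample archive data: (subject, predicate, object) triples.
TRIPLES = [
    ("http://example.org/2005/11/a", DC + "title", "First post"),
    ("http://example.org/2005/11/a", DC + "date", "2005-11-02"),
    ("http://example.org/2005/12/b", DC + "title", "Another post"),
    ("http://example.org/2005/12/b", DC + "date", "2005-12-14"),
]

def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all (s, p, o) patterns.

    Terms starting with '?' are variables; anything else must match exactly.
    Patterns are joined left to right by recursion (a toy nested-loop join).
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)

if __name__ == "__main__":
    query = [("?entry", DC + "title", "?title"), ("?entry", DC + "date", "?date")]
    for row in match(query, TRIPLES):
        if row["?date"].startswith("2005-12"):
            print(row["?title"])
```

Because every statement has the same three-part shape, data from any source format (RSS items, hAtom entries, OPML structure) can be dropped into the same pool and queried together.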
It would be unfortunate if a shared blog format were built without taking into account the known problems of some technologies and the known advantages of others. But assuming all the available data does get captured unambiguously, it'll still be possible to treat it as RDF whichever format(s) are used. So I'm not sure why I bothered writing this.