Paths

I just found out there was a Microformats and Structured Blogging BoF at Mix06. The getting things right bit starts like this: pretty much everyone who's looked at the Web with a critical eye has noticed that it's markedly low on data that can be read predictably, deterministically by software. But the potential benefits are increasingly visible with Web 2.0 mashups based on exposed services/APIs - these are about exchanging and processing data on the Web. Web as Platform, Web of Data, lahvely.



What various smarty-pants individuals have noticed is that there's low-hanging fruit in these parts. Every single site on the Web is already exposing an API. You get the pages by doing a GET over the HTTP protocol. What you see in the browser is human-readable info. But a considerable proportion of these sites are actually delivering machine-readable data, only it's getting obfuscated in the publishing process. The stuff from the MySQL DB behind the scenes of the blog, railway timetable, photo collection is fluffed out into HTML to make it human-readable, but along the way loses any direct hooks into the raw data.



Enter Microformats (µFs) and Structured Blogging (SB). The basic idea is simple - keep publishing that human-readable content, just make it possible to read the data back out of the HTML. The two initiatives are addressing different aspects of the problem. µFs are mostly about getting the formats right. This involves minimal invention, just developing conventions based on existing specifications and generally recognised good practice. SB is mostly about adapting existing tools (WordPress, Drupal etc.) to allow the content author, with minimal effort, to publish material that works not only as human-readable content but also as machine-readable data. SB's approach is two-pronged: they're using µFs plus embedded XML.
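To make "read the data back out of the HTML" a bit more concrete, here is a minimal sketch in Python. It assumes a page fragment whose class attributes follow the hCard conventions (vcard, fn, org, url). The fragment, the person in it and the HCardParser helper are all made up for illustration, and a real microformat parser would handle many more cases.

```python
from html.parser import HTMLParser

# Hypothetical fragment: ordinary human-readable HTML whose class
# attributes follow hCard naming (vcard, fn, org, url).
PAGE = """
<div class="vcard">
  <a class="url fn" href="http://example.org/alice">Alice Example</a>,
  <span class="org">Example Press</span>
</div>
"""

FIELDS = {"fn", "org", "url"}


class HCardParser(HTMLParser):
    """Collect a few hCard fields from inside class="vcard" elements.

    Simplification: assumes every start tag inside a vcard has a matching
    end tag (no bare void elements such as <br>).
    """

    def __init__(self):
        super().__init__()
        self.cards = []    # one dict per vcard found
        self._stack = []   # field names attached to each open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if not self._stack:
            if "vcard" in classes:        # entering a new card
                self.cards.append({})
                self._stack.append(set())
            return
        card = self.cards[-1]
        names = classes & FIELDS
        if "url" in names and attrs.get("href"):
            card["url"] = attrs["href"]   # a link's value is its href
            names = names - {"url"}
        for name in names:
            card.setdefault(name, "")
        self._stack.append(names)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if not self._stack:
            return
        for names in self._stack:         # text belongs to every open field
            for name in names:
                self.cards[-1][name] += data


def extract_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return [{k: v.strip() for k, v in card.items()} for card in parser.cards]


if __name__ == "__main__":
    print(extract_hcards(PAGE))
```

The point is only that the template change is additive: the HTML still renders as before, and the class names are enough of a hook to get structured records back.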



What these initiatives have in common is that they ask little of their target audiences. The developer of publishing tools will already be using HTML/CSS, so microformat support only means minor (additive) changes to their templates. The blogger who e.g. reviews books will only be entering the same set of information in an SB tool as they would in a regular blogging tool. The initiatives ask little, but they give more back. Because they're conventions based on best practices, any user of µFs can be reasonably confident that they're following best practices. OK, things like standards-compliance, accessibility and so on are low on the list of priorities for many web publishers. They want something more tangible. Remember Zen Garden? One set of XHTML conventions, numerous (many stunning) pluggable CSS skins. Microformats offer the potential for Zen Garden for blogs. What the SB tool user gets is - well, a tool designed for creating the kind of content they want to create and displaying it in a nice fashion.



There are plenty of skeptics of µFs; Norm's analysis is a good one. Similar points have been made by Uche - if you want data in a machine-processable XML format (like microformats), then rather than force it into XHTML, just use a straight XML format. If I remember correctly, in past discussions Shelley has expressed the opinion that you might as well just expose RDF/XML as embed data in the HTML (but she has also worked on the SB systems).



My opinion in short is that the µFs and SB initiatives are very positive. More data on the Web? Bring it on. This is a huge plus for us SemWeb types. The fact that these groups are working together rather than having format food-fights makes a good role model.



Here's an interesting thing: following a link from Alex Barnett's coverage, there's the Maimed Leech (!) with a post, "Talking about microformats and WinFS". The Leech asked the following questions at the BoF, and got answers from Tantek:
1. How do you get people to agree on the schemas?
Derive them from current practice on the web rather than inventing them from scratch (paraphrased).
2. How do you handle the need for both a rich schema for "high end" applications and a simple schema for "low end" applications?
Both will exist (paraphrased).

Bravo Tantek. The second is absolutely consistent with the Semantic Web approach; a slight refactoring of the first gives the corresponding SemWeb one: where possible derive your schemas/ontologies from current practice rather than inventing them from scratch (but it's not the end of the world if you make everything up, you can stitch vocabularies together afterwards). This includes modelling that is currently off-web (I doubt anyone's done the many-thousand term models from biochemistry as straight HTML).
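A rough sketch of what "stitching vocabularies together afterwards" can look like, in Python with triples as plain tuples rather than a real RDF library. The FOAF name property is real; every example.org URI and the data itself are invented for illustration. In RDF proper the mapping would itself be published as data (e.g. with owl:equivalentProperty) rather than held in a dict.

```python
# Triples are plain (subject, predicate, object) tuples in this sketch.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
SITE_A_NAME = "http://example.org/site-a/terms#fullName"  # made-up vocabulary

# Two sites that each invented (or picked) their own term for a person's name.
site_a = {
    ("http://example.org/people/alice", SITE_A_NAME, "Alice Example"),
}
site_b = {
    ("http://example.org/people/bob", FOAF_NAME, "Bob Example"),
}

# The stitching: a statement, made after the fact, that two properties
# mean the same thing.
EQUIVALENT = {SITE_A_NAME: FOAF_NAME}


def stitch(triples, equivalent):
    """Rewrite mapped predicates to their equivalents, leave the rest alone."""
    return {(s, equivalent.get(p, p), o) for s, p, o in triples}


def names(triples):
    """Query the data using a single vocabulary."""
    return sorted(o for _, p, o in triples if p == FOAF_NAME)


merged = stitch(site_a, EQUIVALENT) | stitch(site_b, EQUIVALENT)

if __name__ == "__main__":
    print(names(site_a | site_b))  # unstitched: only one site's data is visible
    print(names(merged))           # stitched: both sites answer the same query
```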

Enron Wright picks up on this with reference to schema modularisation in this context. Indeed, when expressing data in XHTML, modular profiles work. But for the model behind the formats, RDF and RDF Schema (and OWL) work. The similarity between WinFS and the RDF model has come up before. These posts have me laughing out loud - in an entirely positive, life is sweet way.



Analogy time.



The pilgrims know the right general direction to their goal. But to get there, they have to find their way through a dense wood. Some paths have been cleared. The pilgrim will always follow a path in the same general direction as the goal. But from any given place, it's not that obvious which are the good paths and which are paths made by the wild animals of the wood, which lead into impassable brambles. These paths can be paved over, but it's not quite a cowpath thing. Because everyone's heading in the same general direction, it's often possible to rejoin a clearer route; worst case, it's possible to backtrack a little to get on a better path.



The goal of the pilgrims is the future Web in all its data-enhanced glory. RSS, even OPML are paths heading in the right general direction. But the Big Dog formats and things like WinFS are off the clearest paths - the former lack the solidity of decent specifications, but there's a clear path nearby with Atom and microformats. Near WinFS the clear path is RDF - the same basic model but with hooks to the current Web (URIs). So it's nice to have standards-bearing initiatives like microformats and Structured Blogging to highlight the clear paths. Even people that have been late to the party in the past are noticing. Even Bill Gates has noticed.



I've been getting some really good comments on the blog recently; I must find a way of making them more visible. Here's a part of one from Richard Cyganiak which is apposite:

The [SemWeb] tools are still far from a state where they are usable for "non-believers." I still can't download MyRDF anywhere. Sure, just let the tools mature for another few years ... but till then, something will have grown out of the OPML/RSS/JSON/RoR frenzy, something that crudely solves 80% of the SemWeb problems with 20% of the cost. We'll scream "But RDF did this ten years ago!" and will be ignored, and we'll retool once more. Personally I'd rather work with RDF Lite than with some OPML/Javascript mutant.

As I seem to be saying a lot these days, I don't really disagree. At least about the observations. But the prognostication seems to assume an isolated, passive RDF community. If the LazyWeb doesn't deliver kit like MyRDF soon (which sounds like it's in a similar space to Semantic Web in a Box - Ontogon is getting close there) I'll build the bloody things myself. From the existing tools this will only need a little GUI work and a bit of packaging.



Things are growing out of the OPML/RSS/JSON/RoR frenzy, many of which conceptually (if not in implementation) could be viewed as RDF Lite. I'll nitpick here a second - there's a qualitative difference. JSON and RoR are potentially very useful with RDF Full (see e.g. Kendall's Semantic Rails post; Elias calls SPARQL+JSON a "killer combination"). There are things still to figure out around using these technologies with SemWeb tech, but there's no fundamental mismatch. There is something of a mismatch with RSS and OPML in that they're littered with ambiguities.
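For a feel of why SPARQL and JSON sit together comfortably, here is a small Python sketch that flattens a response shaped like the SPARQL query results JSON format ("head" names the variables, "results.bindings" holds one object per row). The response text and the rows helper are made up for illustration; no real endpoint is being queried.

```python
import json

# A made-up response in the shape of SPARQL query results in JSON.
RESPONSE = """
{
  "head": {"vars": ["name", "homepage"]},
  "results": {
    "bindings": [
      {"name": {"type": "literal", "value": "Alice Example"},
       "homepage": {"type": "uri", "value": "http://example.org/alice"}},
      {"name": {"type": "literal", "value": "Bob Example"}}
    ]
  }
}
"""


def rows(response_text):
    """Turn the bindings into plain dicts, one per result row.

    Variables that are unbound in a row (e.g. from an OPTIONAL pattern)
    come back as None.
    """
    doc = json.loads(response_text)
    variables = doc["head"]["vars"]
    return [
        {var: (binding[var]["value"] if var in binding else None)
         for var in variables}
        for binding in doc["results"]["bindings"]
    ]


if __name__ == "__main__":
    for row in rows(RESPONSE):
        print(row)
```

Nothing in the data model is lost along the way: each value still carries its type (URI or literal) in the original response, and the script just chooses to ignore it.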



Yep, it can be frustrating when folks like Adam Green favour tech like OPML over e.g. XHTML and/or RDF. The biggest discomfort is that this approach will show immediate gains. From a standing start it's usually necessary to get a little speed up before you can really enjoy the advantages of RDF. With the "simple", naive quasi-XML formats it's quicker to get results. But however we may argue amongst ourselves, the primary reason people like myself, Richard and Shelley would generally favour tech like RDF (and likely XHTML and Atom) over less capable technologies is that the latter are (*dramatic pause*) less capable. They may get some immediate results, but that doesn't make the problems go away.

During the "another few years" that Richard suggests, the folks building on naive quasi-XML will be devoting their time to hacking through the undergrowth: working around spec shortcomings and finding answers to questions, many of which have long since been figured out around RDF (and better-practice XML). They won't be able to solve 80% of the SemWeb problems by magic - it's taken a good few years' work around RDF.



Yep, it's taken a long time. But here's a thing - RDF tools are rapidly approaching the stage where immediate results can be as little effort as with naive quasi-XML. Hare, meet Turtle. He's like the tortoise, but Turtle has super-turbo-powers.



To date the RDF community has been relatively isolated - mostly because the benefits, and even the technical goals, haven't been obvious. But the Web 2.0 thing has opened a lot of eyes to the potential of data on the Web, and community-wise there are now whole scientific disciplines turning to SemWeb tech to help solve problems that have proven difficult by other means. With all due respect to developers of content management tools, and certainly without wishing to devalue it, there's more to life than the blogosphere. Software can help across virtually every aspect of civilization: science, the arts, journalism, developer diaries, cat photos. Long term I've no doubt an integrated system will emerge. The shortest path right now is the Semantic Web.



Speaking of paths, sometime soon I must have another look at the FOAF co-depiction stuff. Alex has a (not very flattering) photo of Marc and Tantek. I've yet to meet Marc, and I don't think there's a photo anywhere featuring Tantek and me. But I do have a rather cool co-depiction path to them both through this photo of Rohit and me (not very flattering for me, but I bet that's the case for most people pictured alongside Rohit ;-) and onto this microformats panel.
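A co-depiction path is just a shortest path through "who appears in a photo with whom". A minimal Python sketch follows, using breadth-first search over a hand-written table of photos. The photo identifiers are invented and who exactly appears on the panel is an assumption for illustration; real FOAF data would supply this through depiction properties instead.

```python
from collections import deque

# Hypothetical data: photo id -> people depicted in it.
# The ids are made up and the panel line-up is assumed.
PHOTOS = {
    "photo-of-rohit-and-danny": {"Danny", "Rohit"},
    "microformats-panel": {"Rohit", "Tantek", "Marc"},
    "alex-bof-photo": {"Marc", "Tantek"},
}


def codepiction_path(photos, start, goal):
    """Breadth-first search for a shortest co-depiction path.

    Returns an alternating list [person, photo, person, ...] from start
    to goal, or None if the two are not connected by any chain of photos.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        person = path[-1]
        if person == goal:
            return path
        for photo in sorted(photos):          # sorted for repeatable output
            people = photos[photo]
            if person not in people:
                continue
            for other in sorted(people - seen):
                seen.add(other)
                queue.append(path + [photo, other])
    return None


if __name__ == "__main__":
    print(codepiction_path(PHOTOS, "Danny", "Tantek"))
    print(codepiction_path(PHOTOS, "Danny", "Marc"))
```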


Danny Ayers
2006-03-24T00:06:44+01:00
