Microformats on the GRDDL

Got microformat data? Want it on the Semantic Web? All you need is a bit of one-off XSLT and a couple of tweaks to the XMDP profile, and every single document using that profile will transparently get a Semantic Web existence. No changes to the instance documents themselves needed. Nada, niente, nowt whatsoever.

Ok, so XHTML Meta Data Profiles (XMDP) are (primarily) human-readable schemas associated with microformat docs. Gleaning Resource Descriptions from Dialects of Languages (GRDDL) is a way of extracting RDF from XML docs (using XSLT).

It's a few months since a note was added to the GRDDL spec about applying it to HTML profiles, such as those used by microformats. But what was lacking (IMHO) was an example of a fully GRDDLable XMDP profile - until now (I nagged ;-). It's straightforward to implement, but there's some cunning in there. Now Dan Connolly has produced a demo XHTML Friends Network (XFN) XMDP profile: http://www.w3.org/2003/g/td/xfn-workalike (here's the official XFN profile).

GRDDL-enhanced profiles will look and feel just the same as before, but there's a little extra in the <head> of the HTML. The XFN example satisfies the microformat specs, and the GRDDL stuff's all there, alongside the XMDP.

Ok, in practical terms the object of the exercise is to get RDF from a microformat doc. That will be done using XSLT - normally there'll be a specific XSLT associated with each microformat. The cool thing is that a GRDDL agent (say an aggregator or crawler) doesn't need any prior knowledge of the microformat in use or which XSLT it should use.

Here's how it all works:

First a GRDDL-capable client reads the XHTML microformat doc of interest. Within that doc the client finds a profile reference:

<head profile="http://gmpg.org/xfn/11">
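As a rough illustration of this first step, here's a minimal Python sketch of pulling the profile URIs out of a document's head. It only uses the standard library; the names HeadProfileParser and find_profiles are made up for this example, and a real GRDDL client would more likely work on a properly parsed XML tree.

```python
from html.parser import HTMLParser


class HeadProfileParser(HTMLParser):
    """Collects the URIs listed in the profile attribute of <head>.

    Hypothetical helper for illustration, not part of any GRDDL library.
    """

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            value = dict(attrs).get("profile") or ""
            # The profile attribute is a whitespace-separated list of URIs.
            self.profiles.extend(value.split())


def find_profiles(markup):
    """Return every profile URI declared on the document's <head>."""
    parser = HeadProfileParser()
    parser.feed(markup)
    return parser.profiles


doc = (
    '<html><head profile="http://gmpg.org/xfn/11">'
    "<title>Blogroll</title></head><body></body></html>"
)
print(find_profiles(doc))
```

Each URI found this way is a candidate for the GET described in the next step.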

Next the client will do a GET on the discovered URI to obtain the GRDDL profile. The page returned in this case (or rather Dan's workalike case) contains the following:



<head profile="http://www.w3.org/2003/g/data-view">



Yes, there's another profile URI. The page at that URI contains a reference to itself (it's the GRDDL profile of GRDDL), so there's no need for the agent to get that page. The behaviour* specified when encountering that URI is to run an RDF-extracting XSLT transformation over the microformat profile itself, which yields information about how to interpret any microformat instance document using that profile.

(* there doesn't necessarily have to be any procedural behaviour involved at all, this is the way the logical mapping is declared)

Within the body of the profile doc there's the following link:



<a href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokXFN.xsl" rel="profileTransformation">grokXFN.xsl</a>



There's an XSLT style sheet for extracting RDF from profile docs (pointed to by the GRDDL data view profile). When it's applied to the profile XHTML, the link above will yield the triple:



<http://www.w3.org/2003/g/td/xfn-workalike> dataview:profileTransformation <http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokXFN.xsl> .
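To make the mapping from link to triple concrete, here's a small Python sketch that imitates what the profile-level stylesheet does for this one link pattern. It is a stand-in, not the real XSLT: the class and function names are invented for the example, it only looks at rel="profileTransformation" on a and link elements, and it assumes dataview: abbreviates the GRDDL namespace http://www.w3.org/2003/g/data-view#.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed expansion of the dataview: prefix used in the triple above.
DATAVIEW = "http://www.w3.org/2003/g/data-view#"


class ProfileTransformationParser(HTMLParser):
    """Collects href values of links marked rel="profileTransformation"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        # rel may carry several space-separated values.
        rels = (attrs.get("rel") or "").split()
        if "profileTransformation" in rels and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def profile_triples(profile_uri, markup):
    """Return N-Triples lines linking the profile to its transformations."""
    parser = ProfileTransformationParser()
    parser.feed(markup)
    return [
        "<%s> <%sprofileTransformation> <%s> ."
        % (profile_uri, DATAVIEW, urljoin(profile_uri, href))
        for href in parser.hrefs
    ]


profile_body = (
    '<a href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokXFN.xsl" '
    'rel="profileTransformation">grokXFN.xsl</a>'
)
for line in profile_triples("http://www.w3.org/2003/g/td/xfn-workalike", profile_body):
    print(line)
```

The printed line is the same triple as above, just with the dataview: prefix written out in full.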



The profileTransformation property relates an HTML Profile document to an Algorithm, usually encoded in XSLT, for extracting an RDF representation of (some of) the meaning of any XHTML document referring to this profile. In other words, the XSLT pointed to here is the one to use to get RDF from the microformat doc. So now the GRDDL agent will go back to that original doc, transform it and do what it likes with the data extracted.

It all sounds very convoluted, but the underlying idea is very elegant. Effectively recursion is used through the transform definitions to declaratively provide the RDF interpretation of the XHTML document in question.

In practice it's very straightforward to make a microformat GRDDL-friendly, and so put all the documents that use the microformat on the Semantic Web. If you've followed microformats.org's recommendations, you'll already have an XMDP profile for the format. You'll then need an XSLT stylesheet which can transform the docs into RDF/XML, but this isn't difficult: check grokXFN.xsl (this maps directly between relationship names and RDF property names) or hdoap2doap.xsl (the target structure is a bit more complicated, but it's still not rocket science). Also already available are dc-extract.xsl, GeoURL.xsl and home2rss.xsl. Then you just need to reference the GRDDL dataview profile in the head of the XMDP profile, and include a rel="profileTransformation" link pointing to your XSLT.

(Note that multiple XHTML profiles can apply to a document - the RDF interpretation is additive).

A GRDDL client needn't be too complicated either: it's just a follow-link/apply-XSLT loop. For performance reasons a GRDDL client is likely to cache its behaviour for each profile URI it has already seen, so the long-winded version above is just what would happen the first time around. If you're only interested in data from specific microformats, all you need is a hardcoded table mapping the profile URIs to the corresponding XSLT.
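The table-plus-cache idea might look something like the following Python sketch. Treat it as an assumption-laden outline: KNOWN_TRANSFORMS, transforms_for and the discover callback are names invented here, the single table entry is the workalike profile and stylesheet mentioned above, and actually running the XSLT is left out because it needs a library beyond the standard one.

```python
# Hardcoded table from profile URI to the XSLT stylesheets to apply.
# The one entry pairs the workalike profile with grokXFN.xsl, as above.
KNOWN_TRANSFORMS = {
    "http://www.w3.org/2003/g/td/xfn-workalike": [
        "http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokXFN.xsl",
    ],
}


def transforms_for(profile_uris, discover=None, cache=None):
    """Return the XSLT URIs to apply for a document's profile URIs.

    discover is a hypothetical callback that does the long-winded lookup
    (GET the profile, extract profileTransformation links) for a URI that
    is not in the table yet; its result is cached for next time.
    """
    if cache is None:
        cache = KNOWN_TRANSFORMS
    selected = []
    for uri in profile_uris:
        if uri not in cache and discover is not None:
            cache[uri] = list(discover(uri))
        # Several profiles may apply; their transformations simply add up.
        for xslt in cache.get(uri, []):
            if xslt not in selected:
                selected.append(xslt)
    return selected


print(transforms_for(["http://www.w3.org/2003/g/td/xfn-workalike"]))
```

With no discover callback this is the pure hardcoded-table client: profiles it doesn't know about are silently skipped.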

XFN Work-Alike (GRDDL test)

See also: GRDDL Data Views: Getting Started, Learning More; GRDDL specification updated works with Microformats

[Danny]

Danny Ayers
2005-08-01T10:29:25Z
