Got microformat data? Want it on the Semantic Web? All you need is a bit of one-off XSLT and a couple of tweaks to the XMDP profile, and every single document using that profile will transparently get a Semantic Web existence. No changes to the instance documents themselves needed. Nada, niente, nowt whatsoever.
Ok, so XHTML Meta Data Profiles (XMDP) are (primarily) human-readable schemas, associated with microformat docs. Gleaning Resource Descriptions from Dialects of Languages (GRDDL) is a way of extracting RDF from XML docs (using XSLT).
It's a few months since a note was added to the GRDDL spec about applying it to HTML profiles, such as those used by microformats. But what was lacking (IMHO) was an example of a fully GRDDLable XMDP profile - until now (I nagged ;-). It's straightforward to implement, but there's some cunning in there. Now Dan Connolly has produced a demo XHTML Friends Network (XFN) XMDP profile: http://www.w3.org/2003/g/td/xfn-workalike (the official XFN profile is at http://gmpg.org/xfn/11).
GRDDL-enhanced profiles will look and feel just the same as
before, but there's a little extra in the
<head> of the HTML. The XFN example satisfies
the microformat specs, and the GRDDL stuff's all there, alongside
the XMDP.
Ok, in practical terms the object of the exercise is to get RDF from a microformat doc. That will be done using XSLT - normally there'll be a specific XSLT associated with each microformat. The cool thing is that a GRDDL agent (say an aggregator or crawler) doesn't need any prior knowledge of the microformat in use or which XSLT it should use.
Here's how it all works:
First a GRDDL-capable client reads the XHTML microformat doc
of interest. Within that doc the client finds a profile
reference:
<head profile="http://gmpg.org/xfn/11">
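As a rough illustration of this first step, here's a minimal Python sketch that pulls the profile URIs out of a document's head. The function name profile_uris and the sample document are made up for the example, and it assumes the input is well-formed XHTML that a plain XML parser can read. The profile attribute can hold several whitespace-separated URIs, so the result is a list.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def profile_uris(xhtml_text):
    """Return the profile URIs declared on the document's <head> (hypothetical helper)."""
    root = ET.fromstring(xhtml_text)
    # Accept both namespaced XHTML and namespace-free markup.
    head = root.find("{%s}head" % XHTML_NS)
    if head is None:
        head = root.find("head")
    if head is None:
        return []
    # The profile attribute is a whitespace-separated list of URIs.
    return head.get("profile", "").split()


# Made-up instance document using the XFN profile.
SAMPLE_DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://gmpg.org/xfn/11"><title>Blogroll</title></head>
  <body><a href="http://example.org/alice" rel="friend met">Alice</a></body>
</html>"""

print(profile_uris(SAMPLE_DOC))
```

A real client would fetch the document over HTTP first; that part is left out here to keep the sketch self-contained.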
Next the client will do a GET on the discovered URI to obtain the GRDDL profile. The page returned in this case (or rather Dan's workalike case) contains the following:
<head profile="http://www.w3.org/2003/g/data-view">
Yes, there's another profile URI. The page at that URI
contains a reference to itself: it's the GRDDL profile of GRDDL,
so there's no need for the agent to get that page. The behaviour*
specified when encountering that URI is to apply the XSLT
transformations found in the microformat
profile to obtain information about how to interpret the
microformat instance
document using that profile.
(* there doesn't necessarily have to be any procedural
behaviour involved at all; this is the way the logical mapping is
declared)
Within the body of the profile doc there's the following link:
<a
href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokXFN.xsl"
rel="profileTransformation">grokXFN.xsl</a>
There's an XSLT style sheet for RDF extraction for profile
docs (pointed to by the
GRDDL data view
profile), and when applied to the profile XHTML the link above
will yield the triple:
<http://www.w3.org/2003/g/td/xfn-workalike>
dataview:profileTransformation
<http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokXFN.xsl> .
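To make that step concrete, here's a small Python sketch that produces the same triple from a profile document. Note the shortcut: a real GRDDL agent would get there by running the XSLT associated with the data view profile, whereas this just looks for rel="profileTransformation" links directly. The function names and the cut-down sample profile are invented for the example; the property is written out in full using the GRDDL namespace http://www.w3.org/2003/g/data-view#.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

DATAVIEW_NS = "http://www.w3.org/2003/g/data-view#"


def profile_transformation_triples(profile_uri, profile_xhtml):
    """Return (subject, predicate, object) triples for profileTransformation links."""
    root = ET.fromstring(profile_xhtml)
    triples = []
    for el in root.iter():
        if not isinstance(el.tag, str):
            continue
        # Drop any "{namespace}" prefix so plain and namespaced markup both match.
        if el.tag.rsplit("}", 1)[-1] != "a":
            continue
        # rel is a whitespace-separated list of link types.
        if "profileTransformation" in el.get("rel", "").split():
            href = el.get("href")
            if href:
                # Resolve relative links against the profile's own URI.
                triples.append((profile_uri,
                                DATAVIEW_NS + "profileTransformation",
                                urljoin(profile_uri, href)))
    return triples


def to_ntriples(triple):
    """Serialise one all-URI triple as an N-Triples line."""
    return "<%s> <%s> <%s> ." % triple


PROFILE_URI = "http://www.w3.org/2003/g/td/xfn-workalike"
# Cut-down stand-in for the profile document, not the real page.
SAMPLE_PROFILE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view"><title>XFN workalike</title></head>
  <body><p><a href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokXFN.xsl"
     rel="profileTransformation">grokXFN.xsl</a></p></body>
</html>"""

for t in profile_transformation_triples(PROFILE_URI, SAMPLE_PROFILE):
    print(to_ntriples(t))
```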
The
profileTransformation property
relates an HTML Profile document to an Algorithm, usually
encoded in XSLT, for extracting an RDF representation of (some of)
the meaning of any XHTML document referring to this profile. In
other words, the XSLT pointed to here is the one to use to get RDF
from the microformat doc. So now the GRDDL agent will go back to
that original doc, transform it and do what it likes with the data
extracted.
It all sounds very convoluted, but the underlying idea is very elegant. Effectively recursion is used through the transform definitions to declaratively provide the RDF interpretation of the XHTML document in question.
In practice it's very straightforward to make a microformat
GRDDL-friendly, and put all the documents that use the microformat
on the Semantic Web. If you've followed
microformats.org's
recommendations, you'll already have an
XMDP profile for the format.
You'll then need an XSLT stylesheet which can transform the docs
into RDF/XML, but this isn't difficult: check
grokXFN.xsl
(this maps directly between relationship names and RDF property
names) or
hdoap2doap.xsl
(the target structure is a bit more complicated, but it's still not
rocket science). Also already available are
dc-extract.xsl,
GeoURL.xsl and
home2rss.xsl.
Then you just need to reference the GRDDL dataview profile in the
head of the XMDP profile, and include a
rel="profileTransformation" link pointing to your
XSLT.
(Note that multiple XHTML profiles can apply to a document -
the RDF interpretation is additive).
A GRDDL client needn't be too complicated either: it's just a follow-link/apply-XSLT loop. For performance reasons a GRDDL client is likely to have its behaviour on finding a specific profile URI cached somehow, so the long-winded version above is just what would happen first time around. If you're only interested in data from specific microformats, all you need is a hardcoded table mapping the profile URIs to the corresponding XSLT.
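Here's one way the table-plus-cache idea might look in Python. Everything here is a sketch under assumptions: TransformResolver, KNOWN_TRANSFORMS and fake_discover are invented names, the table contents are purely illustrative, and the discover callable stands in for the real fetch-the-profile-and-read-its-links step so the example runs without touching the network.

```python
# Illustrative table only: which stylesheet goes with which profile URI
# is the client author's decision.
KNOWN_TRANSFORMS = {
    "http://www.w3.org/2003/g/td/xfn-workalike": [
        "http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokXFN.xsl",
    ],
}


class TransformResolver:
    """Map profile URIs to XSLT URIs, discovering each unknown profile once."""

    def __init__(self, known=None, discover=None):
        # Seed the cache with the hardcoded table (copied so it isn't mutated).
        self._cache = {uri: list(xslts) for uri, xslts in (known or {}).items()}
        # discover(profile_uri) -> iterable of XSLT URIs; None disables discovery.
        self._discover = discover

    def transforms_for(self, profile_uri):
        if profile_uri not in self._cache:
            found = self._discover(profile_uri) if self._discover else []
            self._cache[profile_uri] = list(found)
        return self._cache[profile_uri]


# Stand-in for the long-winded discovery: records each call and returns a
# made-up stylesheet URI derived from the last path segment of the profile URI.
discovery_calls = []


def fake_discover(profile_uri):
    discovery_calls.append(profile_uri)
    name = profile_uri.rsplit("/", 1)[-1]
    return ["http://example.org/xslt/for-" + name + ".xsl"]


resolver = TransformResolver(KNOWN_TRANSFORMS, fake_discover)
print(resolver.transforms_for("http://www.w3.org/2003/g/td/xfn-workalike"))
print(resolver.transforms_for("http://example.org/profiles/hdoap"))
print(resolver.transforms_for("http://example.org/profiles/hdoap"))
print(discovery_calls)  # the unknown profile was only discovered once
```

Passing no discover function gives the hardcoded-table-only client: known profiles resolve to their XSLT, and anything else comes back as an empty list.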
See also: "GRDDL Data Views: Getting Started, Learning More"; "GRDDL specification updated works with Microformats"
[Danny]