The GRDDL (Gleaning Resource Descriptions from Dialects of Languages) spec suite is coming on pretty nicely:
- GRDDL (main spec)
- Use Cases editor's draft
The spec looks set to move from W3C Candidate Recommendation to Proposed Recommendation in the near future (along with Test Cases, if I remember correctly), the other docs becoming Notes.
This is all backed by running code, as shown in the GRDDL test results. The implementations listed are Jena, Raptor and GRDDL.py. I believe OpenLink Virtuoso is also up to speed, but EARL results haven't yet been submitted (PS. on their way, apparently).
One of the apparently trickiest sets of comments on public-grddl-comments came from Ryan King, in relation to microformats. This seems like a good opportunity to give my opinion (unofficial - the issue's closed).
So, as the GRDDL charter says:
The mission of this Working Group is to complement the concrete RDF/XML syntax with a mechanism to relate other XML syntaxes (especially XHTML dialects or "microformats") to the RDF abstract syntax via transformations identified by URIs.
Ryan says:
Given that the majority of the web is something other than "Valid XHTML", this spec doesn't seem to be very useful on the Web.
He also mentions GRDDL's need for a profile reference, and the practical issues related to that.
HarryH answered Ryan's points (to his satisfaction) by adding some informative text to the effect that you can always run Tidy or somesuch over any HTML to get XHTML.
But taking a step back, microformats offer a set of conventions whereby publishers can embed data in their documents in a form that can be extracted in a consistent fashion. To use microformats, the publisher will have to modify their existing markup. Prior to microformats.org, one could have said "given that the majority of the web is something other than microformats...". I don't see a great deal of difference between publishers changing their markup to be GRDDL-friendly and changing it to be microformats-friendly (and ideally the changes will be exactly the same).
Personally I'd strongly recommend the use of profiles with microformats, and not just because of the GRDDL scenario. Without them, ok, there is the possibility for naming clashes (unlikely) but also there's the principle of string-squatting. Once microformats.org has claimed a class name, it's not available for use for any other purpose by other people (without the risk of misinterpretation by microformat tools). URIs are the identifiers on the web, they enable the distributed, independent development which has led to the success of the web. I'd suggest that turning microformats.org into another registrar of special strings runs counter to this.
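To make the profile argument a bit more concrete, here is a minimal sketch of a consumer that only treats class names as hCard when the document has opted in via a profile URI on its head element. The profile URI, the function name hcard_names and both sample documents are assumptions made up for illustration, not anything a real microformat tool is known to do.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Assumption: the URI a publisher would use to opt in to hCard conventions.
HCARD_PROFILE = "http://www.w3.org/2006/03/hcard"

# Hypothetical documents: one declares the profile, one merely reuses the strings.
WITH_PROFILE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2006/03/hcard"><title>Contact</title></head>
  <body><div class="vcard"><span class="fn">Jane Doe</span></div></body>
</html>"""

WITHOUT_PROFILE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Card game</title></head>
  <body><div class="vcard"><span class="fn">Not a person</span></div></body>
</html>"""


def has_class(element, name):
    """True if the element's class attribute contains the given token."""
    return name in element.get("class", "").split()


def hcard_names(xhtml_text):
    """Return the 'fn' values of vcard blocks, but only if the profile is declared."""
    root = ET.fromstring(xhtml_text)
    head = root.find(f"{{{XHTML_NS}}}head")
    profiles = head.get("profile", "").split() if head is not None else []
    if HCARD_PROFILE not in profiles:
        # No opt-in: the class names are just strings, so don't interpret them.
        return []
    names = []
    for card in root.iter():
        if has_class(card, "vcard"):
            for el in card.iter():
                if has_class(el, "fn"):
                    names.append("".join(el.itertext()).strip())
    return names


print(hcard_names(WITH_PROFILE))
print(hcard_names(WITHOUT_PROFILE))
```

The second document uses the same class names for its own purposes and is left alone, which is the point: the URI carries the claim, not the string.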
Another point is that if HTTP URIs are used for the profiles then it is possible for a human or machine to follow their nose to the profile page and get more information. This presumably was the intent behind XMDP profiles, and is how GRDDL works (one of the coolest bits of GRDDL is how this can happen recursively). Very webby, with dependencies on centralised authorities kept to a minimum.
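As a rough illustration of the follow-your-nose step, the sketch below looks at an XHTML document, checks whether its head declares the GRDDL profile, and if so collects the transformation links plus any other profile URIs an agent could dereference next. The example.org URIs, the sample document and the function name are made up; the sketch does no fetching, no XSLT and no resolution of relative hrefs, so treat it as an outline rather than a GRDDL implementation.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

# Hypothetical document; the example.org URIs are placeholders.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view http://example.org/profile">
    <title>Sample</title>
    <link rel="transformation" href="http://example.org/extract-rdf.xsl" />
    <link rel="stylesheet" href="style.css" />
  </head>
  <body><p>Hello</p></body>
</html>"""


def grddl_hints(xhtml_text):
    """Return (transformation hrefs, other profile URIs worth following)."""
    root = ET.fromstring(xhtml_text)
    head = root.find(f"{{{XHTML_NS}}}head")
    profiles = head.get("profile", "").split() if head is not None else []

    # Profiles other than GRDDL's own may themselves point at transformations,
    # which is where the recursion comes from.
    other_profiles = [p for p in profiles if p != GRDDL_PROFILE]

    transformations = []
    if GRDDL_PROFILE in profiles:
        link_tags = {f"{{{XHTML_NS}}}link", f"{{{XHTML_NS}}}a"}
        for el in root.iter():
            rel_tokens = el.get("rel", "").split()
            if el.tag in link_tags and "transformation" in rel_tokens:
                href = el.get("href")
                if href:
                    transformations.append(href)
    return transformations, other_profiles


transformations, other_profiles = grddl_hints(SAMPLE)
print(transformations)
print(other_profiles)
```

An agent would then fetch each transformation and apply it to the document, and could repeat the same discovery on each remaining profile URI.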
Regarding the current proposal that HTML5 (or whatever it gets
called) drops the profile attribute, that seems seriously
wrong-minded, for the reasons given above. If anything the
attribute should be allowed on some of the block tags - maybe
<p> - so it'll be possible to state clearly that a given
microformat convention (or GRDDL transformation) applies to the
markup within the block. Google is not the only fruit.
To summarise, GRDDL in XHTML makes it possible for the publisher
to express data on the web in a form which can be (relatively)
unambiguously interpreted by any consumers. The fact that most
documents on the web don't currently follow the
(microformat, GRDDL...) conventions is fairly irrelevant.
There's nothing to stop scrapers scraping HTML and getting useful
data, but data arbitrarily scraped from the web will lack the
network of authority that mechanisms like
@profile (and GRDDL) provide. This is about adding utility (for
those that desire it) based on existing specifications, rather than
working around limitations of markup in the wild.