Whilst at the Jena conf.
and I ran through the
vocabulary; he'd previously mailed me some cleanup suggestions.
I initially did the vocab as a throwaway demo for an article, but a
few people picked up on it. The
version is cleaned up, but I've not yet had chance to check
that it (possibly alongside well-known vocabs) covers the whole of
microformat, which at this point in time is probably its largest
use case. (I can't remember why we opted for rev:createdOn rather
than dc:created, there was a reason...)
In a nearby galaxy, Leigh has been experimenting with Embedded RDF and GRDDL support in the SPARQL section of his XMLArmyKnife. Great stuff. PS. Leigh just posted to his personal blog, supplying a nice one-liner: "So it's now possible to query microformats using SPARQL".
So here's the thing: this Atom feed is a pubsub subscription to blog posts that use the StructuredBlogging profile in their HTML (the entry URIs are in the <id> element). That means there's some embedded data following one or more of the SB-supported microformats (which include hReview). I'm not 100% sure of the current status, but last time I looked the SB folks were dealing with markup validity issues - the first post in that feed (a review of The Da Vinci Code film, no less) is seriously invalid.
So ok, let's say there's a bit of code that can do aggregator-like polling of the Atom feed. Snag the post URI, push that through Tidy, push the result through XSLT (I did hreview2rdfxml.xsl for the old schema, not had chance to update it). Collect the whole lot as RDF, query with SPARQL, do a little Ajaxy front end and you've got the makings of a low-cost Web 2.0-ish site. Add a "submit your review" button, call it Revubo or some similar awful name (no offence ericP ;-), sell to Yahoo! (no offence dajobe ;-), make a mint...
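For what it's worth, the polling part of that pipeline might be sketched roughly like this in Python. Everything named here is an assumption for illustration: the feed is taken to be Atom 1.0 (an Atom 0.3 feed would need a different namespace), and `fetch`, `tidy` and `to_rdfxml` are hypothetical callables standing in for the HTTP GET, the Tidy pass and the hreview2rdfxml.xsl transform, none of which are implemented here.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed uses the Atom 1.0 namespace.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def entry_uris(atom_xml):
    """Return the text of each entry's <id> element, in feed order."""
    root = ET.fromstring(atom_xml)
    uris = []
    for entry in root.iter(ATOM_NS + "entry"):
        id_el = entry.find(ATOM_NS + "id")
        if id_el is not None and id_el.text:
            uris.append(id_el.text.strip())
    return uris


def harvest(atom_xml, fetch, tidy, to_rdfxml, seen):
    """Push each not-yet-seen post through fetch -> tidy -> XSLT.

    fetch, tidy and to_rdfxml are hypothetical callables supplied by the
    caller; seen is a set of URIs handled on earlier polls and is updated
    in place. Returns a dict mapping post URI to its RDF/XML.
    """
    results = {}
    for uri in entry_uris(atom_xml):
        if uri in seen:
            continue  # aggregator-style: skip posts handled on an earlier poll
        results[uri] = to_rdfxml(tidy(fetch(uri)))
        seen.add(uri)
    return results
```

Loading the collected RDF/XML into a store and running SPARQL over it is left out; that part depends entirely on which toolkit is to hand.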
On the topic of StructuredBlogging, I must point to Phil Pearson's latest hackery (because he said something nice about my SB XSLT :-). Not sure I get what's going on here yet, but it seems to be making XML-RPC more RESTful.
See also: micromodels.org
Oh yeah, I might as well drop this in here too - the other day I was trying to figure out how to do something similar with a completely different data source. Basically there's a remote soupy HTML page that changes every 15 minutes; I want to periodically grab and transform the data, then bung it in a triplestore for SPARQL querying. Then it occurred to me it'd be nice to expose the data as RDF/XML directly, so someone else can come along and grab it from the web. But I don't want to be doing unnecessary calls to the remote site, nor do I want to have to do unnecessary processing on my server. So I was trying to figure out a way of doing smart caching. This rough pic is as far as I got. (Oops, bit big that, can't really be bothered firing up The Gimp so linked).
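One way the caching could go, as a minimal sketch and not necessarily what the pic shows: only hit the remote page when the cached copy is older than 15 minutes, and only re-run the transform when the fetched bytes have actually changed. The `TransformCache` class and its `fetch`/`transform` callables are hypothetical names made up for this example; `fetch` is assumed to return the page as bytes.

```python
import hashlib
import time


class TransformCache:
    """Cache a transformed copy of a remote page (hypothetical helper).

    fetch() is assumed to return the raw page as bytes; transform(raw)
    is assumed to return whatever should be served (e.g. RDF/XML).
    """

    def __init__(self, fetch, transform, ttl_seconds=15 * 60, clock=time.monotonic):
        self._fetch = fetch
        self._transform = transform
        self._ttl = ttl_seconds
        self._clock = clock
        self._fetched_at = None  # time of the last remote call
        self._digest = None      # hash of the last page that was transformed
        self._result = None      # cached transform output

    def get(self):
        now = self._clock()
        if self._fetched_at is not None and now - self._fetched_at < self._ttl:
            return self._result  # still fresh: no remote call, no processing
        raw = self._fetch()
        self._fetched_at = now
        digest = hashlib.sha256(raw).hexdigest()
        if digest != self._digest:
            # The page really changed, so the transform is worth running.
            self._result = self._transform(raw)
            self._digest = digest
        return self._result
```

Requests from outside would then be answered from `get()`, so however often someone grabs the RDF/XML, the remote site sees at most one call per 15 minutes. If the remote server sends ETag or Last-Modified headers, a conditional GET inside `fetch` would save bandwidth as well, but that's beyond this sketch.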
Not directly related, but I might as well drop it in. A problem came up in IRC yesterday, I'd be grateful for any suggestions. This blog/Knobot is running on port 80, but for convenience I've also got Apache2 running on port 88 for my old static files. Reto added the necessary for me to be able to redirect across to that port, only this doesn't help when 88 is firewall-blocked. I'm guessing there should be something around NAT to intercept/redirect so everything appears to the outside world to be on 80, but I've no idea where to start looking.