Me: So how would you do this: "Show me all the posts tagged 'sun' in the last month by the bloggers aggregated at Planet Intertwingly"?
Sam Ruby: //atom:category[@term='Sun']
The background: a couple of days ago Tim Bray proposed a URN scheme for tags, to act as a global namespace for the One Big Folksonomy. He was prompted while looking for a way of expressing that an Atom category was one of these tags. His post starts: "Tags and categories are pretty obviously the same thing".
In comments I slapped in a load of "just use RDF", including the question above, but would have been better off waiting for Aristotle Pagaltzis's post on the question, in which he points out that Atom already has enough. By saying something is an atom:category/@term you are implicitly typing it as a tag (a conclusion I wound up at after a long diversion down the RDF path). Sam's solution demonstrates this can work.
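To make that concrete, here's a minimal sketch of Sam's XPath idea in Python, using only the standard library. The sample feed, the function name tagged_since and the fixed "now" date are all my own assumptions for illustration, not anything from Sam's comment or the real planet feed. ElementTree only supports a subset of XPath, so the date filter is done in plain Python.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed, standing in for a planet's aggregated Atom output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Solar panels</title>
    <updated>2007-03-20T10:00:00Z</updated>
    <category term="Sun"/>
  </entry>
  <entry>
    <title>Old sunshine</title>
    <updated>2006-11-02T10:00:00Z</updated>
    <category term="Sun"/>
  </entry>
  <entry>
    <title>Rainy day</title>
    <updated>2007-03-21T10:00:00Z</updated>
    <category term="rain"/>
  </entry>
</feed>"""


def tagged_since(feed_xml, term, since):
    """Return titles of entries carrying the category term, updated on or after `since`."""
    root = ET.fromstring(feed_xml)
    titles = []
    for entry in root.findall("atom:entry", ATOM):
        # Same test as //atom:category[@term='...'], scoped to one entry.
        if not entry.findall(f"atom:category[@term='{term}']", ATOM):
            continue
        updated = entry.findtext("atom:updated", namespaces=ATOM)
        # fromisoformat() on older Pythons rejects a trailing 'Z'.
        when = datetime.fromisoformat(updated.replace("Z", "+00:00"))
        if when >= since:
            titles.append(entry.findtext("atom:title", namespaces=ATOM))
    return titles


# A fixed "now" keeps the example reproducible.
now = datetime(2007, 3, 25, tzinfo=timezone.utc)
recent_sun = tagged_since(SAMPLE_FEED, "Sun", now - timedelta(days=30))
print(recent_sun)
```

Note the attribute match is case-sensitive: asking for 'sun' rather than 'Sun' finds nothing in this sample, so a real version would probably want to normalise terms first.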
While I think Tim's proposal has been sufficiently debunked, there's some potentially useful collateral.
Sam's solution also demonstrates a specific point that appears frequently when RDF is suggested to XML users. A direct XPath-based solution is considerably easier than, say, converting the data to the RDF model, then querying that with SPARQL. Tim expressed a common sentiment:
I'm just trying to hit an 80/20 point. We already have RDF for those who want to Solve The Whole Problem.
(Tim was addressing Bill deHora, who certainly isn't an 'RDF fanboy' ;-)
I don't think the story (for SWEO purposes) has really been sorted in this kind of case. The benefit of RDF is intuitively self-evident when you're used to using it, but not easy to pin down in words. Snag is, it's visibly evident that in the case above, Sam's solution wins by a mile.
One approach is to bring up another problem, e.g. "Show me all the images tagged 'sun' at Flickr by the bloggers aggregated at Planet Intertwingly". Once you have those, show them alongside the results of the first query.
Ok, there's a big YAGNI aspect here, and I'm pretty sure an XPath-based solution could fairly easily be hacked to this problem as well. But data integration on the web (and elsewhere) offers a continuous stream of problems like this, across diverse domains, formats and services. In other words, the web of data is the Whole Problem. Solutions to individual problems may require infrastructure (libs!) that wouldn't be essential for solving the problems in isolation. But if a consistent framework is used then the aggregate cost can be minimised. (In fact the RDF Tax pays a dividend from the second problem onward). PS. Jeff Barr has some example problems that should be in scope.
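Here's a rough sketch of that argument in Python. Plain tuples stand in for an RDF store (no RDF library is used), and every URI, predicate name and the helper tagged_by_members is hypothetical. The idea is that once each source has been mapped into triples, merging is just set union and the same query covers blog posts and photos alike.

```python
# Hypothetical data: each source is mapped once into
# (subject, predicate, object) triples.
PLANET = "http://example.org/planet"

planet_triples = {
    (PLANET, "aggregates", "http://example.org/people/alice"),
    (PLANET, "aggregates", "http://example.org/people/bob"),
}
blog_triples = {
    ("http://example.org/blog/1", "creator", "http://example.org/people/alice"),
    ("http://example.org/blog/1", "tag", "sun"),
    ("http://example.org/blog/2", "creator", "http://example.org/people/bob"),
    ("http://example.org/blog/2", "tag", "rain"),
}
photo_triples = {
    ("http://example.org/photo/9", "creator", "http://example.org/people/alice"),
    ("http://example.org/photo/9", "tag", "sun"),
    ("http://example.org/photo/10", "creator", "http://example.org/people/carol"),
    ("http://example.org/photo/10", "tag", "sun"),
}

# Merging sources is plain set union; no per-source glue code.
graph = planet_triples | blog_triples | photo_triples


def tagged_by_members(graph, planet, tag):
    """Items with the given tag whose creator is aggregated by the planet."""
    members = {o for (s, p, o) in graph if s == planet and p == "aggregates"}
    tagged = {s for (s, p, o) in graph if p == "tag" and o == tag}
    return sorted(
        s for (s, p, o) in graph
        if p == "creator" and o in members and s in tagged
    )


sunny = tagged_by_members(graph, PLANET, "sun")
print(sunny)
```

In this toy data the query returns one blog post and one photo together, and adding a third source would mean one more mapping into triples rather than a new query.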
I believe there should also be a strong historic analogy with the current web as a solution to the problems faced by people pre-web trying to integrate and distribute documents and media, but I haven't thought this one through... Anyhow, would XPath have solved those problems?
~
A solution to the Flickr question above is in easier reach now (whether using SPARQL or XPath), thanks to Dave Beckett's latest offering:
Flickcurl is a library for calling the Flickr web service API, handling the API signing and the token management plus providing wrappers for some of the APIs. It uses libcurl to call the REST web service and libxml2 to manipulate the XML responses. The current version provides just a few wrappers for reading photo description, sufficient for me to get at the photo information and the machine tags.
It includes a utility (triplr) to extract machine tag data as RDF.