Merging results from independent SPARQL queries

The title of this post should have been something like "My Really Cool Quicksilver/SPARQL Application", but...

Here's the idea: you enter <domain> <keyword>, something like "projects calendar" or "blogs puddings", in a tool like Quicksilver, and get back an HTML page describing projects that are calendar-related or blog posts about puddings.

...there's plenty a slip twixt cup and lip. But I thought I might as well write it up; maybe what I was trying could be of interest to anyone with (or without) a SPARQL endpoint who wants to do something useful with it, or with data like FOAF, DOAP, SIOC or whatever.

Background: so this morning I got fed up with Spotlight when I was trying to find some stuff, and remembered Quicksilver, which I've been meaning to look at for a while. While I was looking around the docs I saw a reference to a plugin script that would run some Python for you from Quicksilver, popping up the result in Growl. I promptly fired up emacs, and have spent the afternoon making up some Python, SPARQL and a bit of XSLT. Once I'd got what I thought was enough working for a proof-of-concept, I went back to get the Quicksilver script. Not there. Google couldn't help either. Plenty a slip, #2: it looks like the SPARQL endpoint I was working against has a bug in its CONSTRUCT.

Anyhow, here's how it would work: <domain> would map to some particular ontology(s), <keyword> would be used in the FILTER part of SPARQL queries against a series of remote stores.

Because I wanted cross-site querying, I needed a way of distributing the query, then merging and presenting the results. For distributing the query, I opted for the easy one: just have a list somewhere of SPARQL endpoints that might have info related to the domains of interest. Ask each in turn. For merging and presenting the results, I decided to extend a trick I was playing with a while back (initially for an immuexa.com project - respect!), two-phase queries. Basically you do a domain-specific CONSTRUCT query which produces domain-independent, report-oriented RDF, which you then merge into a single model. To that model you apply a generic report-oriented SELECT query, pushing the results through some boilerplate XSLT to produce HTML (or whatever).
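To make the two-phase idea a bit more concrete, here's a rough sketch in plain Python that stands in for both queries, with a "model" being nothing more than a set of (subject, predicate, object) tuples. It isn't the code from the zip: the report vocabulary (http://example.org/report#) and both function names are made up for illustration, and a real version would hand actual CONSTRUCT and SELECT queries to a SPARQL engine.

```python
# The DOAP namespace is real; the report vocabulary is hypothetical.
DOAP = 'http://usefulinc.com/ns/doap#'
REPORT = 'http://example.org/report#'


def construct_report(triples, keyword):
    """Phase 1: stand-in for a domain-specific CONSTRUCT (DOAP to report terms)."""
    triples = list(triples)
    names = {s: o for s, p, o in triples if p == DOAP + 'name'}
    report = set()
    for s, p, o in triples:
        # Same idea as FILTER regex(?description, keyword, "i"), but a plain
        # case-insensitive substring test rather than a regex.
        if p == DOAP + 'description' and keyword.lower() in o.lower():
            report.add((s, REPORT + 'title', names.get(s, s)))
            report.add((s, REPORT + 'summary', o))
    return report


def select_report(model):
    """Phase 2: stand-in for the generic SELECT, giving (title, summary) rows."""
    titles = {s: o for s, p, o in model if p == REPORT + 'title'}
    summaries = {s: o for s, p, o in model if p == REPORT + 'summary'}
    return sorted((titles[s], summaries.get(s, '')) for s in titles)


# Two pretend stores, merged after phase 1 by a simple set union.
store_a = [('urn:x:a', DOAP + 'name', 'CalTool'),
           ('urn:x:a', DOAP + 'description', 'A Calendar utility')]
store_b = [('urn:x:b', DOAP + 'name', 'Pudding'),
           ('urn:x:b', DOAP + 'description', 'Recipe manager')]
merged = construct_report(store_a, 'calendar') | construct_report(store_b, 'calendar')
rows = select_report(merged)
```

The point is that select_report knows nothing about DOAP, so another domain only needs its own phase 1.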

In the past I've only actually tried it using single-store/single-querying, where the benefit is that the presentation is generic, better separated than regular single-phase domain-specific SELECT queries. Er, and it's still unproven. But if you can get things into domain-independent, presentation-oriented RDF, why not merge it from different sources?

In the Python I only set things up to work against one endpoint to get started, but it'd just be a matter of adding URIs for other endpoints, and extra SPARQL for other domains. You should get the idea from this snippet:

endpoints = {'doapstore': 'http://doapstore.org/sparql.php'}

sparql_templates = {'doap': 'doap-template.rq'}

result_sparql = 'report.rq'

result_xslt = 'report.xsl'

endpoints contains a dictionary of endpoint names/URIs. sparql_templates contains a dictionary of <domain>s (I put "doap" in there, but "projects" would be more user-friendly) and SPARQL templates.
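For what it's worth, splitting the "<domain> <keyword>" input and finding the right template could look something like the sketch below. The DOMAIN_ALIASES table and the parse_input function are my own illustration (they're one way of getting the friendlier "projects" to point at the "doap" template), not something taken from the zip.

```python
# Hypothetical aliases: friendlier domain words mapped to template keys.
DOMAIN_ALIASES = {'projects': 'doap'}

sparql_templates = {'doap': 'doap-template.rq'}


def parse_input(text):
    """Split '<domain> <keyword>' into (template key, keyword)."""
    # Everything after the first space counts as the keyword.
    domain, _, keyword = text.strip().partition(' ')
    keyword = keyword.strip()
    if not domain or not keyword:
        raise ValueError('expected "<domain> <keyword>"')
    key = DOMAIN_ALIASES.get(domain.lower(), domain.lower())
    if key not in sparql_templates:
        raise ValueError('no SPARQL template for domain: ' + domain)
    return key, keyword
```

So parse_input("projects calendar") gives ('doap', 'calendar'), and an unknown domain fails early rather than sending a pointless query to every endpoint.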

The templating bit looks like this in the SPARQL:

FILTER regex(?description, "%KEYWORD%", "i")

and just uses Python string replacement:

query.replace("%KEYWORD%", keyword_string)
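One thing a bare replace doesn't cover: a keyword containing a double quote or a backslash would break the string literal in the query. A slightly more careful version might look like this (fill_template is a name I've made up; the escaping only protects the literal, so regex metacharacters in the keyword are still treated as regex syntax by the FILTER):

```python
def fill_template(template, keyword):
    """Substitute the user keyword into a SPARQL template string."""
    # Escape backslashes first, then double quotes, so the keyword
    # cannot end the quoted string literal early.
    safe = keyword.replace('\\', '\\\\').replace('"', '\\"')
    return template.replace('%KEYWORD%', safe)


template = 'FILTER regex(?description, "%KEYWORD%", "i")'
filled = fill_template(template, 'calendar')
```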

With the keyword filled in with the user string, the whole thing is a CONSTRUCT query. That's passed to each of the endpoints and the RDF results merged locally into an in-memory model (I used Redland). result_sparql (a SELECT) and result_xslt are the same for any domain, applied to the merged model. Here's a zip of the bits I got: slickquery.zip. It's dead slow, but mostly kind-of works. I've no plans to take it any further, this was just (another) impulse.
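The ask-each-in-turn-and-merge step can be sketched without any RDF library by treating the merged model as a set of triples. This is only an approximation of what I did with Redland: request_url and merge_results are illustrative names, and fetch is a hypothetical callable you'd supply that takes a URL and returns parsed (subject, predicate, object) tuples. The one thing borrowed from the SPARQL protocol is that the query travels in a "query" parameter.

```python
from urllib.parse import urlencode

endpoints = {'doapstore': 'http://doapstore.org/sparql.php'}


def request_url(endpoint, query):
    """Build a GET URL carrying the query in the 'query' parameter."""
    separator = '&' if '?' in endpoint else '?'
    return endpoint + separator + urlencode({'query': query})


def merge_results(query, fetch, endpoint_map):
    """Ask each endpoint in turn and union whatever triples come back.

    `fetch` is a hypothetical callable: URL in, iterable of
    (subject, predicate, object) tuples out.
    """
    merged = set()
    for name, uri in endpoint_map.items():
        try:
            merged.update(fetch(request_url(uri, query)))
        except OSError as error:
            # One dead endpoint shouldn't sink the whole report.
            print('skipping %s: %s' % (name, error))
    return merged
```

A plain set union treats equal blank node labels from different stores as the same node, which a real RDF library would keep apart when merging, so this is strictly proof-of-concept.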

(Problem with doapstore's SPARQL - if it's not my end - is that everything gets rdf:nodeID="")

Oh yeah, Merry Christmas!

(Don't think that can be culturally sensitive coming from an atheist...)


Danny Ayers
2006-12-24T17:43:24+01:00
