GRDDL without XSLT..?

The GRDDL Working Group has now resolved to request that the spec go forward to Last Call (discussions are ongoing about what happens to the supporting docs). Reaction to GRDDL around the web seems to have been unanimously positive, even in corners usually highly skeptical of Semantic Web technologies.

But yesterday there was a little pushback from a slightly surprising quarter. Stefano Mazzocchi, in a mail to the SIMILE list, gives robust criticism. Much of his mail is devoted to the general area of the liberal vs. Draconian parsing permathread. This is my fault for earlier using "parsing" as shorthand for "deterministic interpretation of the publisher's intent according to standard specifications" (in contrast to "scraping", using heuristics to try to determine that intent). Hey ho.

But he makes one point that I believe should somehow be addressed. In the previous mail I'd quoted the GRDDL spec (working draft), adding a note of my own:

While technically Javascript, C, or virtually any other programming language may be used to express transformations for GRDDL, XSLT is specifically designed to express XML to XML transformations and has some good safety characteristics.

Also note there's nothing to stop an implementation seeing a transformation URI like "http://example.org/wiki2rdf.xsl" and using Javascript to do an equivalent transformation.

To which Stefano responded:

nothing? how about the fact that if I express GRDDL in XSLT I already have an implicit "output" channel, while if I do it in javascript I don't? should I embed a C compiler in my GRDDL-enabled crawler so that I can recompile the code so that works on my platform?

That line that you quote above is *exactly* the kind of thing that makes me kick and scream about some W3C recommendations for the patronizing "this is left as an exercise to the reader" taste it leaves. There is a difference between theory and practice, a win or lose one: in theory, GRDDL can be described with a deck of punch cards that could be read by an IBM mainframe in the 60's, but that's hardly useful if I have no way to:

1) get to the GRDDL description
2) obtain an executable representation of it
3) execute it

and most importantly

4) get the resulting data *out* of the program!

I'll stop consider GRDDL as just another way to apply XSLT to a web page when the above four points are explicitly addressed, not before.

There's slight irony in "this is left as an exercise to the reader", because effectively that was the intent for this and quite a few other points in the spec. There isn't yet enough implementation and deployment experience to pin down all the formal requirements; things like this remain open questions until the readers have exercised them a bit. It's certainly not meant to be patronizing, more an acknowledgement that "we don't yet know".

But despite the negative tone, Stefano has framed this particular question in a useful way, providing concrete requirements. I don't think there's a single answer, but when the question is broken down into specific (testable) cases I don't see any reason his points can't be addressed.

For one thing, I think Stefano's spot on when he talks of the "GRDDL description". I can't be sure how the other members of the WG view this, but to me GRDDL is a declarative thing, the emphasis being on what something (the mapping) is rather than on the details of the procedures used to implement it. I'm only a casual LtU reader, but I think this view can be reconciled with the fact that the GRDDL mechanisms are processes.

Ok, I have to confess I'm really posting this because I want to avoid spending the day noodling with code on it, sooo tempting. I know where I'd start. Forget the C case for now and just look at Javascript. So what should a GRDDL-aware agent do with this:

<foo xmlns="http://example.org/ns#"
     xmlns:grddl="http://www.w3.org/2003/g/data-view#"
     grddl:transformation="http://purl.org/stuff/grddl/foo2rdf.js">
  <!-- stuff -->
</foo>
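Stefano's first step, getting to the GRDDL description, is the easy part. As I read the GRDDL draft, the value of grddl:transformation on the root element is a whitespace-separated list of URI references, each resolved against the document's base URI. Here's a rough sketch of an agent pulling those out, assuming a small, well-behaved document with double-quoted attributes. The function name findTransformations is made up, and regular expressions are no substitute for a proper namespace-aware XML parser.

```javascript
// Naive sketch: list the GRDDL transformation URIs declared on the root
// element. Assumes double-quoted attributes and no ">" inside attribute
// values; a real agent should use a namespace-aware XML parser.
const GRDDL_NS = "http://www.w3.org/2003/g/data-view#";

function findTransformations(xml, baseURI) {
  // Drop comments, then take the first start tag as the root element
  // (the XML declaration and DOCTYPE don't match [A-Za-z_] after "<").
  const root = xml.replace(/<!--[\s\S]*?-->/g, "").match(/<[A-Za-z_][^>]*>/);
  if (!root) return [];

  // Collect the root element's attributes by qualified name.
  const attrs = new Map();
  for (const m of root[0].matchAll(/([\w.:-]+)\s*=\s*"([^"]*)"/g)) {
    attrs.set(m[1], m[2]);
  }

  const uris = [];
  for (const [name, value] of attrs) {
    // Find whichever prefix is bound to the GRDDL namespace...
    if (name.startsWith("xmlns:") && value === GRDDL_NS) {
      const prefix = name.slice("xmlns:".length);
      // ...then read prefix:transformation as a whitespace-separated list
      // of URI references, each resolved against the base URI.
      const list = attrs.get(prefix + ":transformation") || "";
      for (const ref of list.split(/\s+/).filter(Boolean)) {
        uris.push(new URL(ref, baseURI).href);
      }
    }
  }
  return uris;
}
```

One side effect of the list semantics: a stray space inside the attribute value, as in "http://purl.org/stuff/grddl/ foo2rdf.js", would be read as two separate references, "http://purl.org/stuff/grddl/" and a relative "foo2rdf.js".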

A key problem, which Stefano expands on a little in his mail, is that there's no standard, portable way of getting output from Javascript. In short:

Unlike XSLT, Javascript has no notion of "STDOUT"

But what XSLT is doing in GRDDL can be expressed like this:

GrddlResultRepresentation = GrddlTransformation(SourceRepresentation)

It's a function. Javascript has functions. (I hope that's not too much like "purple is a fruit".)

So perhaps the same functionality as XSLT offers could be expressed as:

function grddl(sourceURI) {
  var rdf = "";
  // map the source format to an RDF format, appending to rdf
  return rdf;
}
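To make that a bit more concrete, here's the shape such a transform might take for the made-up foo vocabulary. Everything in it is hypothetical: the <name> element, the ex:name property and the blank node _:foo are invented for illustration. One deliberate difference from the sketch above: this version takes the source representation (the document text) rather than its URI, mirroring GrddlTransformation(SourceRepresentation) and sparing the script any network access. The RDF comes back as an N-Triples string, so the return value is the output channel.

```javascript
// Hypothetical transform for the made-up "foo" vocabulary: every <name>
// element becomes an ex:name triple about the blank node _:foo.
// Assumes the host passes in the source document text and takes the
// returned N-Triples string as the result. The regex matching is naive
// (no entity decoding, no nested markup) and is only there for brevity.
function grddl(source) {
  const EX = "http://example.org/ns#";

  // Escape a JavaScript string as an N-Triples literal.
  const literal = s => '"' + s
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r") + '"';

  const triples = [];
  for (const m of source.matchAll(/<name>([^<]*)<\/name>/g)) {
    triples.push(`_:foo <${EX}name> ${literal(m[1])} .`);
  }
  // The RDF goes back to the caller as the return value.
  return triples.join("\n");
}
```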

Ok, at this point I'm fighting the urge to grab Rhino and start playing.

It looks like there are two immediate issues. First, there's still that problem of getting the data out. I'm not familiar enough with Javascript to see an answer without playing, but my optimistic intuition suggests there's a better chance of getting the return value of a function than the undefined output of an arbitrary script. Worst case, ok, Javascript might have no standard output port, so what are the minimum conventions needed to give it one? (That probably calls for a review of the ECMA specs.)
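As for getting the data out, here's a minimal sketch of the host side. It assumes Node.js rather than Rhino, and assumes the convention that the script declares a top-level function called grddl taking the source text. The host evaluates the script in a fresh global context and keeps whatever that function returns, so the script never needs a STDOUT. runTransform and __grddlSource are hypothetical names. Note that Node's own documentation says the vm module is not a security mechanism, so this gives none of the safety characteristics the spec credits XSLT with.

```javascript
const vm = require("vm");

// Hypothetical host harness: evaluate a transform script in a fresh
// global context, call its top-level grddl() on the source text, and
// keep the return value as the result. No STDOUT is involved.
// Caution: Node's vm module keeps globals apart but is NOT a security
// mechanism, so this is not a safe way to run untrusted scripts.
function runTransform(scriptText, source) {
  // The source is handed over through a variable on the new global object.
  const context = vm.createContext({ __grddlSource: source });

  // Top-level function declarations become properties of the context.
  vm.runInContext(scriptText, context, { timeout: 1000 });
  if (typeof context.grddl !== "function") {
    throw new Error("transform script defines no grddl() function");
  }

  // The "output channel" is simply the function's return value.
  return vm.runInContext("grddl(__grddlSource)", context, { timeout: 1000 });
}
```

The minimum convention here is just "declare a one-argument function with an agreed name and return the RDF", which is about as small as an output port gets.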

The second problem is that in practice foo2rdf.js will have to contain a whole bunch of functions, so there needs to be a way of determining which particular function should be applied to the source. I suspect this calls for some kind of pragmatic convention. One might be simply to use the first function in the script, but that falls down if a compiler comes anywhere near. Another would be to use a named function; this should work (I'm not sure whether you'd want it as a global or to use namespaces), but I think it would be better to avoid creating a special name if at all possible. A possible third approach would be to use the function signature, e.g. ensure there's only one function in the script which takes a single argument. One final possibility, which rather appeals to me but would probably call for a bunch of supporting specs, is to identify the function in the GRDDL transformation URI: http://purl.org/stuff/grddl/foo2rdf.js#grddl
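Those conventions are easy enough to try out. This sketch, with a hypothetical selectEntryPoint function, checks them in order: a fragment identifier on the transformation URI, then a function with the special name grddl, then the single one-argument function. It leaves out the "first function" rule since, as noted, a compiler coming anywhere near could break it. The ordering is my own assumption, not anything the spec says.

```javascript
// Hypothetical convention for picking the entry point of a transform
// script, trying in order:
//   1. a fragment identifier in the transformation URI (foo2rdf.js#grddl)
//   2. a function with the agreed special name "grddl"
//   3. the only function that takes exactly one argument
// `globals` is whatever object holds the script's top-level definitions
// (for instance a Node vm context object after the script has run).
function selectEntryPoint(transformURI, globals) {
  // Only own properties count, so "#constructor" etc. can't sneak in.
  const own = name => Object.prototype.hasOwnProperty.call(globals, name) &&
    typeof globals[name] === "function";

  const fragment = decodeURIComponent(new URL(transformURI).hash.slice(1));
  if (fragment) {
    if (!own(fragment)) throw new Error(`no function named "${fragment}"`);
    return globals[fragment];
  }
  if (own("grddl")) return globals.grddl;

  // Function.length is the number of declared parameters (before any default).
  const unary = Object.keys(globals).filter(k => own(k) && globals[k].length === 1);
  if (unary.length === 1) return globals[unary[0]];
  throw new Error("cannot tell which function in the script to apply");
}
```

Putting the fragment first means the publisher's explicit choice wins; the special name and the signature rule only kick in when the URI says nothing.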

At this point I'd better leave matters to the LazyWeb.

PS. A devious way around the question of what to do in the case of compiled languages might be to redirect to an online implementation, which would respond with the RDF...



Danny Ayers
2007-03-02T12:19:04+01:00
