Hash soup

An essay by Xiaoshu Wang, URI Identity and Web Architecture Revisited, has prompted Ian Davis to discuss Fragmentation. I've only skimmed the essay so far, but want to get some comments down right away - if past experience is anything to go by I'll be more confused later.

Ian points to the nub of the problem, which is one of timbl's axioms:

The significance of the fragment identifier is a function of the MIME type of the object

This does mess up orthogonality [given the URI vs URIref point below this may not strictly be true - but the net effect remains the same], but I don't think it's a web-breaking issue because the flexibility in what constitutes a representation allows a lot of wiggle room.

Take a URI like http://example.org/people#joe - it identifies a resource, let's say that resource is the real-world person Joe. We know how to do stuff with things like this in RDF - it's a URIref/IRI, which allows us to say things about Joe. We also know how to do stuff with things like this in HTML - joe would likely be a named anchor on the people page.

Getting the HTML

So let's say we have simple conneg set up on the server, and do a GET on http://example.org/people#joe - we'd get back something we might locally refer to as people.html.
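
As a rough illustration of what "simple conneg" could mean here, the sketch below picks a local file from the Accept header. The mapping and the function name negotiate are hypothetical, people.rdf is the name used further down for the RDF case, and q-values are deliberately ignored, so this is nowhere near a full implementation of HTTP content negotiation.

```python
# Hypothetical mapping: two representations of the one resource
# http://example.org/people, keyed by media type.
REPRESENTATIONS = {
    "text/html": "people.html",
    "application/rdf+xml": "people.rdf",
}


def negotiate(accept_header, default="text/html"):
    """Return (media_type, local_file) for the first listed type we can serve.

    Simplification (an assumption of this sketch): q-values and wildcards
    are not interpreted; anything unrecognised falls back to the default.
    """
    for item in accept_header.split(","):
        # Drop any parameters such as ";q=0.9" and surrounding whitespace.
        media_type = item.split(";")[0].strip()
        if media_type in REPRESENTATIONS:
            return media_type, REPRESENTATIONS[media_type]
    return default, REPRESENTATIONS[default]


print(negotiate("application/rdf+xml"))
print(negotiate("text/html,application/xhtml+xml;q=0.9"))
```

Either way the request is for the same URI; only the representation that comes back differs.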

[[

PS. Ed Davies pointed out that you can't do a GET on http://example.org/people#joe - the fragment isn't part of the URI. Ok, it's a fair cop, I was mixing up URIs and URIrefs, and not for the first time. But bear in mind that e.g.

  wget http://example.org/people#joe

won't typically raise an error, it'll just return the same as

   wget http://example.org/people

]]
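
Ed's point can be seen in a few lines of Python: the standard library's urldefrag separates the part a client would put on the wire from the fragment it keeps to itself. The helper name split_uriref is made up for this sketch.

```python
from urllib.parse import urldefrag


def split_uriref(uriref):
    """Split a URI reference into (URI to request, fragment kept client-side)."""
    uri, fragment = urldefrag(uriref)
    return uri, fragment


# The server is only ever asked for the first part; the fragment
# stays with the client, which interprets it after retrieval.
uri, fragment = split_uriref("http://example.org/people#joe")
print(uri)
print(fragment)
```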

What the server gives us is a representation of http://example.org/people but is it a representation of http://example.org/people#joe ? Yes, why not - a photo of Joe on his school trip to Fountains Abbey is still a representation* of Joe, even if he appears alongside all his classmates.

[[ * a representation in the usual human, non-WebArch sense of a portrayal, my point being that in the WebArch sense, "X+cruft" could be considered a legitimate representation of X]]

That a browser would go straight to the #joe anchor is pretty irrelevant - it's UI behaviour, on the application layer. It is hopping down to examine the URI chars and hence contradicting the notion of opacity. But that doesn't really break anything - a client can do what it pleases, it's only when the consumer definition starts trying to lever what goes on producer-side that big problems begin.
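
To make the "it's only UI behaviour" point concrete, here is a minimal sketch of the client-side step, assuming the HTML reading of the fragment: look for an element whose id (or an a element whose name) matches it. The names AnchorFinder and locate_fragment are invented for illustration, and a real browser does considerably more than this.

```python
from html.parser import HTMLParser


class AnchorFinder(HTMLParser):
    """Record where the first element matching a fragment identifier starts."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.position = None  # (line, column) once a match is found

    def handle_starttag(self, tag, attrs):
        if self.position is not None:
            return  # keep only the first match
        attributes = dict(attrs)
        is_id_match = attributes.get("id") == self.fragment
        is_named_anchor = tag == "a" and attributes.get("name") == self.fragment
        if is_id_match or is_named_anchor:
            self.position = self.getpos()


def locate_fragment(html_text, fragment):
    """Return the parser position of the matching element, or None."""
    finder = AnchorFinder(fragment)
    finder.feed(html_text)
    finder.close()
    return finder.position


page = '<h1>People</h1>\n<a name="joe">Joe</a>\n<p id="ann">Ann</p>\n'
print(locate_fragment(page, "joe"))
print(locate_fragment(page, "bob"))
```

Nothing in this step involves the server: the whole page has already been retrieved, and what to do with joe is the client's business.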

Practical considerations suggest the significance of the whole URI in an information representation language designed for human consumption is likely to differ from its significance in a language designed for machine consumption. What matters is consistency in the protocol through which statements in those languages are delivered. HTTP itself isn't dependent on timbl's axiom above, so I don't see a major problem.

Getting the RDF

With the "application/rdf+xml" type we'd get back something we might locally refer to as people.rdf. What the server gives us is a representation of http://example.org/people but is it a representation of http://example.org/people#joe ? Yes [[it MAY be]], but...

An extra level becomes apparent with RDF. So far it seems a HTML document can be a complete on-the-wire representation of a resource, it's an information resource. Yet graphs denoted by RDF documents do not correspond to the resource things/documents themselves, they are (at best) statements about the things/documents. This is the kind of issue Patrick Stickler highlighted with URIQA. What we're usually interested in is a representation of a resource, not a representation of a description of a resource.

An RDF document retrievable at http://example.org/people might not mention the resource http://example.org/people. But it's still (by WebArch definition) a representation of that resource.

[[ I've tweaked the above lines following Ed's comment - I did have the URIref there, which would render the statement untrue, but it's not actually relevant to this point ]]

It may be worth considering two representations in scope here: one is the document (and the graph it denotes), the other is the whole graph, the universe of which this document's graph is but a snippet. Depends how you feel about named graphs...

However, as far as I can see, this is conceptually very similar to the HTML case. Putting conneg aside, a HTML document can be a complete on-the-wire representation of a resource - that document - but only at a single point in time. My homepage today is only a snapshot representation of my homepage the resource, which changes daily. My homepage can only be fully described by considering all the representations: past, present and future. Bringing WebArch back in, that full description includes all possible representations in all media types. This undermines the notion of an information resource in the general case. But I believe there's an inkling of a formalisation that works for this in timbl's Ontology for Relating Generic and Specific Information Resources (incidentally, a while back Reto did a vocab which I seem to remember is very like this - DiscoBits - though I've not had a chance to compare & contrast).

I still reckon the papering over the cracks that is the httpRange-14 resolution may be adequate - the 303 thing might not be the best wallpaper, it's a hassle in practice, but I think it's probably good enough in principle. As it's already been written down, maybe it is the best option in practice.

So although orthogonality is scrunched down the specific axis of #, I don't think it undermines its utility. It doesn't matter if a user agent decides #joe is the location of the (x)pointer in a HTML doc or is the whole doc or is an RDF doc or even if it corresponds to all the statements about #joe in the universal graph.

The question of how to bridge usefully between the different mime-specific notions of what a frag id means is another matter. But to me it seems that's essentially an application level issue.

As it happens Ian's been mulling over various ways of dealing with RDF that might help here, pragmatically addressing the MGET kind of issues with core HTTP by putting the resource of interest in the centre (SPARQL DESCRIBE is used a lot around the Talis Platform) and blasting bnodes out of existence - hopefully he'll be inspired to do a write-up :-)

PS. In comments Simon Reinhardt provides something to think about:

I think RDF graphs actually *can* be direct representations, i.e. pure data, and not only representations of a description, i.e. metadata. Consider encoding your blog post in RSS / AtomOWL / SIOC, putting all the text in there. Then you have not only described the post, like who wrote it when and which title it has, but by providing the text you have a full representation of the data (well, at a given time), the actual resource.

Hmmm. That does appear to make sense, and now I'm really not sure of the impact of changes-over-time and the open world assumption...


Danny Ayers

2007-11-14T11:49:55+01:00
