An essay from Xiaoshu Wang, URI Identity and Web Architecture Revisited, has prompted Ian Davis to discuss Fragmentation. I've only skimmed the essay so far, but I want to get some comments down right away; if past experience is anything to go by, I'll only be more confused later.
Ian pinpoints the nub of the problem, one of timbl's axioms:
The significance of the fragment identifier is a function of the MIME type of the object
This does mess up orthogonality [given the URI vs URIref point below, this may not strictly be true, but the net effect remains the same], but I don't think it's a web-breaking issue, because the flexibility in what constitutes a representation allows a lot of wiggle room.
Take a URI like http://example.org/people#joe. It identifies a resource; let's say that resource is the real-world person Joe. We know how to do stuff with things like this in RDF: it's a URIref/IRI, which allows us to say things about Joe. We also know how to do stuff with things like this in HTML: #joe would likely be a named anchor on the people page.
Getting the HTML
So let's say we have simple content negotiation (conneg) set up on the server, and we do a GET on http://example.org/people#joe asking for HTML; we'd get back something we might locally refer to as people.html.
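The conneg step can be sketched roughly like this. It's a minimal illustration under stated assumptions, not how any particular server does it: the mapping from media types to the local files people.html and people.rdf is hypothetical, Accept header parsing is simplified, and wildcards such as */* aren't handled (anything unmatched just falls back to HTML).

```python
# Hypothetical mapping from media type to the local file that serves it
# for the resource http://example.org/people.
REPRESENTATIONS = {
    "text/html": "people.html",
    "application/rdf+xml": "people.rdf",
}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media type, q-value) pairs (simplified)."""
    pairs = []
    for item in header.split(","):
        pieces = [p.strip() for p in item.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        pairs.append((media_type, q))
    return pairs


def choose_representation(accept: str, default: str = "text/html") -> str:
    """Pick the local file for the highest-q media type we can serve."""
    best_type, best_q = default, 0.0
    for media_type, q in parse_accept(accept):
        if media_type in REPRESENTATIONS and q > best_q:
            best_type, best_q = media_type, q
    return REPRESENTATIONS[best_type]


print(choose_representation("text/html,application/xhtml+xml;q=0.9"))  # people.html
print(choose_representation("application/rdf+xml"))  # people.rdf
```

Choosing by highest q-value and breaking ties in favour of the first listed type is just one reasonable policy; real servers weigh other factors too.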
[[
PS. Ed Davies pointed out that you can't do a GET on http://example.org/people#joe: the fragment isn't part of the URI. OK, it's a fair cop; I was mixing up URIs and URIrefs, and not for the first time. But bear in mind that, e.g.,
  wget http://example.org/people#joe
won't typically raise an error; it'll just return the same as
  wget http://example.org/people
]]
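Ed's point can be shown with Python's standard urllib.parse: the client splits the fragment off before building the request, so the server sees the same request line either way. This is a minimal sketch; request_target is a hypothetical helper that ignores the Host header, userinfo and percent-encoding details.

```python
from urllib.parse import urldefrag, urlsplit


def request_target(uri_reference: str) -> tuple[str, str]:
    """Return (HTTP request line, fragment kept on the client) for a URI reference."""
    uri, fragment = urldefrag(uri_reference)  # "#joe" is split off here
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return f"GET {target} HTTP/1.1", fragment


print(request_target("http://example.org/people#joe"))
# ('GET /people HTTP/1.1', 'joe')
print(request_target("http://example.org/people"))
# ('GET /people HTTP/1.1', '')
```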
What the server gives us is a representation of http://example.org/people, but is it a representation of http://example.org/people#joe? Yes, why not? A photo of Joe on his school trip to Fountains Abbey is still a representation* of Joe, even if he appears alongside all his classmates.
[[ * A representation in the usual human, non-WebArch sense of a portrayal, my point being that in the WebArch sense, "X+cruft" could be considered a legitimate representation of X. ]]
That a browser would go straight to the #joe anchor is pretty irrelevant: it's UI behaviour, at the application layer. The browser is reaching into the URI's characters, which does contradict the notion of opacity. But that doesn't really break anything. A client can do what it pleases; it's only when the consumer's definition starts trying to lever what goes on producer-side that big problems begin.
Practical considerations suggest that the significance of the whole URI in an information representation language designed for human consumption is likely to be different from its significance in a language designed for machine consumption. What matters is consistency in the protocol through which statements in those languages are delivered. HTTP itself isn't dependent on timbl's axiom above, so I don't see a major problem.
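To make timbl's axiom concrete, here's a rough sketch of a user agent deciding what #joe means only once it knows the media type of what came back. The kind labels ("html-anchor", "rdf-node" and so on) are made up for illustration, and only a couple of media types are covered; in reality each media type's registration defines its own fragment semantics.

```python
from urllib.parse import urldefrag


def interpret_fragment(uri_reference: str, media_type: str) -> tuple[str, str]:
    """Decide what a fragment identifier points at, given the retrieved media type.

    Returns a (kind, target) pair; the kind labels are illustrative only.
    """
    base, fragment = urldefrag(uri_reference)
    if not fragment:
        return ("whole-representation", base)
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = media_type.split(";")[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml"):
        # HTML: the fragment names an anchor or element id inside the document.
        return ("html-anchor", fragment)
    if media_type == "application/rdf+xml":
        # RDF: the full URI reference is a name for a node in the graph (e.g. Joe).
        return ("rdf-node", uri_reference)
    # Anything else: this sketch doesn't know the type's fragment rules.
    return ("undefined", fragment)


print(interpret_fragment("http://example.org/people#joe", "text/html"))
print(interpret_fragment("http://example.org/people#joe", "application/rdf+xml"))
```

The same string yields a pointer into a page in one case and a name for a person in the other, which is exactly the non-orthogonality being discussed; HTTP never sees any of it.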
Getting the RDF
Requesting the "application/rdf+xml" type instead, we'd get back something we might locally refer to as people.rdf. What the server gives us is a representation of http://example.org/people, but is it a representation of http://example.org/people#joe?
Yes [[it MAY be]], but...
An extra level becomes apparent with RDF. So far it seems an HTML document can be a complete on-the-wire representation of a resource; it's an information resource. Yet the graphs denoted by RDF documents do not correspond to the resources (things/documents) themselves; they are (at best) statements about the things/documents. This is the kind of issue Patrick Stickler highlighted with URIQA. What we're usually interested in is a representation of a resource, not a representation of a description of a resource.
An RDF document retrievable at http://example.org/people might not mention the resource http://example.org/people. But it's still (by WebArch definition) a representation of that resource.
[[ I've tweaked the above lines following Ed's comment. I did have the URIref there, which would render the statement untrue, but it's not actually relevant to this point. ]]
It may be worth considering two representations in scope here: one is the document (and the graph it denotes); the other is the whole graph, the universe of which this document's graph is but a snippet. It depends how you feel about named graphs...
However, as far as I can see, this is very similar at a conceptual level to the HTML case. Putting conneg aside, an HTML document can be a complete on-the-wire representation of a resource (that document), but only at a single point in time. My homepage today is only a snapshot representation of my homepage the resource, which changes daily. My homepage can only be fully described by considering all its representations: past, present and future. Bringing WebArch back in, that full description includes all possible representations in all media types. This undermines the notion of an information resource in the general case. But I believe there's an inkling of a formalisation that works for this in timbl's Ontology for Relating Generic and Specific Information Resources (incidentally, a while back Reto did a vocabulary, DiscoBits, which I seem to remember is very like this, though I've not had a chance to compare and contrast).
I still reckon that the httpRange-14 resolution, papering over the cracks as it does, may be adequate. The 303 thing might not be the best wallpaper, and it's a hassle in practice, but I think it's probably good enough in principle. As it's already been written down, maybe it is the best option in practice.
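For reference, the httpRange-14 resolution boils down to a rule about what a GET response on an http URI without a fragment lets you conclude. Here's a minimal sketch of that rule as a client might apply it; the function name and the returned labels are hypothetical, and status codes the resolution doesn't address (3xx other than 303, 5xx) are lumped together as "not covered". Hash URIs like people#joe sidestep the question, since the fragment never reaches the server.

```python
def http_range_14(status: int) -> str:
    """What a client may conclude about the resource identified by a
    fragment-less http URI, given the status code of a GET on it."""
    if 200 <= status < 300:
        # 2xx: the URI identifies an information resource.
        return "information resource"
    if status == 303:
        # 303 See Other: could be any resource (Joe himself, say);
        # the Location header points at a document about it.
        return "any resource"
    if 400 <= status < 500:
        # 4xx: the nature of the resource is unknown.
        return "unknown"
    return "not covered"


for code in (200, 303, 404, 500):
    print(code, http_range_14(code))
```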
So although orthogonality gets scrunched along the specific axis of #, I don't think that undermines its utility. It doesn't matter whether a user agent decides #joe is the location of the (x)pointer in an HTML doc, or the whole doc, or an RDF doc, or even whether it corresponds to all the statements about #joe in the universal graph.
The question of how to bridge usefully between the different MIME-specific notions of what a frag id means is another matter. But to me it seems essentially an application-level issue.
As it happens, Ian's been mulling over various ways of dealing with RDF that might help here, pragmatically addressing the MGET kind of issues with core HTTP by putting the resource of interest at the centre (SPARQL DESCRIBE is used a lot around the Talis Platform) and blasting bnodes out of existence. Hopefully he'll be inspired to do a write-up :-)
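A DESCRIBE request of that resource-centred kind might be built like this using the SPARQL Protocol's GET binding, where the query travels in a query parameter. The endpoint URL is a placeholder, not a real Talis Platform address. Note that, unlike with a plain GET, #joe does reach the server here: it's percent-encoded as %23 inside the query string.

```python
from urllib.parse import urlencode


def describe_url(endpoint: str, resource: str) -> str:
    """Build a SPARQL Protocol GET URL asking `endpoint` to DESCRIBE `resource`."""
    query = f"DESCRIBE <{resource}>"
    # urlencode percent-encodes '<', '>', ':', '/' and '#', and turns spaces into '+'.
    return endpoint + "?" + urlencode({"query": query})


print(describe_url("http://example.org/sparql", "http://example.org/people#joe"))
# http://example.org/sparql?query=DESCRIBE+%3Chttp%3A%2F%2Fexample.org%2Fpeople%23joe%3E
```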
PS. In the comments, Simon Reinhardt provides something to think about:
I think RDF graphs actually *can* be direct representations, i.e. pure data, and not only representations of a description, i.e. metadata. Consider encoding your blog post in RSS / AtomOWL / SIOC, putting all the text in there. Then you have not only described the post, like who wrote it when and which title it has, but by providing the text you have a full representation of the data (well, at a given time), the actual resource.
Hmmm. That does appear to make sense, and now I'm really not sure of the impact of changes-over-time and the open world assumption...