JSON: missing a link?

On Friday and Saturday I spent a good few hours fiddling with conversion from a dialect of XML to JSON. Immediately afterwards I felt a bit grumpy about JSON: it seemed like I had to jump through a lot of hoops to do a relatively trivial translation. After a day off, my grumpiness has mellowed, or at least balanced to the point of thinking the problems of the conversion stem from both the source and target syntaxes. But the exercise did highlight what seems to be a serious flaw in JSON.

While JSON does have ready-made production/consumption tools in the form of browsers, there are a couple of things that appear to undermine its potential. First off, there's cross-site scripting (I wish my bandwidth was a bit better; I'd like to see Crockford's video on this). This isn't really the fault of JSON per se; it's more an artifact of seeing the HTML browser as the One True User Agent, and inheriting all its issues. As a counter-example, the work around CouchDB is certainly interesting, especially with Sam Ruby lifting out the JSON-oriented interface and seeing how it might work out for a different back end. But I reckon there's a more fundamental flaw.

Thing is, it seems like in the drive for an easy-to-use data format for the web, JSON throws the baby out with the bathwater. A key idea in WebArch is that of self-describing messages. To be realistic about this, the messages are only partially self-descriptive. However (and I reckon this is pretty important) as part of the web they can indicate further sources of information on how to interpret the message. Some of these are registry-based, notably media types but there's a limit to their utility in that a single media type may be used for many different dialects (e.g. "application/xml"). But as well as rigid registries, there's also the open-ended opportunity to use linked data.

Now here's the thing: every modern XML document on the Web is associated with at least two URIs. There's the URI of the resource the document represents (its URL) and the URI of the namespace of the document's root element. This latter URI effectively associates the document with a means of interpretation of the local markup in the global context (there are other mechanisms like DTDs but the namespace idea still seems the big one). Namespaced XML on the Web is a form of linked data, out of the box. Without special-case media types or namespace disambiguation, seriously unreliable sniffing heuristics are needed. (Microformat data without a profile URI suffers from exactly the same problem).
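
A rough sketch of the point in Python, using only the standard library. The two documents are made up for illustration, and `root_namespace` is a hypothetical helper name, not part of any library. It reads the namespace URI straight off the root element of a parsed XML document; the parsed JSON object has no comparable place to look.

```python
import json
import xml.etree.ElementTree as ET

# Illustrative documents, made up for this sketch.
ATOM_DOC = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
</feed>"""

JSON_DOC = '{"title": "Example feed"}'


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree writes qualified names as "{namespace-uri}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


# The XML document carries a URI that says how to interpret its markup.
print(root_namespace(ATOM_DOC))  # http://www.w3.org/2005/Atom

# The JSON document parses to a plain dict; nothing in it points anywhere.
print(json.loads(JSON_DOC))      # {'title': 'Example feed'}
```

Whether anything useful can be found by dereferencing that namespace URI is a separate question; the sketch only shows that the hook exists in one syntax and not in the other.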

There's no way of getting an arbitrary JSON document from the Web and knowing what it says - or even finding out. Sure, the browser can provide a view, but in that case the semantics of the document are taken from its presentation, which is a failure to separate concerns. You're at the mercy of the specific viewer implementation. (Would it be too cynical to suggest that part of the motivation behind HTML5 is a yearning for global semantics, and that standardising browsers is a roundabout way of providing them?). This in itself might not be that troublesome, other conventions/format registries (like microformats.org) could act as a substitute for the missing link. But there is something fundamentally unwebby about all this.

If you need to have information in advance about the data for it to be useful, then no matter how much lip service is paid to REST, your system is just as tightly coupled as if you were relying on named method calls. If you can't in principle switch, say, the social network input of your mashup from MySpace to Facebook without changing code to allow interpretation of the new source, then you're not really taking advantage of the Web, or for that matter gaining very much over using WS-* (with GRDDL in place, the switch could already be possible very much in practice).

Ok, this might all be a bit astronautical, but there's definitely one pretty immediate aspect where JSON falls down in comparison with XML. How do you include 'foreign' data dialects in a coherent fashion? It gets messy at the first hurdle - check this discussion on expressing Atom in JSON.
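
The first hurdle can be sketched as follows, again in Python with the standard library. The entry is made up for illustration (Atom plus a Dublin Core element), and the "flattening" step is an assumption about what a naive syntax-level converter might do, not a description of any particular tool. In the XML, the two `title` elements stay distinct because each is qualified by its namespace URI; once only the local names are kept as JSON-style keys, one silently overwrites the other.

```python
import xml.etree.ElementTree as ET

# Illustrative entry mixing two vocabularies, made up for this sketch.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>JSON: missing a link?</title>
  <dc:title>A foreign title</dc:title>
</entry>"""

root = ET.fromstring(ENTRY)

# Qualified names keep the two vocabularies apart.
qualified = [child.tag for child in root]
print(qualified)
# ['{http://www.w3.org/2005/Atom}title', '{http://purl.org/dc/elements/1.1/}title']

# Naive flattening (an assumption for illustration): drop the namespace
# and use the local name as the key, as a simple converter might.
flattened = {}
for child in root:
    local_name = child.tag.split("}", 1)[-1]
    flattened[local_name] = child.text

print(flattened)  # {'title': 'A foreign title'}
```

Keeping the prefix in the key (`"dc:title"`) avoids the collision but only moves the problem, since the prefix means nothing without the URI it was bound to.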

Personally I'm beginning to suspect that syntax-oriented XML/JSON translation may be a technical non-starter in the general case, and that the only way around it is to map through a common web-oriented data model (in which any registry-based assumptions are resolved and grounded to URIs). Given that this would probably be anathema to folks motivated to use JSON by its simplicity, it's probably a non-starter socially as well. (So while I still reckon a convention on RDF/JSON would be useful, I suspect it'll only be useful to folks already using RDF).

See also: John Musser on Douglas Crockford on the Mashup Problem

PS. Re-reading that, it does sound terribly negative. I should add that I reckon JSON is a great idea, only the points above suggest its out-of-the-box utility is limited as a general-purpose data format for the web. YMMV.

Danny Ayers
2007-10-01T11:39:27+02:00
