On the proposed data-linking 2NN status code
Eric Prud’hommeaux, apparently on behalf of Tim Berners-Lee (see here), is spearheading a drive for a puzzling and peculiar extension to the HTTP protocol (“2NN”). This is the latest move in the agonized httpRange-14 thread started by Tim twelve or so years ago.
This is all about how we are to read, write, and deploy linked data using RDF. (The issue pretends to go beyond RDF but I don’t see any consequential sense in which it does.) Remember that RDF is a language of URIs (I sometimes call it URI-ese), and there are three main kinds of URIs in RDF: those with # in them, those without # but deployed using GET/200, and those without # but deployed using GET/303. (Forget about URNs and so on for now.) It’s agreed that the use of hash and 303 URIs is unconstrained in RDF. The question is whether GET/200 behavior imposes (or should impose) constraints on interpretation of 200 URIs.
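The three-way split can be sketched in code. Here is a minimal sketch in Python; the URIs are invented, and note that whether a hashless URI is a "200" or a "303" URI is a fact about server behavior, not about URI syntax, so the status code has to be supplied as an assumption:

```python
# Classify a URI as one of the three kinds discussed above.
# Hash URIs need no GET to classify: the fragment is stripped
# client-side before any request is made, so GET/200 behavior
# can never constrain them directly.
from urllib.parse import urldefrag

def uri_kind(uri, get_status=None):
    """Return 'hash', '200', '303', or 'unknown'.

    get_status: the status code a GET on the URI is assumed to
    return (None if no GET has been done).
    """
    bare, fragment = urldefrag(uri)
    if fragment:
        return "hash"
    if get_status == 303:
        return "303"
    if get_status == 200:
        return "200"
    return "unknown"

print(uri_kind("http://example.org/doc#me"))         # hash
print(uri_kind("http://example.org/thing", 303))     # 303
print(uri_kind("http://example.org/data.rdf", 200))  # 200
```

The "unknown" case is the interesting one: until you do a GET, a hashless URI's kind is simply not determined by anything written in the RDF itself.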
If there is debate over whether GET/200 imposes a constraint, and debate over what the constraint would be, and debate over whether the constraint is at all useful – and all three debates are real – the prudent RDF practitioner will simply stay away from GET/200 URIs altogether. And that is what many of them do.
But this is not a very happy situation, because the community seems to hate both hash URIs and 303 URIs. Hash URIs are ugly, and 303 URIs are inefficient (extra HTTP round trip to get the “data”) and hard to deploy. [so they say.] What’s the point of reserving 200 URIs? Why not just forget about GET/200 as a constraint, and use 200 URIs the same way hash and 303 URIs are used?
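The round-trip complaint is concrete enough to count. A toy sketch, with the "server" as a dictionary rather than a real HTTP stack (URIs and payloads invented for illustration):

```python
# Why 303 deployment costs an extra round trip, compared to a hash URI.
RESPONSES = {
    # 303 URI: the first GET only redirects to where the "data" lives.
    "http://example.org/alice":
        (303, {"Location": "http://example.org/alice.rdf"}, None),
    "http://example.org/alice.rdf":
        (200, {}, "<rdf data about alice>"),
    # Hash URI: the fragment never reaches the server; one GET suffices.
    "http://example.org/people":
        (200, {}, "<rdf data mentioning #alice>"),
}

def resolve(uri):
    """Follow the hash/303 rules; return (data, number_of_requests)."""
    request_uri = uri.split("#")[0]   # fragment is stripped client-side
    requests = 0
    while True:
        status, headers, body = RESPONSES[request_uri]
        requests += 1
        if status == 303:
            request_uri = headers["Location"]   # the extra round trip
            continue
        return body, requests

print(resolve("http://example.org/alice")[1])         # 2 requests
print(resolve("http://example.org/people#alice")[1])  # 1 request
```

Two requests for the 303 URI, one for the hash URI: that factor of two, on every dereference, is the inefficiency the complaints are about.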
The W3C Technical Architecture Group, of which Tim is a member, spent many expensive hours of discussion time on this issue over many years. I personally made many attempts to formulate and advocate for what I thought was a reasonable position, and failed. (This is not to say it’s impossible to do, just that I didn’t.) In the end a subcommittee (after discussion with Tim) settled on a version of a proposal by Sandro Hawke, which Jeni Tennison rendered as URIs in Data. The new memo met with general approval (or perhaps resignation) within the TAG and received very little comment outside the TAG. The memo makes peace between the two sides in the debate by saying it’s all a matter of interpretation, and that you can shift between vocabulary interpretations without a change in overall meaning. It’s only the overall meaning that matters. You can use 200 URIs in RDF pretty much the way you use hash and 303 URIs, and here are some suggestions on how to express the way in which your use of them relates to GET/200. No awkward deployment, no extra round trip, no conflict with any prior GET/200 constraint. Problem solved.

I thought the matter was done and the RDF community was then going to do whatever it wanted to, with or without the solution offered by the memo. Then Tim writes that no, the memo’s proposal to use GET/200 to avoid the extra round trip is no good after all. To avoid the extra round trip we need a new HTTP status code that blends a GET/303 with the subsequent GET/200 for the data, but is not 200. It’s still hard to deploy, so not as good as the GET/200 solution from Jeni’s memo in that sense, but at least it fixes the extra round trip problem.
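What "blending" the 303 with the subsequent 200 would buy can be sketched side by side. Everything here is hypothetical: 2NN has no assigned number (209 below is a placeholder) and no standardized header layout; the point is only the request count:

```python
# The exchange a hypothetical 2NN code is meant to replace, next to
# the one it is meant to enable. Status 209 and the header names are
# invented placeholders; 2NN was never standardized.

def fetch_via_303():
    """Two exchanges: a redirect, then a GET for the data."""
    exchanges = [
        (303, {"Location": "http://example.org/alice.rdf"}, None),
        (200, {}, "<rdf data>"),
    ]
    return exchanges[-1][2], len(exchanges)

def fetch_via_2nn():
    """One exchange: the data and its actual location arrive together,
    and the status code is deliberately not 200."""
    exchanges = [
        (209, {"Content-Location": "http://example.org/alice.rdf"},
         "<rdf data>"),
    ]
    return exchanges[-1][2], len(exchanges)

print(fetch_via_303())  # ('<rdf data>', 2)
print(fetch_via_2nn())  # ('<rdf data>', 1)
```

Same data either way; the only thing 2NN buys over 303 is the request count, and the only thing it buys over plain 200 is that the status code is not 200.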
In other words, Tim has decided that the memo’s proposal does not work. It would be interesting to know why. My guess is that it is too subtle.
There is no rational way to end this argument. It hinges on the question of finding meaning in GET/200 exchanges. We know that in the wild, GET/200 means nothing at all, at least on the web, other than that some kind of request for some kind of information was satisfied in some way by the provision of some information (as opposed to, say, GET/404, which does not mean that). There is no evidence that URIs “mean” anything on their own, as a general convention, to the community of people deploying and using web servers over HTTP. GET requests for particular URIs certainly do have meaning in many cases to those who make and satisfy them, but the meaning is bespoke to each URI. To find general constraints, you have to look elsewhere.
One place to look would be practice “in the wild” in the RDF community. I have not done a careful study – that would take more time than I have – but my impression is that 200 URIs are used in two ways: as hyperlinks, with their “meaning” or constraint specific to the particular link type (e.g. rdfs:seeAlso, owl:imports); and, by certain parties, the same as hash and 303 URIs, i.e. unconstrained, with ‘meaning’ suggested by RDF ‘data’ retrieved from somewhere. So, with regard to Eric’s proposal, we strike out again, since we find no general rule from this source.
A third place to look would be specifications that have some standing in the community. One is the HTTP specification: it says GET/200 means that “an entity corresponding to the requested resource is sent in the response”; an entity is “the information transferred as the payload of a request”… The requested resource is the one identified by the URI, a resource is anything at all, what a URI identifies is up for grabs. This is a rathole with a dead end. I’ve read this stuff dozens of times and I can only conclude that it is vacuous. There is no use of the HTTP protocol that can possibly fail to conform with all this resource-information-entity-identifies business.
One could look at well-known advisory documents such as Architecture of the World Wide Web and REST. These say that if the server infrastructure is built according to certain principles, then the designers of its various parts will have ‘identified’ one ‘resource’ for each URI. That is, for each URI there will be something that the URI ‘identifies’, whose properties are known to someone involved in the design; and furthermore, there will be some (unexplained) relationship between the state of that thing and its ‘representations’ (what you get when you GET). But: how would you know whether a site is designed this way? And even if it were, how would you learn any of the properties of the thing the designers ‘identified’ by any URI? Who would you ask, and how, and in what terms would they reply? So this is not such a useful constraint, in my opinion. It may be that the site is designed beautifully, with nice intuitive URI ‘identification’, but there is no way a client can, in general, learn anything about what is ‘identified’, much less actually know what is ‘identified’.
In any case this doesn’t match how GET/200 URIs are used in RDF. Usually you do some GET requests and look at what comes back, and then do some reverse engineering or apply intuition to guess the properties of whatever responses might come back in the future. You then impute those properties to what the URI refers to in some bit of RDF. This has little to do with what sort of software architecture is employed by those providing the web site – it is about the behavior of the site.
(Tabulator does its own brand of reverse engineering: it assumes that the identified ‘resource’ is the information you get from GET/200 – or at least something with the same properties. This is a useful heuristic and works for static documents, but is unsound in general.)
OK. So some people A believe that some other people B depend on a constraint around GET/200, i.e. that if a 200 URI is used, that will have certain implications for B. But there is no general agreement on what that constraint is, so it can’t be exploited by someone writing a 200 URI, e.g. A. The prudent RDF programmer will therefore program defensively, and just avoid generating RDF containing 200 URIs altogether. Similarly, someone reading RDF won’t read anything into a 200 URI. If these careful people hate the hash syntax, and can’t tolerate the extra 303 round trip, then I guess 2NN is for them. That’s a lot of ‘ifs’. It all seems a bit silly to me. Good luck getting the IETF reviewers to comprehend all this.
I think that if I ever have to write code that makes use of content deployed using GET/2NN, it will probably treat GET/200 and GET/2NN the same way, and not look too hard at the status code. After all nobody has told me what the GET/200 constraint is, and there may be useful data in the response. That way I’ll also be able to use content deployed using GET/200 by people who don’t think there is any GET/200 constraint. … if everyone thought like this, then what would be the point of the 2NN response – why not just use 200 ? But I guess not everyone thinks like this.
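That client attitude is easy to make precise. A sketch, under the assumptions above (209 again stands in for the unassigned 2NN code):

```python
# A client that does not look hard at the status code: the payload of
# any 2xx response is accepted as potentially useful data, and 2NN
# handling collapses into 200 handling.

def usable_data(status, body):
    """Return the payload if the exchange succeeded, else None,
    reading no further meaning into the exact 2xx code."""
    if 200 <= status < 300 and body is not None:
        return body
    return None

print(usable_data(200, "<some rdf>"))  # accepted
print(usable_data(209, "<some rdf>"))  # the "2NN" case: same result
print(usable_data(404, None))          # no data here
```

Under this policy the 200 and 2NN branches are literally the same branch, which is the point: a client with no known GET/200 constraint to enforce has nothing to do with the distinction.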
Appendix: How do you refer, in RDF, to documents, if not using a 200 URI? Well, you can’t reliably use a 200 URI that yields the document, because nobody will be sure which document you mean. Maybe you intend the document you got with some GET of the URI, but a different document might be served on the next GET request; maybe you meant to refer to a “changing document”, maybe you didn’t, there is no way for your interlocutor to tell. So use a hash or 303 URI with metadata – that is, a citation – to refer to the document. As part of the citation you can provide a URI saying, in the usual way, that the document was retrieved using that URI on such and such a date. If there isn’t already an RDF vocabulary (BIBO??) that lets you express a citation to a level of detail that you’re comfortable with, then one can be invented. When you give the retrieved-from URI, provide it as a string literal, not as an RDF URI reference, since what the URI reference refers to is unclear – as I say earlier in this paragraph. If you really mean “whatever happens to be the latest document retrieved using this URL” then express what you mean, again using a suitable vocabulary. Don’t leave it to chance.
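The citation pattern might look like this in practice. A sketch emitting N-Triples; the vocabulary terms (ex:retrievedFrom, ex:retrievalDate) are invented stand-ins for whatever citation vocabulary you settle on, and the point to notice is that the retrieved-from URI appears as a quoted string literal, not between angle brackets as a URI reference:

```python
# Cite a document via a hash URI plus retrieval metadata, with the
# retrieved-from URI given as a plain string literal.
def citation_triples(doc_uri, retrieved_from, date):
    """doc_uri: a hash or 303 URI naming the document being cited.
    retrieved_from: the URL the document was fetched with (as a string).
    date: the retrieval date."""
    ex = "http://example.org/vocab#"   # hypothetical vocabulary
    return [
        f'<{doc_uri}> <{ex}retrievedFrom> "{retrieved_from}" .',
        f'<{doc_uri}> <{ex}retrievalDate> "{date}" .',
    ]

for triple in citation_triples("http://example.org/cites#doc42",
                               "http://example.org/some/page",
                               "2013-07-03"):
    print(triple)
```

Because the retrieval URL is a literal, nothing in the RDF commits you to any claim about what that URL ‘identifies’; the citation only records the concrete fact of which URL was dereferenced and when.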