Serializational State Transfer?
There is this thing called REST, which its authors describe as a software “architecture style” that is a “model of how Web applications work“. The early description of the Web, where ‘universal document identifiers’ name documents, became inaccurate pretty quickly after the Web started, since HTTP didn’t prevent anyone from making web sites where the same UDI could yield different documents at different times. This was extremely useful but led to considerable confusion over the founding theory: what a ‘document’ was and which one was ‘identified’ if you could get different ones at different times, when authenticated differently, etc. So to replace the foundational theory Fielding looked for a better description of how Web applications work, and in particular how they use URIs. It sounds as if he surveyed a number of Web applications, and found something common among them, and he called this common whatever REST.
Under REST a UDI (now called a URI) is no longer associated with a document, but rather with a ‘resource’ or ‘network resource’ or ‘object’. This isn’t really helpful in itself since it just shifts the ontological confusion from one word to another. The only thing we observe in HTTP is message payloads, so the only question to be answered in analyzing communication is, how do message payloads (particularly for GET) relate to these resource-things, and give us information about them (properties)? If there is no relationship, then we know nothing at all about resource-things.
The party line is that the payload is a ‘representation’ of the resource-thing, which is again not at all helpful since it only changes something we didn’t understand and don’t have a name for, to something we don’t understand and do have a name for. The word is evocative, to be sure, but extremely ambiguous. So what, really, was the REST insight, based on empirical study of real Web applications?
I’ve mulled this over for a long time and here is what I’ve come up with. The ‘resources’ or ‘objects’ of REST are data structures or data objects that are stewarded and controlled by computational devices, and the message payloads are serializations of the current state of those data objects. That is, if you read about REST and substitute ‘serialization’ for ‘representation’, it will make a lot more sense.
So the empirical claim, as I interpret it, is that a common pattern among Web applications in the mid 1990s is that payloads (web page instantiations) are associated with single data objects in the memory space (disk or otherwise) of the application. The object could be a database record or table, or a map or image or etc. The data object (that is, its contents, or state) can change over time, in which case you get different serializations at different times, but it’s always the same object being serialized, for each URI.
I don’t know whether this claim is true; I didn’t do a survey of the kind Fielding did. It would be nice to see evidence. My feeling, based on looking at lots of web pages, is that payloads/pages are probably assembled from bits and pieces of many data objects in the application, in addition to simply providing information that’s not stored in a data object, but is just the output of some process such as a calculator, chat, random number generator, webcam, etc. I would be surprised to learn that REST is true of even 25% of web sites or URIs out there.
In addition to the empirical claim there is advice, based on the experience of the authors, that if you create a web site using the REST style, it will be a better site than if you hadn’t. This prescriptive angle is amplified in the W3C Web Architecture recommendation. This is a different claim, and may well be true – certainly if you follow REST style you’ll get URIs that work nicely and intuitively as bookmarks and hyperlink targets, and this is both a social good and should be in the self-interest of site creators.
The thing that I and other people have found confusing is the use of REST language in the HTTP specification. REST is a statement about how the organization of an application relates URIs, message payloads, and its internal data objects, and HTTP does not govern that kind of thing, it only governs the payloads themselves. A web application can be very non-REST and still conform to HTTP. So the spec’s REST language has no normative force. Instead REST has been used as a guide to the design of the HTTP specs; applications that are REST are, by design, going to be a more natural fit to HTTP than other applications (such as streaming video).
But the statement in HTTP that the payload of a 200 response to a GET is a representation of the identified resource, is not true in general, it is just wishful thinking, a vain hope that every web application will use REST style. But it’s there in the spec, so it sounds like it has normative force. The only way I can keep my head from exploding when reading the spec is to tell myself to relax and read something else instead.
Then there’s the awful situation with URIs in RDF. In RDF you want to coordinate agreements over what the properties of the things ‘identified’ by URIs are, so that you can use those URIs in communication. To me it’s clear that if the URI is associated with a REST data object, the REST style advice would be that the URI ‘identifies’ the data object whose serializations we see when we do GETs, so the properties of the thing identified (in RDF) by the URI are the properties of the data object inside the application (field values, size in bits, data type, last modified date/time, current serialization(s), that kind of thing). But this doesn’t help coordinate understanding of the data object by others, since only people with knowledge about how the application operates internally can know any properties these data objects; there’s no standard channel through which this knowledge can be communicated, and nobody would be interested in saying anything on it, if there were. Of course an outside observer doesn’t even know whether it’s a REST URI at all. And if the URI is not associated with a REST data object, anyone can say anything about what it ‘identifies’ – there may sometimes be agreement, or understanding, by luck, but there is no standard or widely used best practice to help out (other than the ‘hash URI’ and ‘303’ cases, where there is coordination, local to the RDF world). Most of the time the ‘I’ in ‘URI’ is mere wishful thinking, not a matter of fact as a naive reading of the acronym would suggest.
To me the REST theory feels like a post hoc attempt to justify calling those strings that start ‘http://…’ ‘identifiers’. In order to be identifiers they have to identify something. There would be no reason to say what the identified thing was, or even that there was such a thing, if all we cared about was a communication and a smoothly working Web; the ‘identified thing’ would have no reason to exist at all. But if you are committed to seeing ‘http://…’ things as identifiers, you are forced to come up with a story about what they identify.
(RDF is a different story, since it works in the other direction: we want to express a statement that X has property Y, so what name should we use for X? Hey, maybe we can use a URI.)
I could go on and on… I haven’t even touched on the context dependence of ‘identification’: it only makes sense within some particular language that lets you express claims or commands that have agreed consequences…