Archive for October, 2014

Specifying meaning

Specifications

A specification articulates a property or constraint that an artifact may or may not satisfy. That is, you have some class of artifacts, like screws, to which a given specification might apply, and you can ask whether or not any particular artifact meets the specification. A specification could cover physical objects (like screws), substances (like motor oil), pieces of software, or even a process carried out by people (as in ISO 9000).

A specification may be used as a guide in choosing or constructing an artifact, in an offer to sell something, and so on.

The key idea is spec as class or category: a given artifact meets the spec or doesn’t. There may be effective tests for whether something meets a specification. If a spec says that a steel rod has to last for ten years, it is neither practical nor sensible to wait ten years to see if a given rod meets that spec. But perhaps the test has been performed on identical rods, or maybe there are proxy tests that we can perform, such as accelerated aging, that will give a good indication as to whether the spec will be met by the instance in hand.

In engineering the purpose of a spec is to allow us to combine parts to yield systems that have predictable and useful properties. To put it somewhat tautologically, if X meets spec A, and Y has the property that combining it with something meeting spec A yields a complex having property P, then we can predict that combining Y with X will yield something with property P.
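To make the composition argument concrete, here is a minimal sketch, assuming a spec can be modeled as a predicate over artifacts. The names `Screw`, `Bracket`, `meets_m6_coarse`, and `predict` are hypothetical, and the bracket's "holds 50 kg" guarantee is an invented stand-in for property P. The point is that `predict` never builds or tests the assembly; it only checks X against the spec that Y relies on.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Screw:
    diameter_mm: float
    pitch_mm: float
    length_mm: float

# A specification modeled as a predicate: an artifact meets it or it doesn't.
Spec = Callable[[Screw], bool]

def meets_m6_coarse(s: Screw) -> bool:
    """Hypothetical spec 'A': M6 coarse thread (6 mm diameter, 1.0 mm pitch)."""
    return s.diameter_mm == 6.0 and s.pitch_mm == 1.0

@dataclass(frozen=True)
class Bracket:
    """Component Y. Its (hypothetical) datasheet promises property P for any
    screw that meets `accepts`."""
    accepts: Spec
    property_p: str

def predict(y: Bracket, x: Screw) -> Optional[str]:
    """Predict the property of the combination from the specs alone:
    if X meets the spec Y relies on, the combination has P.
    Returns None when the specs give us no prediction."""
    return y.property_p if y.accepts(x) else None

if __name__ == "__main__":
    bracket = Bracket(accepts=meets_m6_coarse, property_p="holds 50 kg")
    print(predict(bracket, Screw(6.0, 1.0, 25.0)))   # holds 50 kg
    print(predict(bracket, Screw(6.0, 0.75, 25.0)))  # None
```

Note that `None` means "no prediction", not "the combination lacks P": a screw outside the spec might still work, we just can't say so from the specs.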

Standards bodies like SAE, IETF, and W3C exist to get specifications created. They facilitate social interactions in which people, often representing competing companies, can come to agreement to give a name (such as ‘SAE 20’) to a particular specification. This allows anyone to say things like “the thing I manufacture meets SAE 20” or “I would like to buy something that meets SAE 20”. This shorthand reduces transaction costs (time taken negotiating specifications) and creates markets (by enabling choices among providers).

Communication

W3C and IETF are specifically involved in developing specifications that apply to communication between computers. Communication involves two or more agents playing the roles of sender and receiver, connected by a communication channel that carries messages. So any specification that has to do with communication necessarily constrains some part of a sender-channel-receiver complex: the sender, or the channel, or the receiver, or an agent/channel complex.

A syntactic constraint is a constraint on what messages are or aren’t generated by a sender or interpreted by a receiver. A pragmatic constraint is one that relates what is generated or interpreted to the circumstances (state) of the sender or receiver. (The word “pragmatic” is used in all sorts of ways by various writers. This is how I’m using it.)

For example, a specification document such as that for SVG implicitly bundles two specifications, one for senders and one for receivers. An SVG sender is one that obeys syntactic constraints in what it sends. An SVG receiver is one that interprets SVG documents in a manner consistent with the specification. A receiver that draws a square when a circle is called for would not meet the specification. Since this constraint relates messages to behavior (‘circumstances’), it’s a pragmatic constraint.

Pragmatic constraints on receivers (interpretation) are common in document type specifications, such as those for image formats or programming languages. But specifications involving pragmatic constraints on senders also exist, especially in protocol specifications where a sender may be responding to a request. A weak example of a sender constraint is that an HTTP GET request must not be answered with a 411 Length Required response (since nothing requires a GET request to specify a content-length). A better example is the SNMP protocol. A device that is sent a request using the SNMP protocol (such as ‘when did your operational status last change’), and gives a syntactically correct response containing blatantly untrue information (‘two hours ago’ when actually it was five minutes ago), would not be said to be compliant with the SNMP specification.
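The SNMP example can be sketched in a few lines to show how a sender can pass the syntactic check and still fail the pragmatic one. This is a toy, assuming a single query of the ‘when did your operational status last change’ kind; no real SNMP library or MIB object is used, and the names `DeviceState`, `Reply`, and the seconds-since-boot field are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeviceState:
    # Ground truth: seconds since boot at which the operational status last changed.
    last_status_change_s: int

@dataclass(frozen=True)
class Reply:
    # The message on the channel: the device's answer to the query.
    last_status_change_s: int

def syntactically_ok(reply: Reply) -> bool:
    """Syntactic constraint: any non-negative integer is a well-formed answer."""
    return isinstance(reply.last_status_change_s, int) and reply.last_status_change_s >= 0

def pragmatically_ok(state: DeviceState, reply: Reply) -> bool:
    """Pragmatic sender constraint: the answer must agree with the device's
    actual circumstances, not merely be well formed."""
    return reply.last_status_change_s == state.last_status_change_s

def sender_conforms(state: DeviceState, reply: Reply) -> bool:
    return syntactically_ok(reply) and pragmatically_ok(state, reply)

if __name__ == "__main__":
    state = DeviceState(last_status_change_s=7_000)
    honest, lying = Reply(7_000), Reply(100)
    print(sender_conforms(state, honest))                          # True
    print(syntactically_ok(lying), sender_conforms(state, lying))  # True False
```

The lying reply is indistinguishable from the honest one by inspecting the message alone; deciding conformance requires access to the sender's state.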

Where constraints are not given, designers will exploit the fact to do interesting things. That SVG doesn’t tell senders which syntactically correct messages they can or should generate is the whole point of SVG: you’re allowed, and expected, to use it to express whatever graphics you want to.

In sum, we can specify:

  • syntactic constraints on senders (what’s not generated)
  • pragmatic constraints on senders (preconditions of generation)
  • pragmatic constraints on receivers (postconditions of interpretation)
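The three kinds of constraint can be written as three predicates and applied to a single exchange. A minimal sketch, assuming a hypothetical two-message shape-drawing language in the spirit of the SVG example: senders are constrained only syntactically, and receivers must actually draw what they are told. All function names here are illustrative.

```python
# Toy vocabulary of messages for a hypothetical shape-drawing language.
MESSAGES = {"circle", "square"}

def syntactic_sender(msg: str) -> bool:
    """Syntactic constraint on senders: only these messages are generated."""
    return msg in MESSAGES

def pragmatic_sender(msg: str, sender_state: dict) -> bool:
    """Pragmatic constraint on senders (precondition of generation).
    This toy language imposes none: any well-formed message may be sent."""
    return True

def pragmatic_receiver(msg: str, receiver_state: dict) -> bool:
    """Pragmatic constraint on receivers (postcondition of interpretation):
    after receiving msg, the receiver must have drawn that shape."""
    return receiver_state.get("drawn") == msg

def check_exchange(msg: str, sender_state: dict, receiver_state_after: dict) -> dict:
    return {
        "sender": syntactic_sender(msg) and pragmatic_sender(msg, sender_state),
        "receiver": pragmatic_receiver(msg, receiver_state_after),
    }

if __name__ == "__main__":
    # A receiver that draws a square when a circle is called for.
    print(check_exchange("circle", {}, {"drawn": "square"}))
    # {'sender': True, 'receiver': False}
```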

Languages and meaning

Descriptions of constraints on senders and receivers are usually bundled; a single document describes them together. Comparing a given sender or receiver against constraints means paying attention to certain parts of the description, and ignoring others. This is particularly natural in the case where two agents carry on a dialog; if you are engineering a sender it’s useful to know how a matching receiver might respond to messages. A constraint bundle that applies to both senders and receivers might be called a ‘language’ or ‘protocol’.

One can speak either of the constraints met by a particular sender or receiver, or of constraints prescribed by some documentation; it’s constraints either way, retrospective in one case and prospective in the other.

Communication constraints are message dependent. That is, the constraint on an agent is that its circumstances and state should be a certain way as a function of the message on the channel: for every message M, if M is on the channel, then some constraint C_M depending on M should apply to the agent. If the agent is a sender, the constraint is on states or events leading up to the message being on the channel – what the world has done to the agent. If it’s a receiver, the constraint is on states or events following the message being on the channel – what the agent will do to the world.

(The constraint on a sender could also be on a future state of the sender, in which case the message is a promise.)

The pair of constraints (S_M, R_M) on sender and receiver agents, specific to a message, is a good candidate for the ‘meaning’ of a message relative to a language. It appears, then, that it is possible to specify, or reverse engineer, the meaning of a message when it appears on a particular channel.

When the sender is pragmatically unconstrained (it can send what it likes) and the receiver is constrained (has to do what it’s told), a message or language is ‘imperative’. When the sender is pragmatically constrained (must give information about its circumstances) and the receiver is not (can do as it likes with that information), a message or language is ‘declarative’.
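The idea of meaning as the message-indexed pair (S_M, R_M), and the imperative/declarative distinction, can be sketched as follows. This is an illustration under simplifying assumptions of my own: agent state is a plain dict, a language supplies optional functions from a message to a state predicate, and "unconstrained" is represented by `None` (treated as trivially satisfied). `Language`, `meaning`, and `mood` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

State = dict
Constraint = Callable[[State], bool]

@dataclass(frozen=True)
class Language:
    # For each message M, sender(M) gives S_M (a condition on the sender's state
    # leading up to M) and receiver(M) gives R_M (a condition on the receiver's
    # state after M). None means that side is pragmatically unconstrained.
    sender: Optional[Callable[[str], Constraint]] = None
    receiver: Optional[Callable[[str], Constraint]] = None

    def meaning(self, msg: str) -> tuple[Constraint, Constraint]:
        """The pair (S_M, R_M); an absent constraint is trivially satisfied."""
        always: Constraint = lambda _state: True
        s = self.sender(msg) if self.sender else always
        r = self.receiver(msg) if self.receiver else always
        return s, r

    def mood(self) -> str:
        if self.receiver and not self.sender:
            return "imperative"
        if self.sender and not self.receiver:
            return "declarative"
        return "mixed" if self.sender else "unconstrained"

if __name__ == "__main__":
    draw = Language(receiver=lambda m: (lambda st: st.get("drawn") == m))    # SVG-like
    status = Language(sender=lambda m: (lambda st: st.get("status") == m))  # SNMP-like
    print(draw.mood(), status.mood(), Language().mood())
    # imperative declarative unconstrained
```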

‘Knowledge representation’

The official W3C specifications for RDF are very weak and impose no pragmatic constraints. An agent that sends RDF messages (‘graphs’) is not out of spec for sending any syntactically correct RDF under any circumstances; nor is an agent that consumes RDF constrained in what it does having received it.

(There are so-called semantic constraints for RDF and OWL: every graph is supposed to be “consistent.” But this is effectively a glorified syntactic constraint, since it can be decided without touching pragmatics.)

There are some pragmatic constraints around SPARQL, but these only dictate that a SPARQL server should act properly as a store of RDF graphs – if something is stored, it should be retrievable.

The interesting thing that RDF (and OWL) do is to suggest that RDF users may create secondary specifications to apply to senders of RDF. Such constraint sets are called “vocabularies” or “ontologies”. An agent conforming to a vocabulary will (by definition) not generate RDF messages at variance with that vocabulary. If we take RDF and further constrain agents by a set of vocabularies, what we get is a declarative language, something much more like SNMP than it is like SVG.

For example, the specification for the Dublin Core vocabulary effectively says that if [dc:title “Iron”] is in the message/graph, but the resource to which this applies does not have “Iron” as its title, then the sender of the message is not in conformance with the vocabulary. (I’m taking a notational liberty here for the benefit of people who don’t know RDF, and I beg you not to ask what “the resource” is, since the answer is difficult and irrelevant to the example.)
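Here is the Dublin Core example as a sender-side conformance check. A minimal sketch, assuming a graph is a plain set of (subject, predicate, object) string triples; it uses no RDF library and ignores literal datatypes and language tags. The `http://purl.org/dc/elements/1.1/title` URI is the Dublin Core 1.1 title property; the `example.org` subjects and the `actual_titles` dict are hypothetical. That dict is exactly the part that, in practice, only a human or other trusted source can supply.

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

Triple = tuple[str, str, str]

def conforms_to_dc_title(graph: set[Triple], actual_titles: dict[str, str]) -> bool:
    """Sender-side check: every dc:title claim must match the resource's real title.
    `actual_titles` stands in for the state of the world (hypothetical input)."""
    for subject, predicate, obj in graph:
        if predicate == DC_TITLE and actual_titles.get(subject) != obj:
            return False
    return True

if __name__ == "__main__":
    world = {"http://example.org/book/1": "Iron"}
    good = {("http://example.org/book/1", DC_TITLE, "Iron")}
    bad = {("http://example.org/book/1", DC_TITLE, "Copper")}
    print(conforms_to_dc_title(good, world), conforms_to_dc_title(bad, world))  # True False
```

Both graphs are equally well-formed RDF; only the vocabulary's sender constraint, checked against the world, separates them.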

Unlike SNMP, whose pragmatic constraints can be satisfied by a mechanism, it is usually impossible to use an RDF vocabulary correctly without human intervention. A vocabulary-conforming RDF sender almost always obtains information from a human source, whether from form input, text mining, or manual curation. In these cases an automaton is only conforming if the human input is correct, so it is the human + automaton complex that should be judged against the specification. By the same token, interpretation, while unconstrained by the vocabulary specification, in most use cases requires human intervention for any useful pragmatic effect. Thus most involvement of computers with RDF is for those applications not requiring generation or interpretation according to vocabularies: storage, search, translation between formats, and inference.

(I say “usually” impossible because it is certainly possible to use RDF in a manner similar to SNMP, where automaton-generated graphs are vocabulary-conforming without human input. But this is not how RDF is ordinarily used in practice.)

So there are three funny things about the “semantic web” languages that make them misunderstood outliers in the world of W3C/IETF specifications.

  1. Unlike nearly every other artificial language (XML and JSON excepted), they have no meaning – no pragmatics are defined by the core specifications. All pragmatics comes from layered specifications.
  2. As practiced, i.e. subject to vocabulary specifications, they are declarative, not imperative; pragmatic constraints from vocabularies are on senders (similarly to SNMP servers), not receivers (as in SVG, C, …).
  3. Meeting the pragmatic constraints of vocabularies typically requires human involvement, meaning that vocabulary specifications are meaningfully applied not to automata but to automata/human complexes.

Reference

I wanted to write about the question of whether reference could be specified, but needed the above by way of introduction. More later perhaps.

Oh, maybe you wanted to know what sources I would want to refer you to as background to this piece of writing. Wittgenstein's Philosophical Investigations is apropos. And I acknowledge the influence of Larry Masinter and Gerald Jay Sussman, but have nothing of theirs specifically to refer you to.

Categories: Uncategorized