“Identifier”


OK, Tim D is giving me a hard time about this, so I need to talk about it a bit.

I think terminology is pretty important, and I tend to spend a lot of time thinking and talking about it. One might take the Humpty Dumpty position that a word can be redefined in any way one likes, and if your clout level is either very high or very circumscribed you can get away with it. For example, mathematicians redefine common words all the time as terms of art with meanings ridiculously detached from ordinary usage: group, field, ring, catastrophe, category, object, arrow, complex, matrix, and so on. They get away with it because context is rarely lost (you know when you’re doing mathematics and when you’re not), and also because they’re a strong force: when they deliver value, which they do, they earn the right to change the meanings of these words.

Other acceptable cases include when there is clear scope (as when Don Knuth’s book Surreal Numbers redefines “number”); humor, absurdity or affection; some kind of marker such as capitalization (such as “BOA” in Common Lisp); or a usage that is so remote from ordinary use that there can be no confusion (“BOA” is also an example of this).

I am also not too worried when a definition as a term of art is a subset of common usage, or if the stretch in meaning is not too much (such as “record” in a database).

But when the term is given a meaning that overlaps common use you’re asking for trouble. The reason is the danger that the use can get detached from the context in which the term is defined as a term of art. This can happen as a result of a copy/paste, or someone entering in the middle of a conversation, or someone just forgetting the definition, or even purely unconscious forces that introduce bias and false intuitions.

I want to gripe first about “identifier” (which is currently being discussed on the IAO list). This has been corrupted by the computing and web folks to be almost meaningless. The corruption is even reflected in the Wikipedia article. Wordnet gets closer to a natural definition: “a symbol that establishes the identity of the one bearing it”. My preference is to restrict use to cases where a mark is borne by the thing that’s supposed to be identified, and where the mark can actually serve an identification purpose. Basically an identifier is any marking or other property that can be used to decide whether the thing at hand is the same as, or different from, a thing seen at another time or by someone else – i.e. to discriminate between a state of affairs in which something is seen twice and one in which two things are seen. Good examples are [UPCs (written on labels on a product), ISBNs (similarly), – see comments!] unique keys in database records, RFIDs, fingerprints, scars, etc.
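To make the "seen twice versus two things seen" criterion concrete, here is a minimal sketch in Python. The `Sighting` record and its two marks (`serial_number`, `model_number`) are hypothetical names invented for illustration; the point is only that a mark does identification work to the extent that comparing it separates the two states of affairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """One observation of a thing, recording the marks it bears (hypothetical)."""
    serial_number: str  # assumed to be borne by exactly one individual
    model_number: str   # assumed to be shared by every member of a class


def same_thing(a: Sighting, b: Sighting, mark: str) -> bool:
    """Judge 'one thing seen twice' vs 'two things seen' by comparing one mark."""
    return getattr(a, mark) == getattr(b, mark)


monday = Sighting(serial_number="SN-001", model_number="M-42")
tuesday = Sighting(serial_number="SN-002", model_number="M-42")
friday = Sighting(serial_number="SN-001", model_number="M-42")

# The serial number separates the two states of affairs.
print(same_thing(monday, friday, "serial_number"))   # True: one thing, seen twice
print(same_thing(monday, tuesday, "serial_number"))  # False: two things

# The class-level mark cannot tell the two individuals apart.
print(same_thing(monday, tuesday, "model_number"))   # True, although they differ
```

A mark shared by a whole class still tells you something, but only which class the thing belongs to, not which individual it is.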

Compare to “identifier” in the Scheme reports: I found no direct definition, but we have “let, even though it is an identifier, is not a variable, but is instead a …” and similar examples. Apparently the usage was picked up from usage in other similar documents in the computing literature. “Identifier” is used as a syntactic category in the language, not with reference to any particular role in identification processes, which is sort of like defining “doctor” to be any person.
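A rough sketch of what "identifier as a syntactic category" amounts to, in Python rather than Scheme. The pattern below is a simplified approximation loosely modeled on the lexical syntax in the Scheme reports, not a faithful copy of it, and `is_identifier` is a name made up for this example. Note that the test looks only at the shape of the string and never asks whether the string identifies anything.

```python
import re

# Simplified approximation of a Scheme-like identifier token (an assumption,
# not the report's exact grammar): an initial character followed by any
# number of subsequent characters, or one of the peculiar tokens + - ...
_INITIAL = r"A-Za-z!$%&*/:<=>?^_~"
IDENTIFIER = re.compile(rf"(?:[{_INITIAL}][{_INITIAL}0-9+\-.@]*|\+|-|\.\.\.)")


def is_identifier(token: str) -> bool:
    """True if the token has the lexical shape of an identifier."""
    return IDENTIFIER.fullmatch(token) is not None


print(is_identifier("let"))   # True: syntactic keyword, still an "identifier"
print(is_identifier("gorf"))  # True: names nothing at all
print(is_identifier("42"))    # False: starts with a digit
```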

(I’m not criticizing any editor of the report, as we were all complicit in this.)

Another example is “identifier” in RFC 3986. If one makes the reasonable assumption that a URI is a kind of identifier (that’s what the “I” stands for) one gets into all kinds of trouble.  tag:jar@mumble.net,2009:fdsa is a URI (i.e. syntactically satisfies the spec), but as it only occurs in this blog post, without explanation, what on earth would it identify, and how would it do so if it did?

Even if we allow that a URI is only a potential identifier – that is, a string designed to participate in some system of identification, as opposed to one that actually does participate – the case for identifierness is tenuous. In what sense could http://google.com/ be an identifier? It might be useful to some web server in identifying one of the entities it has on hand as the one that a client is talking about. But it’s more likely that the server just has a table, or some other process, that matches request URIs (strings) with a source of information (“data object or service” per RFC 2616) that itself does not bear an identifying mark. Of course the URI is of no use to a client in identifying anything. A better term for the role the string plays on the client side might be name, locator, or designator.
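The "server just has a table" picture can be sketched as follows, again in Python. Everything here is hypothetical: the `ROUTES` table, the `handle` function, and the example.org URIs are invented for illustration and do not describe any real server. What the sketch shows is that the request URI is a lookup key held outside the sources of information; the sources themselves bear no mark.

```python
from urllib.parse import urlsplit

# Hypothetical table: request path -> source of information.
# The sources are plain callables; none of them carries its own URI.
ROUTES = {
    "/": lambda: "<html>home page</html>",
    "/about": lambda: "<html>about page</html>",
}


def handle(request_uri: str) -> tuple[int, str]:
    """Match a request URI (a string) against the table and return a response."""
    path = urlsplit(request_uri).path or "/"
    source = ROUTES.get(path)
    if source is None:
        return 404, ""
    return 200, source()


print(handle("http://example.org/"))
print(handle("http://example.org/missing"))
```

The client, for its part, never sees the table, so from its side the string behaves like a name or locator rather than a mark it could check against anything.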

At best, you’d have to distinguish a true identifier from a potential identifier, just as one would distinguish an actual word (“frog”) from a potential word (“gorf”) or an actual name (“Pat Hayes”) from a potential name (“Zacharias Mbutu” – if someone in fact has this name I apologize and will fix the example!). But consider the Scheme case of “let”. This string certainly plays an important role in the language, but the role it plays is not one of “identifying” a syntactic form (with associated semantics) in the sense of telling that form apart from other forms that it might be confused with – it merely designates or names the form. Perhaps a Scheme interpreter might have a collection of things, each marked with a string, and it might use the string “let” attached to something to identify that thing as the one that should be used internally to interpret an expression; but this is certainly not demanded of an implementation, and is not what the programmer has in mind. To the programmer the string “let” has usage, meaning, etc. but is not used for identifying anything.

(Apologies to Pat Hayes, the originator of this argument.)

Gotta run; I’ll rail against “resource” soon, and maybe “variable”.

  1. 2009-11-20 at 16:49

    Even if we allow that a URI is only a potential identifier…

    Since I have just come from reading Quine’s “On What There Is”, isn’t the right way to say this that a URI is a pronoun?

    I like your examples of ISBNs, or UPCs, as identifiers: marks borne by the objects themselves, meant to allow for unique identification.

    • 2009-11-20 at 17:44

      Thanks for the reference to Wikisource! Very cool.

      Actually ISBN and UPC, like model number, are not good examples since they don’t uniquely identify the object, but only place it in a class, all of whose members carry the same mark. But the idea is similar. Much better would be serial numbers, which distinguish members of a class from one another.

      Hmm… “to be is to be in the range of reference of a pronoun”… yes, and this is similar to what we mean by a “name”. The ideas I’ve given are of course not original to me, as I got them from Pat Hayes; and he has argued that at least in the RDF context, URIs are being used as names, not identifiers. I’m a believer, although I find it a bit of a stretch to say that rdf:type “names” the class membership relation – that would be like saying that in English “be” is the name of the being verb.

  2. 2009-12-07 at 14:27

    While I agree the Scheme report(s) contain no explicit sentence of the form “An identifier is …” (and that’s surely a defect in the report), the idea (as I understand it, and it’s consistent with the passage you quote) was that an identifier is a lexeme off the <identifier> production in the lexical syntax.
