Archive for August, 2012

LSIDs are not URIs

[2012-09-10 I’m informed that I’m beating a dead horse; and should be gentle with TDWG. Agreed.]

(Continuing from the “tough URLs” post.)

The so-called “life sciences identifier” or LSID emerged as a candidate “persistent identifier” scheme in the early 2000s. The early hype led to uptake by a few projects. But the spec is an orphan: it failed to find a well-established sponsor such as a government, library, or viable standards organization, so there is no management or oversight of the namespace. This puts anyone using LSIDs (not those providing or creating them, but those actually depending on them) at risk, which of course will limit their uptake, creating a downward spiral.

LSIDs have two related difficulties. One is that they look like URNs, but do not have any standing with IETF, which oversees URN namespaces. (If ‘lsid’ were a URN namespace it would show up in the NID registry.) This makes the claim that LSIDs are a “standard” rather dubious. The practice of staking out territory without consent in a namespace managed by someone else is appropriately called “squatting”.
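To make the registry point concrete, here is a minimal sketch that pulls the namespace identifier (NID) out of a URN-shaped string and looks it up in a set of registered NIDs. The set below is a small hand-copied sample, not the IANA registry itself, and the function names and the example.org identifier are made up for illustration; a real check would consult the registry directly.

```python
# Illustrative sample only: a few NIDs that are registered with IANA.
# This is NOT the full registry; a real check should consult IANA.
SAMPLE_REGISTERED_NIDS = {"ietf", "isbn", "issn", "oid", "uuid", "nbn"}


def urn_nid(urn: str) -> str:
    """Return the lowercased NID of a URN-shaped string (urn:<NID>:<NSS>)."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1] or not parts[2]:
        raise ValueError(f"not URN-shaped: {urn!r}")
    return parts[1].lower()


def in_sample_registry(urn: str) -> bool:
    """True if the URN's NID appears in the illustrative sample above."""
    return urn_nid(urn) in SAMPLE_REGISTERED_NIDS


# Hypothetical LSID: it looks like a URN, but its NID is not in the sample.
print(in_sample_registry("urn:lsid:example.org:names:12345"))
print(in_sample_registry("urn:isbn:0451450523"))
```

Absence from a sample proves nothing by itself, of course; the point is only that "is this a URN namespace?" is a lookup against someone else's registry, not something the minter of the identifier gets to decide.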

As a consequence of this I would strongly discourage anyone from putting urn:lsid:… where an RDF URI reference or XML namespace URI is required.

The other difficulty is that LSID claims to “persistence” have no basis. At present the “authority” field of an LSID is unmanaged, leaving open the possibilities of collisions and of fatal obscurity. People seem to put a DNS domain name in this field, which is odd because the ephemeral nature of domain names was one of the reasons given for creating the LSID spec in the first place. Domain names can change hands over time, which can lead to collisions; and their resolvability is not assured through any credible social process, since backup resolution for a domain that goes silent is not at present either socially acceptable or well supported technically.
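For readers who have not looked closely at one, an LSID has the shape urn:lsid:authority:namespace:object, with an optional trailing revision. The sketch below splits one into its fields so the authority field is easy to see. The class and function names are my own, and the identifier is a made-up example under example.org, not a real record.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str           # in practice usually a DNS domain name
    namespace: str
    object_id: str
    revision: Optional[str]  # optional sixth field


def parse_lsid(text: str) -> Lsid:
    """Split urn:lsid:<authority>:<namespace>:<object>[:<revision>] into fields."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"wrong number of fields: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"missing urn:lsid: prefix: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty field: {text!r}")
    revision = (parts[5] or None) if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, revision)


lsid = parse_lsid("urn:lsid:example.org:names:12345")
print(lsid.authority)
```

Nothing in the parse step can tell whether the authority string still belongs to whoever minted the identifier, which is exactly the gap described above.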

Registering a URN namespace isn’t hard; you just have to write an RFC draft and get IETF consensus. But persistence is a requirement for URN namespaces, so whoever’s submitting the RFC will have to make the case to IETF reviewers that LSIDs are persistent. This problem can be overcome as well, by designating a registry and establishing registration standards to forestall the creation of fly-by-night identifiers.

Registries seem to create a single point of failure, but for namespaces that are both valuable and understood by the community to be persistent, succession management can be taken care of. In the case of resolving existing names, if a namespace is understood to be persistent, then copies of it can be made without risk, so backups can serve if the original resolver disappears. The creation of new names is harder, since there has to be universal agreement on a control point. But again this can take care of itself, since competition tends to destroy the utility of the namespace, and those who care about this kind of thing know this.

As an alternative to a registry, consensus could be built that “the world is the registry” and LSID authority names should be durably publicized (published) and then never reused. This approach seems a bit shakier and perhaps less scalable than a coordinated registry, since assessing durability and publicity are arts. But this approach has worked pretty well for binomial species name nomenclature.

I was surprised to see that LSIDs are not just tolerated by TDWG (the Taxonomic Databases Working Group), but promoted, front and center (see TDWG home page). It’s unseemly behavior for a standards organization such as TDWG to promote something as rickety as LSIDs. In doing so it is thumbing its nose at a sister standards organization, IETF, by failing to respect their process.

Mine is not an anti-URN or pro-http: position; there is a sound idea behind URNs and other persistent namespaces such as the MIME type registry and the .arpa top-level domain. Obviously nobody has enough control over the future to ensure that anything lasts forever, but simple agreement (or lack of disagreement) among all parties who might care or might exert control is the way social truths become established. Persistent naming is a confidence game. It can work if it’s seen as a speech act, similar to a declaration of independence or wedding ceremony, and everyone takes it seriously.

I’m not really in favor of making LSIDs work, and don’t want to argue the merits of URNs here, but that’s neither here nor there. I’m just urging anyone who is trying to use them to either make them solid, or cease and desist. “Standardness” and “persistence” don’t just happen; they require hard work.

(What I still need to write about: persistence of http: URIs.)
