LSIDs are not URIs

[2012-09-10: I’m informed that I’m beating a dead horse and should be gentle with TDWG. Agreed.]

(Continuing from the “tough URLs” post.)

The so-called “life sciences identifier” or LSID emerged as a candidate “persistent identifier” scheme in the early 2000s. The early hype led to uptake by a few projects. But the spec is an orphan: it failed to find a well-established sponsor such as a government, library, or viable standards organization, so there is no management or oversight of the namespace. This puts anyone using them (not those providing or creating them, but those actually depending on them) at risk, which of course will limit their uptake, creating a downward spiral.

LSIDs have two related difficulties. One is that they look like URNs, but do not have any standing with IETF, which oversees URN namespaces. (If ‘lsid’ were a registered URN namespace, it would appear in the IANA registry of URN namespace identifiers, or NIDs.) This makes the claim that LSIDs are a “standard” rather dubious. The practice of staking out territory without consent in a namespace managed by someone else is appropriately called “squatting”.

As a consequence of this I would strongly discourage anyone from putting urn:lsid:… where an RDF URI reference or XML namespace URI is required.

The other difficulty is that LSIDs’ claims to “persistence” have no basis. At present the “authority” field of an LSID is unmanaged, leaving open the possibility of collisions and of fatal obscurity. People seem to put a DNS domain name in this field, which is odd, because the ephemeral nature of domain names was one of the reasons given for creating the LSID spec in the first place. Domain names can change hands over time, which can lead to collisions; and their resolvability is not assured through any credible social process, since backup resolution for a domain that goes silent is at present neither socially acceptable nor well supported technically.
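
To make the role of the authority field concrete, here is a minimal Python sketch that splits an LSID into its parts. It assumes the commonly cited layout urn:lsid:authority:namespace:object[:revision]; the function name `parse_lsid` and the identifier `urn:lsid:example.org:names:12345` are hypothetical, chosen only for illustration. The point is that the authority component is, in practice, just a domain name, and nothing in the identifier itself records who controls that name or guards against it changing hands.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str           # in practice, usually a DNS domain name
    namespace: str
    object_id: str
    revision: Optional[str]  # optional trailing component


def parse_lsid(text: str) -> LSID:
    """Split an LSID of the assumed form
    urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.split(":")
    # Compare the "urn" and "lsid" prefixes case-insensitively (an assumption).
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


if __name__ == "__main__":
    lsid = parse_lsid("urn:lsid:example.org:names:12345")
    # The identifier says nothing about who controls example.org,
    # now or in twenty years.
    print(lsid.authority)  # example.org
```

The sketch is only meant to show where the authority sits; whether its validation matches every rule in the LSID spec is not claimed.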

Registering a URN namespace isn’t hard; you just have to write an RFC draft and get IETF consensus. But persistence is a requirement for URN namespaces, so whoever’s submitting the RFC will have to make the case to IETF reviewers that LSIDs are persistent. This problem can be overcome as well, by designating a registry and establishing registration standards to forestall the creation of fly-by-night identifiers.

Registries seem to create a single point of failure, but for namespaces that are both valuable and understood by the community to be persistent, succession can be managed. In the case of resolving existing names, if a namespace is understood to be persistent, then copies of it can be made without risk, so backups can serve if the original resolver disappears. The creation of new names is harder, since there has to be universal agreement on a single control point. But again this can take care of itself: competing registries tend to destroy the utility of a namespace, and those who care about this kind of thing know it.
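
The backup argument can be sketched in a few lines of Python. This is a toy model under a stated assumption: a persistent namespace is represented as name-to-target bindings that never change once made, and each resolver is an in-memory copy of those bindings (the `resolve` function and the example names are hypothetical). Because bindings never change, any surviving copy gives the same answer the original would have, so a client can simply try sources in order.

```python
from typing import Mapping, Optional, Sequence


def resolve(name: str, sources: Sequence[Optional[Mapping[str, str]]]) -> str:
    """Return the binding for `name` from the first source that can answer.

    Each source is a copy of the same persistent namespace; None stands in
    for a resolver that has gone silent. This is safe only because bindings
    are assumed never to change: a mirror may lack names created after it
    was copied, but it can never give a wrong answer for the names it holds.
    """
    for source in sources:
        if source is None:  # this resolver has disappeared
            continue
        if name in source:
            return source[name]
    raise LookupError(f"no source could resolve {name!r}")


if __name__ == "__main__":
    original = None  # the original resolver has vanished
    mirror = {"urn:example:42": "https://archive.example.net/42"}
    print(resolve("urn:example:42", [original, mirror]))
    # https://archive.example.net/42
```

Note what the sketch does not cover: creating new names. A mirror can answer for old bindings, but it has no standing to mint new ones, which is why the control point for creation is the harder problem.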

As an alternative to a registry, consensus could be built that “the world is the registry”: LSID authority names should be durably publicized (published) and then never reused. This approach seems a bit shakier and perhaps less scalable than a coordinated registry, since assessing durability and publicity is an art. But it has worked pretty well for the binomial nomenclature of species names.
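
The mechanical half of the “never reused” rule can be stated as an append-only record. The Python sketch below is hypothetical: `AuthorityRegistry`, `publish`, `is_taken`, and the case-insensitive comparison are illustrative choices, not part of any LSID spec. The hard half, deciding whether a publication was durable and public enough to count, is the art mentioned above and does not reduce to code.

```python
from datetime import date
from typing import Dict


class AuthorityRegistry:
    """Toy append-only record of published LSID authority names.

    Assumed rule: once a name has been published it stays claimed forever,
    even if its original publisher goes away. There is deliberately no
    method to delete or transfer an entry.
    """

    def __init__(self) -> None:
        self._published: Dict[str, date] = {}

    def publish(self, authority: str, when: date) -> None:
        key = authority.lower()  # compare names case-insensitively, as DNS does
        if key in self._published:
            raise ValueError(
                f"{authority!r} was first published on "
                f"{self._published[key]}; names are never reused"
            )
        self._published[key] = when

    def is_taken(self, authority: str) -> bool:
        return authority.lower() in self._published


if __name__ == "__main__":
    registry = AuthorityRegistry()
    registry.publish("example.org", date(2004, 1, 1))
    print(registry.is_taken("Example.ORG"))  # True
    # A later claimant cannot take over the name, even if the
    # original publisher has long since disappeared.
    try:
        registry.publish("example.org", date(2012, 9, 10))
    except ValueError as err:
        print(err)
```

The missing delete operation is the whole design: a retired authority name stays claimed, which is what distinguishes this scheme from DNS, where a lapsed domain can be registered by someone else.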

I was surprised to see that LSIDs are not just tolerated by TDWG (the Taxonomic Databases Working Group) but promoted, front and center (see the TDWG home page). It’s unseemly for a standards organization such as TDWG to promote something as rickety as LSIDs. In doing so it is thumbing its nose at a sister standards organization, IETF, by failing to respect IETF’s process.

Mine is not an anti-URN or pro-http: position; there is a sound idea behind URNs and other persistent namespaces such as the MIME type registry and the .arpa top-level domain. Obviously nobody has enough control over the future to ensure that anything lasts forever, but simple agreement (or lack of disagreement) among all parties who either care or might exert control is how social truths become established. Persistent naming is a confidence game. It can work if it’s seen as a speech act, similar to a declaration of independence or a wedding ceremony, and everyone takes it seriously.

I’m not really in favor of making LSIDs work, and don’t want to argue the merits of URNs here, but that’s neither here nor there. I’m just urging anyone who is trying to use them to either make them solid, or cease and desist. “Standardness” and “persistence” don’t just happen; they require hard work.

(What I still need to write about: persistence of http: URIs.)

  1. philliplord
    2012-08-22 at 14:09

    As you say, LSIDs have somewhat died a death. I think this is a pity, but such things happen at times. What I think is most unfortunate about the situation is that LSIDs have a strong sense of what they should resolve to. DOIs are apparently “permanent” but can resolve to anything, and that anything can change over time, with no way of knowing it.

    In terms of the domain name issue, as I remember it the spec did have something for dealing with this situation. But to be honest I think it is a red herring. Take DOIs, for example. Their display guidelines now state that you should use the http://dx.doi.org/10.xxx form of display. So DOIs are now also dependent on DNS, particularly on the doi.org domain. Does this suddenly make DOIs unstable? Technically, there is no difference between doi.org and any other domain. But socially there is.

    The bottom line here is that persistence is social, not technological. If I were using LSIDs nowadays, I would switch to URIs (because everyone understands them) and support their permanence with a strong social guarantee that the domain would not die. Why would I use URIs? Simply because the best way to achieve persistence is to do the same thing as everyone else, whatever that is.

    • 2012-08-22 at 22:08

      I probably wouldn’t have written about LSIDs had I not seen them front and center on TDWG’s home page. I care about TDWG and its reputation, and I think the halfway attitude toward LSIDs (apparently advocating for them as a standard for the community, while failing to pursue them properly in IETF) is harmful to it.

      As I said I plan to write later about domain names such as dx.doi.org and xmlns.com. I’ve been putting this off because IMO the story is more nuanced than people on either side of the http: persistence debate see, and it’s hard to write about. So I will defer commenting on that, except to say that yes, one could go through all of the deployed LSID authority strings and ask whether they are as resistant to attack as is dx.doi.org. Perhaps they are – but once you’ve done this analysis, you’ll have a list of those LSID authorities that come out gold-plated, with a convincing story about each one, and you would have in hand the seed of the registry that I was suggesting in the post…

      I think everyone still in the discussion agrees that the problem is social (or one might say institutional), not technical. Efforts to (socially) engineer identifiers for persistence are inherently weak. Any gap in buy-in (such as IETF not sanctioning urn:lsid: or ICANN not believing in domain name permanence) is a threat because appearances are so important in this business.

      (Terminology nit, by the way: obviously LSIDs were *intended* to be URIs, since they were intended to be URNs and URNs are URIs. I think when you say “switch to URIs” you mean “switch to http: URIs”.)
