Archive for November, 2011

Tough URLs

Henry Thompson and I have been puzzling for a few years over the question of why the Web doesn’t have URIs that are widely perceived as both robust – in the sense of resisting attacks such as expiration, corruption, and censorship – and actionable – in the sense of just working in the browser. We have identifier systems that are one or the other, but not both – why?

The robust identifier systems that we have range from pre-Internet ones like Linnaeus’s binomial species names (which are tied to their priority literature reference), the chemical element symbols, ISSN, and so on, to modern inventions such as URNs, info: URIs, and the digital object identifier (DOI). Our actionable identifiers (or locators?) are things like http: and ftp: URIs – a notably disjoint set.

Why should anyone care about robust actionable URIs? The reason is that, if they existed, they would marry a cornerstone of civil discourse to the central modern communication technology, namely the Web.

We take robust reference for granted in everyday civil, legal, scientific, technical, and political discourse, so much so that it is not even called out as a named phenomenon. If you’re debating a law or a scientific article with someone, the last thing you want is for your argument to go wrong because the two sides are working from different documents – especially if the difference goes undetected. This would be stupid.

But reliable reference was not always the rule. It took the world hundreds of years following the invention of the printing press to deal with this problem. Now we are repeating the reference chaos of the early print world on modern technology.

References are easy to deal with if you are a human, speaking natural language, with a bit of time on your hands. If you see the species name Rana pipiens and know a little bit about how species names work, you can look it up to get the primary reference for that name. Each identifier system has its own set of resolution services, many of them on the Web and open. But informal references in dozens of different identifier systems are not the same as first-class citizens on the Web – as I say, human intervention is required. Making references accessible to computers using ordinary (i.e. Web) protocols vastly accelerates any process that needs to follow them. And to do this, today, you need something that starts with http:// and a domain name.
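To make the last point concrete, here is a minimal sketch of turning an identifier from one system (a DOI) into something actionable. The function name is made up for illustration, and the resolver prefix is an assumption taken from the dx.doi.org example used in this post; a different identifier system would need its own resolver.

```python
from urllib.parse import quote

# Assumption: resolver prefix taken from the dx.doi.org example in this post.
DOI_RESOLVER = "http://dx.doi.org/"


def doi_to_http_uri(doi: str) -> str:
    """Turn a bare DOI into an http: URI that a browser or script can follow.

    Hypothetical helper for illustration; the check below is a rough
    sanity test, not a full DOI syntax validator.
    """
    doi = doi.strip()
    # Accept the informal "doi:" prefix that often appears in citations.
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Keep '/' literal and percent-encode anything unsafe in a URI path.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_http_uri("doi:10.1155/1987/47105"))
```

Note that the result depends on both the resolver's domain name and the service behind it, so the string is only as robust as those two things.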

By now you have no doubt found many ways to poke holes in what I’ve said so far. Are “tough” URIs really possible? What exactly could that mean? Isn’t it impossible to eliminate all vulnerabilities? On the other hand, given that the examples of robust mostly are, isn’t a URI such as http://dx.doi.org/10.1155/1987/47105 a counterexample to my claim that we don’t have robust actionable URIs? And if this is such a problem, why on earth hasn’t it been solved already? Is it inherently intractable or is this some kind of awful techno-social mistake that can be fixed?

What interests me is a sweet spot in between these two extremes: more robust than current-day doi.org URIs, but admitting that certain vulnerabilities are unavoidable.

OK, I have more to say about threat analysis, IDF, ICANN, P2P, and so on, and will do so in a followup. In the meantime – if you want to talk about this, please come to our workshop in Bristol, UK, on December 8th!

Workshop announcement

[Minor copy edits on 2012-08-02]
