Digital preservation and independently held copies

Some information – writing, data, images – is important enough that it should be preserved and made available for as long as possible. Somebody, 5 or 10 or 50 or 200 years from now, might want or need to look at it. If you care that something be preserved, you will ask yourself what you can do to help bring about preservation.

It’s very easy for an individual, a project, or an organization to say: I am in control of this information, I am a responsible member of the community, and I can be a good steward. I will use the best redundancy technology and keep good backups, so the stuff will be safe from fire, natural disaster, and so on. It will be preserved because I will preserve it. (See e.g. NARA’s codification of being responsible.)

This may be true, up to a point, but as a complete plan it is a delusion. The risk that an individual, project, or organization might suddenly lose its ability to preserve is too great, in my opinion, for this to be an acceptable digital preservation solution by itself. Individuals die or become disabled; projects get canceled by management under budget pressure or changes in priorities; and organizations close or go bankrupt. And everyone is vulnerable to legal and governmental takedowns and censorship, and acts of war. These are all very unlikely events, but over long periods of time, unlikely risks become somewhat likely.

Every preservation plan must therefore include distribution of the information to one or more independent parties that are very likely to survive threats against the original steward. The receiving parties should be organizationally and legally independent of the original steward, and should reside in a different jurisdiction (country). They should keep their copy because they want to, not because they are being paid to.
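To make the criteria concrete, here is a minimal sketch in Python of how one might check a list of copy holders against them. Everything in it is hypothetical: the Holder record, its field names, and the example organizations are invented for illustration, and real independence is a matter of judgment rather than three boolean tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Holder:
    """A party holding a copy. All fields are illustrative assumptions."""
    name: str
    legal_entity: str        # the organization that ultimately controls this holder
    jurisdiction: str        # country the holder resides in
    paid_by_steward: bool = False  # True if the copy is kept only under contract


def is_independent(steward: Holder, holder: Holder) -> bool:
    """Apply the three criteria from the paragraph above."""
    return (
        holder.legal_entity != steward.legal_entity      # organizationally and legally separate
        and holder.jurisdiction != steward.jurisdiction  # different country
        and not holder.paid_by_steward                   # keeps the copy because it wants to
    )


def independent_holders(steward: Holder, holders: list[Holder]) -> list[Holder]:
    """Return only the holders that count as independent copies."""
    return [h for h in holders if is_independent(steward, h)]


# Hypothetical example data.
steward = Holder("Example Archive", "Example Org", "US")
holders = [
    Holder("Offsite backup server", "Example Org", "DE"),            # same organization
    Holder("Contracted CDN", "CDN Inc", "NL", paid_by_steward=True),  # contractual
    Holder("Friendly library", "Library Foundation", "CA"),
]
names = [h.name for h in independent_holders(steward, holders)]
print(names)
```

In this made-up example only the library qualifies. The offsite server is in another country but still belongs to the original steward, and the CDN is separate but keeps its copy only because it is paid to.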

Someone who gets one of these copies should be ready, if necessary, to make it available for use and perhaps further dissemination and preservation planning.

This holds whether we’re talking about Very Important Stuff handled by big well-funded entities, or stuff that’s extremely informal and small-scale. If it’s useful in your community, make sure a friend in another country has a copy.

Oddly, this problem used to be solved, but is now unsolved. During the print era, the natural and economical way to disseminate information was to make lots of copies and get libraries to take them up. Redundancy was a completely natural side effect of copying technology and economics. The Internet works in a completely different way: copies are made on demand (copied from the server to the client) and thrown away. There are content distribution networks (CDNs), but these are ephemeral and dependent (under contractual control of the original steward). We no longer have independent stewards of copies of things because we don’t need them to support our day-to-day habits.

(If the stuff in question is an active database, the recipient may also choose to continue updating it, or give it to someone else for update coordination, but this is an optional and orthogonal secondary step. The main point is that the information should be preserved, because someone might need to know what it says.)

If the “backup” is to become the new principal steward – and one should always be prepared for this – it will be important to transfer domain names as well. If the original steward is incapacitated, then the backup organization will have to change the DNS records without coordination with the original. That means prior transmission of registrar passwords. Arrangements like these are complicated and fragile, and therefore much rarer than they need to be. An excellent example of organizations doing the right thing in this regard is the coordination between FOAF and Dublin Core.

I was telling this story around 2007 to anyone who would listen, as part of my work for Science Commons. One of the most important infrastructure databases for scholarship is the Crossref DOI metadata – the information that gives you basic bibliographic information for the publication associated with a DOI. At the time I didn’t know whether Crossref was copying its database to an independent foreign partner, and maybe it wasn’t, but by 2010 Crossref had announced backup to Portico, which sounds pretty good to me – Crossref is a UK organization, Portico is a US organization, and neither would be made vulnerable by the other’s legal or financial trouble. The fact that Crossref issued a press release about this tells me that the idea of independent copies is neither obvious nor silly.

Twitter is not a very good way to carry on a conversation, but it has the advantage of being public, which helps keep people honest and responsive. ORCID is a fairly new organization that has an infrastructure database similar to Crossref’s, one that is starting to gain an important role in scholarship. On 18 January I casually asked:

wondering, does @orcid do outside-org outside-country backups like @crossref does (http://www.crossref.org/01company/pr/news111610.html …)?

The answer from @ORCID_org:

@jar346 @orcid @crossref Yes, we have backup servers in countries outside of US.

This didn’t answer my question; to me a “backup server” is something administered by the originating organization, perhaps physically residing in a different locale but not necessarily accessible to any “outside-org” there. And I found nothing on their site to reassure me. Rather than continue on Twitter I wrote this post. Maybe they will read it and get a better idea of what I was trying to say.

Don’t get me started on copyright.

  1. 2016-01-24 at 02:20

    There’s lots of history here of people working on archiving and long-term preservation. LOCKSS, for example. See http://larry.masinter.net/0603-archiving.pdf for another approach and the references.

    • 2016-01-24 at 03:42

      Quoth you: “… it is useful to consider the possibility of ensuring that the storage repositories are held under multiple legal jurisdictions, such that a legal intervention in a small subset of locations will still not cause the archived information to be lost, modified, destroyed, or revealed in ways inconsistent with the archiver’s intent.” Exactly.

      There’s lots to be said about digital preservation and copy-keeping, and others have said it. But I think the specific points about extra-jurisdictional and non-contractual copies are nonobvious and bear repeating. The question of domain name management for preservation is also not raised often enough.

