I was cleaning out a file drawer this summer and came across some manuscripts that Preston Hammer thrust at me when I met him in about 1976. I remembered this conversation well; I was happy to encounter someone who cared about programming languages and foundations of mathematics as much as I did, and was impressed with his forceful manner and unorthodox opinions. So when I looked through the papers I thought: Nowadays there’s an Internet! Maybe I can learn a bit more about this odd fellow. I started doing some searches, and collected the fruit of what became an obsession for a day or two on a web page.
He led an interesting life. He did numerical programming at Los Alamos in the late 1940s and throughout the 1950s, and started the computer science departments at the University of Wisconsin and at Penn State. I think he was frustrated that his training in (continuous) mathematics didn’t prepare him for the computational tasks he was faced with, and in his research he focused on trying to resolve the differences between discrete and continuous mathematics. He cared strongly about pragmatic education in mathematics and computation, to the extent that he came into conflict with the mathematicians he had to work with.
Anachronistically, somebody created a LinkedIn profile and web site for him, but I can’t tell who.
The amount of detail I easily found on this man, who died six or seven years before the Web started to happen, was remarkable. Dates and places of birth and death, campus newspaper stories from the 1960s, unflattering mentions in oral histories. Nothing very compromising or personal, but this research experience made me think about the information trails we all leave behind and how easy they are to follow. Public information is much more public now than it was when Hammer was alive.
[2012-11-16 Sorry for this. I thought it was funny, and a lesson.]
Welcome to Simplish.
The best way to understand every english text with a knowledge of just 1’000 words.
AVNTK Translating tool colour code:
Words in Green mean they have been translated adequately.
Words in Purple display a further explanation if selected
Words in Blue contain two or more possible meanings. (A tooltip is provided for these words, place the mouse cursor on top of blue words to see possible meanings).
Words in orange are not currently available in Basic English
Words in Red are names, technical terms or not recognized by the translating tool.
Your Translation to Basic English took (1 secs ):
Enter the text you want to convert.
[translation into Basic English]
move into the wording you need to one who chaged beliefs.
Looked up the definition of ‘simple interpretation’ in RDF model theory again. I thought I’d put a picture here for future reference. This is very similar to Pat’s Figure 1.
I’ve drawn V, IR, and IP as disjoint (except for the containment of LV in other things), but there is nothing in the definition that requires them to be. Of course IR and Pow(IR x IR) have to be disjoint due to the foundation axiom, but Pow(IR x IR) might overlap with IP.
The branching arrow is meant to indicate that IS maps V to IR union IP, that is to say, each member of V goes to either a member of IR or to a member of IP (or both, where IR and IP overlap). According to the definition LV has to include all the plain literals that are in V, and perhaps V and IR intersect in other ways (URIs might be resources).
There’s nothing in the definition that requires IS to map onto IR, i.e. the model theory spec doesn’t define a “resource” to be “something that is identified by a URI” and doesn’t even require every “property” to be a “resource”.
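To make the definition concrete, here is a minimal Python sketch of a simple interpretation; the sets and names are invented for illustration, not taken from the spec. IEXT maps IP into Pow(IR x IR), IS maps the vocabulary V into IR union IP, and a ground triple is true exactly when the pair of denoted subject and object falls in the extension of the denoted property.

```python
# Sketch of a "simple interpretation" in the sense of RDF model theory.
# The sets and names here are invented for illustration.

IR = {"r1", "r2", "p"}            # resources (the universe IR)
IP = {"p"}                        # properties; may overlap IR
LV = set()                        # literal values, a subset of IR
IEXT = {"p": {("r1", "r2")}}      # extension mapping IP -> Pow(IR x IR)
IS = {                            # IS maps the vocabulary V into IR union IP
    "ex:a": "r1",
    "ex:b": "r2",
    "ex:prop": "p",
}

def satisfies(s, p, o):
    """A ground triple (s p o) is true iff (IS(s), IS(o)) is in IEXT(IS(p))."""
    return (IS[s], IS[o]) in IEXT.get(IS[p], set())

print(satisfies("ex:a", "ex:prop", "ex:b"))   # True
print(satisfies("ex:b", "ex:prop", "ex:a"))   # False
```

Note that “p” appears in both IR and IP here, illustrating the point above that nothing in the definition forces the sets to be disjoint.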
[2012-09-10 I'm informed that I'm beating a dead horse; and should be gentle with TDWG. Agreed.]
(Continuing from the “tough URLs” post.)
The so-called “life sciences identifier” or LSID emerged as a candidate “persistent identifier” scheme in the early 2000s. The early hype led to uptake by a few projects. But the spec is an orphan: it failed to find a well-established sponsor such as a government, library, or viable standards organization, so there is no management or oversight of the namespace. This puts anyone using them (not those providing or creating them, but those actually depending on them) at risk, which of course will limit their uptake, creating a downward spiral.
LSIDs have two related difficulties. One is that they look like URNs, but do not have any standing with IETF, which oversees URN namespaces. (If ‘lsid’ were a URN namespace it would show up in the NID registry.) This makes the claim that LSIDs are a “standard” rather dubious. The practice of staking out territory without consent in a namespace managed by someone else is appropriately called “squatting”.
As a consequence of this I would strongly discourage anyone from putting urn:lsid:… where an RDF URI reference or XML namespace URI is required.
The other difficulty is that LSID claims to “persistence” have no basis. At present the “authority” field of an LSID is unmanaged, leaving open the possibilities of collisions and of fatal obscurity. People seem to put a DNS domain name in this field, which is odd because the ephemeral nature of domain names was one of the reasons given for creating the LSID spec in the first place. Domain names can change hands over time, which can lead to collisions; and their resolvability is not assured through any credible social process, since backup resolution for a domain that goes silent is not at present either socially acceptable or well supported technically.
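For concreteness, an LSID has the shape urn:lsid:&lt;authority&gt;:&lt;namespace&gt;:&lt;object&gt;[:&lt;revision&gt;]. Here is a minimal, hypothetical Python parser that exposes the unmanaged authority field; the example LSID is made up:

```python
# Minimal LSID parser, assuming the
# urn:lsid:<authority>:<namespace>:<object>[:<revision>]
# layout described in the LSID spec. The example LSID is fictitious.
def parse_lsid(s):
    parts = s.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not an LSID: " + s)
    authority, namespace, obj = parts[2], parts[3], parts[4]
    revision = parts[5] if len(parts) == 6 else None
    return {"authority": authority, "namespace": namespace,
            "object": obj, "revision": revision}

print(parse_lsid("urn:lsid:example.org:taxa:12345")["authority"])  # example.org
```

The authority component is typically a DNS name, which is exactly where the persistence problem lives.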
Registering a URN namespace isn’t hard; you just have to write an RFC draft and get IETF consensus. But persistence is a requirement for URN namespaces, so whoever’s submitting the RFC will have to make the case to IETF reviewers that LSIDs are persistent. This problem can be overcome as well, by designating a registry and establishing registration standards to forestall the creation of fly-by-night identifiers.
Registries seem to create a single point of failure, but for namespaces that are both valuable and understood by the community to be persistent, succession management can be taken care of. In the case of resolving existing names, if a namespace is understood to be persistent, then copies of it can be made without risk, so backups can serve if the original resolver disappears. The creation of new names is harder, since there has to be universal agreement on a control point. But again this can take care of itself, since competition tends to destroy the utility of the namespace, and those who care about this kind of thing know this.
As an alternative to a registry, consensus could be built that “the world is the registry”: LSID authority names should be durably publicized (published) and then never reused. This approach seems a bit shakier and perhaps less scalable than a coordinated registry, since assessing durability and publicity are arts. But this approach has worked pretty well for binomial nomenclature of species names.
I was surprised to see that LSIDs are not just tolerated by TDWG (the Taxonomic Data Working Group), but promoted front and center (see TDWG home page). It’s unseemly behavior for a standards organization such as TDWG to promote something as rickety as LSIDs. In doing so it is thumbing its nose at a sister standards organization, IETF, by failing to respect their process.
Mine is not an anti-URN or pro-http: position; there is a correct idea behind URNs and other persistent namespaces such as the MIME type registry and the .arpa top-level domain. Obviously nobody has enough control over the future to ensure that anything lasts forever, but simple agreement (or lack of disagreement) among all parties who either care or might exert control is the way social truths become established. Persistent naming is a confidence game. It can work if it’s seen as a speech act, similar to a declaration of independence or a wedding ceremony, and everyone takes it seriously.
I’m not really in favor of making LSIDs work, and don’t want to argue the merits of URNs here, but that’s neither here nor there. I’m just urging anyone who is trying to use them to either make them solid, or cease and desist. “Standardness” and “persistence” don’t just happen; they require hard work.
(What I still need to write about: persistence of http: URIs.)
I’ve just discovered the writings of philosopher Jaroslav Peregrin, and came across a beautiful, lucid article of his on meaning. It’s called What is inferentialism? and I encourage anyone who cares about semantics, (machine) inference, pragmatics, and empiricism to take a look at it.

Reading it I just wanted to cheer. Here is someone who (unlike me) is competent to talk about these subjects, saying many things I have felt but have had difficulty articulating. Here are some teasers that I hope will inspire you to go read it:
…when I make an assertion, I commit myself to giving reasons for it when it is challenged (that is what makes it an assertion rather than just babble); and I entitle everybody else to reassert my assertion reflecting any possible challenges to me.
First, inferentialism commits [the inferentialist] to a sentence holism, and so the point of contact of language and the world cannot be on the word-object level, but rather on the level sentence-situation or -action. Second, she is a normativist, hence she is not interested in which responses in fact occur, but rather in which responses are correct.
I’ve been critical of objects and the idea of reference for a while now. To me sentences and propositions, by virtue of their role as “moves” in social interactions, are likely to have priority in a properly objective account of meaning. Many putative objects (e.g. corporations or mutable digital documents) border on being fictional, gaining their objecthood only through what we say about them; and many referring phrases seem to refer to different things, depending on what is being predicated. I think this opinion would make me what Peregrin calls a “strong inferentialist”.
Eventually I hope that thinking clearly about semantics will (among other things) help bring calm to the current mass hysteria that is the Semantic Web and Linked Data, and help steer all of that energy expenditure toward better consequences.
Let’s now enter the fantasy world where “resource”, “identification”, and “representation of” have meanings consonant with what is found in the Web architecture documents RFC 3986, the HTTPbis draft (I’m looking at version 18), and AWWW. To make any sense of these documents the words have to be assumed to have something close to their ordinary language meanings (which are rather squishy), since they are otherwise effectively undefined.
1. Web architecture suggests that a URI owner is an authority for what is identified by its URIs (AWWW, “URI Ownership” section, bullet #2).
2. The HTTP protocol suggests the URI owner is an authority for what is a representation of what is identified (HTTPbis v.18 part 2 section 5.1 bullet 1 taken together with part 1 section 2.7.1).
If both kinds of authority hold, then Jabberwocky is a representation of the Magna Carta, since a URI owner can say both that the URI identifies the Magna Carta and that Jabberwocky is a representation of what is identified. But this is not true. How to resolve this paradox?
There are (at least) three solutions, based on modifying either of the two authority axioms.
1. We can say the URI owner is an authority for what is identified, but not for what is a representation of it. [2/11 I.e. solution = what the URI owner said was a representation is not a representation of what is identified.] In this case a 200 response only says that the payload is a representation; its arrival does not imply that it is one.
Accepting this would require modifying the HTTP protocol to say that the payload is only said to be a representation of the resource, not that it is. It is only nominally so.
2. We can say that the URI owner is an authority for representation, but that it is only an authority for identification to the extent that the identified resource is represented by any HTTP 200 responses that have been issued recently. [2/11 I.e. solution = the URI does not identify what the URI owner determined it to. It's not clear what identity authority would consist of independent of asserting representations, anyhow.]
Accepting solution 2 requires modifying the http: URI scheme to impose this limit on identification when representations are asserted, as this limit is otherwise not entailed by the URI scheme.
This could easily lead to the URI identifying nothing at all, which would be a problem.
2a. We can say that “representation” is redefined as a term of art, not used in an ordinary language sense. The URI owner has authority over representation, but the authority over identification is limited to having a URI identify mysterious sorts of things whose very nature allows some URI owner to have authority over their representations.
Accepting solution 2a also requires modifying the http: URI scheme, to restrict identification to these mysterious things when there are nominal representations. You might call these mysterious things, say, “information resources” (although this would run afoul of AWWW a bit).
Although I haven’t heard from him yet, my suspicion is that Roy Fielding would either insist on option 1, or insist that the problem doesn’t exist, while Tim Berners-Lee would either prefer option 2a or insist that the problem doesn’t exist. I want to get them into a room together to fight this one out, but I need to be there to make sure they don’t decide together that there is no paradox.
Further reading: speech acts
OK, there are two issues, one being what statements (triples) are needed in order to assert the waiver, the other being where to put them.
If there is a “landing page” for the ontology then CC Rel by Example gives a good start at documentation for what to do. It tells you the operative statement, which is

    <the-ontology-URI> xhv:license <http://creativecommons.org/publicdomain/zero/1.0/> .

where xhv: abbreviates http://www.w3.org/1999/xhtml/vocab# .
Ideally you would assert this predicate and object for both the ontology (via its ontology URI) and the ontology version (if the version has its own URI), repeating for as many aliases as you know about. (Ontology versions are a particular feature of OWL 2, not of RDF.) You want to cover as many bases as you can. So you could end up with many statements like this.
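As a sketch of what that can look like, here is a bit of hypothetical Python that emits one xhv:license triple (in N-Triples form) per URI, assuming the CC0 waiver is asserted with the xhv:license predicate; the example URIs are placeholders:

```python
# Sketch: emit a CC0 waiver triple (N-Triples form) for each URI naming
# the ontology: the ontology URI, the version URI, and any aliases.
# The example.org URIs below are hypothetical placeholders.
CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"
LICENSE = "http://www.w3.org/1999/xhtml/vocab#license"  # xhv:license

def cc0_triples(uris):
    """One xhv:license triple per URI."""
    return [f"<{u}> <{LICENSE}> <{CC0}> ." for u in uris]

for t in cc0_triples(["http://example.org/my-ontology",
                      "http://example.org/my-ontology/1.0"]):
    print(t)
```

You would feed it the ontology URI, the version URI, and every alias you know about, which is exactly the proliferation of statements described above.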
Similarly, you want to put these statements in as many places as you can, not just the ontology file itself but also any landing page that it might have (as shown in RDFa in the ccrel-guide).
Putting statements into an RDF serialization (e.g. RDF/XML) is straightforward, as shown, if you are editing the serialization directly. But if you are using an OWL tool such as Protege, it could be harder. Protege gives you two methods that might be used: ontology annotations and individual property assertions. You can use the ontology annotation pane to add the xhv:license annotation to the ontology, but not to the ontology version. To add individual property assertions for the ontology version you may have to put the three or more URIs in the ontology itself, which would just be tedious clutter, but I don’t see another choice.
Sadly all this work is speculative as there are no tools at present (of which I’m aware) that would pick up on the CC0 annotation. That’s not to say you shouldn’t do it, in fact I’m glad someone is willing to be a pioneer, as it will be a chicken-and-egg situation for quite a while.
In addition to expressing the waiver in RDF I would recommend writing a copyright statement in prose in an rdfs:comment ontology annotation property. The RDF statements themselves are likely to get lost or ignored, but with the rdfs:comment you have humans on your side. For wording you could use that given in the CC Rel guide or by the CC0 ‘chooser’ tool.
All of the above also applies if you’re attaching CC-BY or some other waiver or annotation, but ontologies are going to be easier to work with if they’re unencumbered, and the whole reason you wrote the ontology was so that it would be used, right?
Exercise for the adventurous reader: How does this approach fail if the httpRange-14 resolution’s advice isn’t observed?
Thanks to Ruth Duerr for asking.