A few years ago I got interested in “knowledge representation” (which might more accurately, but less catchily, be called “information expression”), and this got me interested in the mechanics of semantics. I thought I knew pretty well how names, expressions, and statements get their meaning in programming languages and mathematical formalisms, but I didn’t understand so well how meaning works in open-ended systems such as scientific discourse and the Internet, so I did a bit of research. Below are some books that I thought interesting enough to buy. (I looked at lots of articles too; see my Pinboard page.)
If I were to wait until I had something to say about all these books, that would be forever, so I’m just offering an unannotated list, in the spirit of Phil Agre’s somewhat longer list.
Jon Barwise and Jerry Seligman.
Information Flow: The Logic of Distributed Systems.
Cambridge University Press, 1997.
Philippe Besnard and Anthony Hunter.
Elements of Argumentation.
MIT Press, 2008.
David Boersema.
Pragmatism and Reference.
MIT Press, 2009.
Radu J. Bogdan.
Predicative Minds: The Social Ontogeny of Propositional Thinking.
MIT Press, 2009.
MIT Press, 1997.
Martha I. Gibson.
From Naming to Saying: The Unity of the Proposition.
Paul Horwich.
Truth, Meaning, Reality.
Oxford University Press, 2010.
Jeffrey C. King.
The Nature and Structure of Content.
Oxford University Press, 2007.
Saul A. Kripke.
Wittgenstein on Rules and Private Language.
Harvard University Press, 1982.
Willard V. Quine.
Theories and Things.
Harvard University Press, 1981.
Scott Soames.
Philosophy of Language.
Princeton University Press, 2010.
I was cleaning out a file drawer this summer and came across some manuscripts that Preston Hammer thrust at me when I met him in about 1976. I remembered this conversation well; I was happy to encounter someone who cared about programming languages and foundations of mathematics as much as I did, and was impressed with his forceful manner and unorthodox opinions. So when I looked through the papers I thought: Nowadays there’s an Internet! Maybe I can learn a bit more about this odd fellow. I started doing some searches, and collected the fruit of what became an obsession for a day or two on a web page.
He led an interesting life. He did numerical programming at Los Alamos in the late 1940s and throughout the 1950s, and started the computer science departments at University of Wisconsin and Penn State. I think he was frustrated that his training in (continuous) mathematics didn’t prepare him for the computational tasks he was faced with, and in his research he focused on trying to resolve the differences between discrete and continuous mathematics. He cared strongly about pragmatic education in mathematics and computation, to the extent that he came into conflict with the mathematicians he had to work with.
Anachronistically, somebody created a LinkedIn profile and web site for him, but I can’t tell who.
The amount of detail I easily found on this man, who died six or seven years before the Web started to happen, was remarkable. Dates and places of birth and death, campus newspaper stories from the 1960s, unflattering mentions in oral histories. Nothing very compromising or personal, but this research experience made me think about the information trails we all leave behind and how easy they are to follow. Public information is much more public now than it was when Hammer was alive.
[2012-11-16 Sorry for this. I thought it was funny, and a lesson.]
Welcome to Simplish.
The best way to understand every English text with a knowledge of just 1’000 words.
AVNTK Translating tool colour code:
Words in Green mean they have been translated adequately.
Words in Purple display a further explanation if selected.
Words in Blue contain two or more possible meanings. (A tooltip is provided for these words; place the mouse cursor on top of a blue word to see the possible meanings.)
Words in Orange are not currently available in Basic English.
Words in Red are names, technical terms, or words not recognized by the translating tool.
Your Translation to Basic English took (1 secs ):
Enter the text you want to convert.
[translation into Basic English]
move into the wording you need to one who chaged beliefs.
Looked up the definition of ‘simple interpretation’ in RDF model theory again. I thought I’d put a picture here for future reference. This is very similar to Pat’s Figure 1.
I’ve drawn V, IR, and IP as disjoint (except for the containment of LV in other things), but there is nothing in the definition that requires them to be. Of course IR and Pow(IR x IR) have to be disjoint due to the foundation axiom, but Pow(IR x IR) might overlap with IP.
The branching arrow is meant to indicate that IS maps V to IR union IP, that is to say, each member of V goes to either a member of IR or to a member of IP (or both, where IR and IP overlap). According to the definition LV has to include all the plain literals that are in V, and perhaps V and IR intersect in other ways (URIs might be resources).
There’s nothing in the definition that requires IS to map onto IR, i.e. the model theory spec doesn’t define a “resource” to be “something that is identified by a URI” and doesn’t even require every “property” to be a “resource”.
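The set-theoretic picture above can be sketched in a few lines of code. This is a toy model of a simple interpretation: the concrete names and values are invented for illustration, and only the shapes of V, IR, IP, IS, and IEXT come from the model theory. Note that IP is deliberately allowed to overlap IR, and nothing forces IS to be onto IR.

```python
# Toy model of an RDF "simple interpretation" as plain Python sets.

V = {"ex:alice", "ex:knows", "ex:bob"}     # vocabulary (names)
IR = {"r_alice", "r_bob", "p_knows"}       # resources (the domain)
IP = {"p_knows"}                           # properties (here overlapping IR)

# IS maps each name in V to a member of IR union IP
IS = {"ex:alice": "r_alice", "ex:knows": "p_knows", "ex:bob": "r_bob"}

# IEXT maps each property to its extension, a set of pairs of resources
IEXT = {"p_knows": {("r_alice", "r_bob")}}

def satisfies(triple):
    """True iff this interpretation makes the triple (s, p, o) true."""
    s, p, o = triple
    return IS[p] in IP and (IS[s], IS[o]) in IEXT[IS[p]]

print(satisfies(("ex:alice", "ex:knows", "ex:bob")))  # True
```

Running the interpretation against a triple is just a membership test against the relevant property extension, which is all that “simple interpretation” amounts to before the RDF and RDFS semantic conditions are layered on.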
[2012-09-10 I'm informed that I'm beating a dead horse; and should be gentle with TDWG. Agreed.]
(Continuing from the “tough URLs” post.)
The so-called “life sciences identifier” or LSID emerged as a candidate “persistent identifier” scheme in the early 2000s. The early hype led to uptake by a few projects. But the spec is an orphan: it failed to find a well-established sponsor such as a government, library, or viable standards organization, so there is no management or oversight of the namespace. This puts anyone using them (not those providing or creating them, but those actually depending on them) at risk, which of course will limit their uptake, creating a downward spiral.
LSIDs have two related difficulties. One is that they look like URNs, but do not have any standing with IETF, which oversees URN namespaces. (If ‘lsid’ were a URN namespace it would show up in the NID registry.) This makes the claim that LSIDs are a “standard” rather dubious. The practice of staking out territory without consent in a namespace managed by someone else is appropriately called “squatting”.
As a consequence of this I would strongly discourage anyone from putting urn:lsid:… where an RDF URI reference or XML namespace URI is required.
The other difficulty is that LSID claims to “persistence” have no basis. At present the “authority” field of an LSID is unmanaged, leaving open the possibilities of collisions and of fatal obscurity. People seem to put a DNS domain name in this field, which is odd because the ephemeral nature of domain names was one of the reasons given for creating the LSID spec in the first place. Domain names can change hands over time, which can lead to collisions; and their resolvability is not assured through any credible social process, since backup resolution for a domain that goes silent is at present neither socially acceptable nor well supported technically.
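To make the role of the authority field concrete, here is a sketch of pulling an LSID apart, assuming the usual urn:lsid:authority:namespace:object[:revision] layout; the example LSID is invented. The authority component is exactly where the DNS domain name, with all its ephemerality, ends up.

```python
# Sketch of splitting an LSID into its parts, assuming the five-part
# urn:lsid:authority:namespace:object[:revision] layout.

def parse_lsid(s):
    parts = s.split(":")
    if parts[:2] != ["urn", "lsid"] or len(parts) < 5:
        raise ValueError(f"not an LSID: {s!r}")
    authority, namespace, obj = parts[2], parts[3], parts[4]
    revision = parts[5] if len(parts) > 5 else None
    return authority, namespace, obj, revision

print(parse_lsid("urn:lsid:example.org:namebank:12345"))
# ('example.org', 'namebank', '12345', None)
```

Nothing in the syntax stops two parties from claiming the same authority string, or an authority’s domain registration from lapsing, which is the unmanaged-namespace problem in miniature.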
Registering a URN namespace isn’t hard; you just have to write an RFC draft and get IETF consensus. But persistence is a requirement for URN namespaces, so whoever’s submitting the RFC will have to make the case to IETF reviewers that LSIDs are persistent. This problem can be overcome as well, by designating a registry and establishing registration standards to forestall the creation of fly-by-night identifiers.
Registries seem to create a single point of failure, but for namespaces that are both valuable and understood by the community to be persistent, succession management can be taken care of. In the case of resolving existing names, if a namespace is understood to be persistent, then copies of it can be made without risk, so backups can serve if the original resolver disappears. The creation of new names is harder, since there has to be universal agreement on a control point. But again this can take care of itself, since competition tends to destroy the utility of the namespace, and those who care about this kind of thing know this.
As an alternative to a registry, consensus could be built that “the world is the registry” and LSID authority names should be durably publicized (published) and then never reused. This approach seems a bit shakier and perhaps less scalable than a coordinated registry, since assessing durability and publicity are arts. But this approach has worked pretty well for binomial species nomenclature.
I was surprised to see that LSIDs are not just tolerated by TDWG (the Taxonomic Data Working Group), but promoted, front and center (see the TDWG home page). It’s unseemly for a standards organization such as TDWG to promote something as rickety as LSIDs. In doing so it is thumbing its nose at a sister standards organization, IETF, by failing to respect their process.
Mine is not an anti-URN or pro-http: position; there is a correct idea behind URNs and other persistent namespaces such as the MIME type registry and the .arpa top-level domain. Obviously nobody has enough control over the future to ensure that anything lasts forever, but simple agreement (or lack of disagreement) among all parties who either care or might exert control is the way social truths become established. Persistent naming is a confidence game. It can work if it’s seen as a speech act, similar to a declaration of independence or a wedding ceremony, and everyone takes it seriously.
I’m not really in favor of making LSIDs work, and don’t want to argue the merits of URNs here, but that’s neither here nor there. I’m just urging anyone who is trying to use them to either make them solid, or cease and desist. “Standardness” and “persistence” don’t just happen; they require hard work.
(What I still need to write about: persistence of http: URIs.)
I’ve just discovered the writings of philosopher Jaroslav Peregrin, and came across a beautiful, lucid article of his on meaning. It’s called What is inferentialism? and I encourage anyone who cares about semantics, (machine) inference, pragmatics, and empiricism to take a look at it.
Reading it I just wanted to cheer. Here is someone who (unlike me) is competent to talk about these subjects, saying many things I have felt but have had difficulty articulating. Here are some teasers that I hope will inspire you to go read it:
…when I make an assertion, I commit myself to giving reasons for it
when it is challenged (that is what makes it an assertion rather than
just babble); and I entitle everybody else to reassert my assertion
reflecting any possible challenges to me.
First, inferentialism commits [the inferentialist] to a sentence holism, and so the
point of contact of language and the world cannot be on the word-object level, but rather
on the level sentence-situation or -action. Second, she is a normativist, hence she is not
interested in which responses in fact occur, but rather in which responses are correct.
I’ve been critical of objects and the idea of reference for a while now. To me sentences and propositions, by virtue of their role as “moves” in social interactions, are likely to have priority in a properly objective account of meaning. Many putative objects (e.g. corporations or mutable digital documents) border on being fictional, gaining their objecthood only through what we say about them; and many referring phrases seem to refer to different things, depending on what is being predicated. I think this opinion would make me what Peregrin calls a “strong inferentialist”.
Eventually I hope that thinking clearly about semantics will (among other things) help bring calm to the current mass hysteria which is the Semantic Web and Linked Data, and help steer all of that expended energy toward better consequences.
Let’s now enter the fantasy world where “resource”, “identification”, and “representation of” have meanings consonant with what is found in the Web architecture documents RFC 3986, the HTTPbis draft (I’m looking at version 18), and AWWW. To make any sense of these documents the words have to be assumed to have something close to their ordinary language meanings (which are rather squishy), since they are otherwise effectively undefined.
1. Web architecture suggests that a URI owner is an authority for what is identified by its URIs (AWWW 188.8.131.52 bullet #2).
2. The HTTP protocol suggests the URI owner is an authority for what is a representation of what is identified (HTTPbis v.18 part 2 section 5.1 bullet 1 taken together with part 1 section 2.7.1).
If both kinds of authority hold, then Jabberwocky is a representation of the Magna Carta, since a URI owner can say both that the URI identifies the Magna Carta and that Jabberwocky is a representation of what is identified. But this is not true. How to resolve this paradox?
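The clash can be made mechanical with a toy formalization of the two authority axioms. All URIs and “works” here are invented for illustration; the point is only that the two authority claims jointly assert a representation relation that common knowledge rejects.

```python
# Toy formalization of the two authority axioms and their collision.

# Axiom 1: the URI owner says what the URI identifies.
identifies = {"http://example.org/mc": "Magna Carta"}

# Axiom 2: the URI owner says that whatever it serves with a 200 is a
# representation of the identified thing.
served_with_200 = {"http://example.org/mc": "Jabberwocky"}

# Independent common knowledge about what actually represents what:
actually_represents = {("Magna Carta facsimile", "Magna Carta")}

for uri, payload in served_with_200.items():
    claim = (payload, identifies[uri])   # what the two axioms jointly assert
    consistent = claim in actually_represents
    print(claim, "consistent:", consistent)
# ('Jabberwocky', 'Magna Carta') consistent: False
```

Each of the three solutions below amounts to weakening one of the two dictionaries: solution 1 demotes served_with_200 to a mere claim, while solutions 2 and 2a constrain what identifies is allowed to contain.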
There are (at least) three solutions, based on modifying either of the two authority axioms.
1. We can say the URI owner is an authority for what is identified, but not for what is a representation of it. [2/11 I.e. solution = what the URI owner said was a representation is not a representation of what is identified.] In this case a 200 response only says that the payload is a representation; its arrival does not imply that it is one.
Accepting this would require modifying the HTTP protocol to say that the payload is only said to be a representation of the resource, not that it is. It is only nominally so.
2. We can say that the URI owner is an authority for representation, but that it is only an authority for identification to the extent that the identified resource is represented by any HTTP 200 responses that have been issued recently. [2/11 I.e. solution = the URI does not identify what the URI owner determined it to. It's not clear what identity authority would consist of independent of asserting representations, anyhow.]
Accepting solution 2 requires modifying the http: URI scheme to impose this limit on identification when representations are asserted, as this limit is otherwise not entailed by the URI scheme.
This could easily lead to the URI identifying nothing at all, which would be a problem.
2a. We can say that “representation” is redefined as a term of art, not used in an ordinary language sense. The URI owner has authority over representation, but the authority over identification is limited to having a URI identify mysterious sorts of things whose very nature allows some URI owner to have authority over their representations.
Accepting solution 2a also requires modifying the http: URI scheme, to restrict identification to these mysterious things when there are nominal representations. You might call these mysterious things, say, “information resources” (although this would run afoul of AWWW a bit).
Although I haven’t heard from him yet, my suspicion is that Roy Fielding would either insist on option 1, or insist that the problem doesn’t exist, while Tim Berners-Lee would either prefer option 2a or insist that the problem doesn’t exist. I want to get them into a room together to fight this one out, but I need to be there to make sure they don’t decide together that there is no paradox.
Further reading: speech acts
OK, there are two issues, one being what statements (triples) are needed in order to assert the waiver, the other being where to put them.
If there is a “landing page” for the ontology then CC Rel by Example gives a good start at documentation for what to do. It tells you the operative statement, which is an xhv:license assertion whose object is the CC0 URI,
where xhv: abbreviates http://www.w3.org/1999/xhtml/vocab# .
Ideally you would assert this predicate and object for both the ontology (via its ontology URI) and the ontology version (if the version has its own URI), repeating for as many aliases as you know about. (Ontology versions are a particular feature of OWL 2, not of RDF.) You want to cover as many bases as you can. So you could end up with many statements like this.
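To make the shape of those statements concrete, here is a sketch that spells them out as plain (subject, predicate, object) tuples, one per URI alias; the ontology and version URIs are hypothetical stand-ins.

```python
# Sketch: the CC0 waiver statements as plain RDF triples (tuples).

XHV = "http://www.w3.org/1999/xhtml/vocab#"
CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"

ontology_uri = "http://example.org/my-ontology"       # hypothetical
version_uri = "http://example.org/my-ontology/1.0"    # hypothetical

# One xhv:license statement per URI alias, as recommended above:
waiver_triples = [
    (uri, XHV + "license", CC0)
    for uri in (ontology_uri, version_uri)
]

for s, p, o in waiver_triples:
    print(f"<{s}> <{p}> <{o}> .")   # N-Triples-style serialization
```

Any further aliases you know about just add more entries to the list; the predicate and object stay the same for all of them.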
Similarly, you want to put these statements in as many places as you can, not just the ontology file itself but also any landing page that it might have (as shown in RDFa in the ccrel-guide).
Putting statements into an RDF serialization (e.g. RDF/XML) is straightforward, as shown, if you are editing the serialization directly. But if you are using an OWL tool such as Protege, it
could be harder. Protege gives you two methods that might be used: ontology annotations and individual property assertions. You can use the ontology annotation pane to add an xhv:license annotation to the ontology, but not to the ontology version. To add individual property assertions for the ontology version you may have to put the three or more URIs in the ontology itself, which would just be tedious clutter, but I don’t see another choice.
Sadly all this work is speculative as there are no tools at present (of which I’m aware) that would pick up on the CC0 annotation. That’s not to say you shouldn’t do it, in fact I’m glad someone is willing to be a pioneer, as it will be a chicken-and-egg situation for quite a while.
In addition to expressing the waiver in RDF I would recommend writing a copyright statement in prose in an rdfs:comment ontology annotation property. The RDF statements themselves are likely to get lost or ignored, but with the rdfs:comment you have humans on your side. For wording you could use that given in the CC Rel guide or by the CC0 ‘chooser’ tool.
All of the above also applies if you’re attaching CC-BY or some other waiver or annotation, but ontologies are going to be easier to work with if they’re unencumbered, and the whole reason you wrote the ontology was so that it would be used, right?
Exercise for the adventurous reader: How does this approach fail if the httpRange-14 resolution’s advice isn’t observed?
Thanks to Ruth Duerr for asking.
Henry Thompson and I have been puzzling for a few years over the question of why the Web doesn’t have URIs that are widely perceived as both robust – in the sense of resisting attacks such as expiration, corruption, and censorship – and actionable – in the sense of just working in the browser. We have identifiers systems that are one or the other, but not both – why?
The robust identifier systems that we have range from pre-Internet ones like Linnaeus’s binomial species names (which are tied to their priority literature reference), the chemical element symbols, ISSN, and so on, to modern inventions such as URNs, info: URIs, and the digital object identifier (DOI). Our actionable identifiers (or locators?) are things like http: and ftp: URIs – a notably disjoint set.
Why should anyone care about robust actionable URIs? The reason is that, if they existed, they would marry a cornerstone of civil discourse to the central modern communication technology, namely the Web.
We take robust reference for granted in everyday civil, legal, scientific, technical, and political discourse, so much so that it is not even called out as a named phenomenon. If you’re debating a law or a scientific article with someone, the last thing you want is for your argument to go wrong because the two sides are working from different documents – especially if the difference goes undetected. This would be stupid.
But reliable reference was not always the rule. It took the world hundreds of years following the invention of the printing press to deal with this problem. Now we are repeating, on modern technology, the reference chaos of the early print world.
References are easy to deal with if you are a human, speaking natural language, with a bit of time on your hands. If you see the species name Rana pipiens and know a little bit about how species names work, you can look it up to get the primary reference for that name. Each identifier system has its own set of resolution services, many of them on the Web and open. But informal references in dozens of different identifier systems are not the same as being first-class citizens on the Web – as I said, human intervention is required. Making references accessible to computers using ordinary (i.e. Web) protocols vastly accelerates any process that needs to follow them. And to do this, today, you need something that starts with http:// and a domain name.
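As a small illustration of that last point, making a non-http identifier actionable today means routing it through some resolver that does have a domain name. The resolver hosts below (doi.org, portal.issn.org) are real public services, but the mapping table is illustrative, not exhaustive.

```python
# Sketch: turning robust identifiers into actionable http(s) URLs by
# routing them through known public resolvers.

RESOLVERS = {
    "doi": "https://doi.org/{id}",
    "issn": "https://portal.issn.org/resource/ISSN/{id}",
}

def actionable(scheme, ident):
    """Return an http(s) URL that resolves the identifier, if we know how."""
    template = RESOLVERS.get(scheme)
    if template is None:
        raise ValueError(f"no known resolver for {scheme!r}")
    return template.format(id=ident)

print(actionable("doi", "10.1155/1987/47105"))
# https://doi.org/10.1155/1987/47105
```

The robustness of the result is exactly as good as the resolver’s domain name and its operator’s longevity, which is the crux of the problem this post is about.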
By now you have no doubt found many ways to poke holes in what I’ve said so far. Are “tough” URIs really possible? What exactly could that mean? Isn’t it impossible to eliminate all vulnerabilities? On the other hand, given that the robust identifier systems listed above mostly are actionable in one way or another, isn’t a URI such as http://dx.doi.org/10.1155/1987/47105 a counterexample to my claim that we don’t have robust actionable URIs? And if this is such a problem, why on earth hasn’t it been solved already? Is it inherently intractable, or is this some kind of awful techno-social mistake that can be fixed?
What interests me is a sweet spot in between these two extremes: more robust than current-day doi.org URIs, but admitting the inevitability of certain vulnerabilities.
OK, I have more to say about threat analysis, IDF, ICANN, P2P, and so on, and will do so in a followup. In the meantime – if you want to talk about this, please come to our workshop in Bristol, UK, on December 8th!
[Minor copy edits on 2012-08-02]
Do researchers prefer to publish in closed access journals rather than open access in order to avoid OA publication charges? Librarians would certainly prefer they publish open access, since OA reduces their costs (in the long run) by reducing their subscription burden. I don’t know the answer, but if this is happening, universities might take steps to eliminate the incentive.
Here’s an idea: For each closed access article published, the university assesses a “subscription tax” comparable to what would have been assessed had the article been published open access, or maybe higher. That is, you can continue to support the subscription model, but you’ll have to pay for it, just as those publishing open access have to pay for open access.
The subscription tax goes to the libraries and is used to pay for subscriptions. The university overhead rate can be reduced, for everyone, by the amount raised through this new revenue stream.
Professor Smith’s decision between closed and open can now be made without financial bias. Currently she’ll pay (from her grant) $50,000 overhead plus $1,500 for open access or $0 for closed access. This would change to $48,500 overhead plus $1,500 for open access or $1,500 tax for closed access. Her grant officer is happy because the switch from overhead to article charge-or-tax is dollars-neutral, Smith is happy because she doesn’t need to factor open/closed into her venue decision, and the librarian is happy because more OA publishing is taking place and subscription load is dropping.
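The arithmetic in the example does balance, at least in the simplest case: a quick check using the post’s own numbers, where the overhead reduction is assumed to equal the tax raised per article.

```python
# Sanity check of the "subscription tax" arithmetic, using the numbers
# from the example above.

overhead_now, oa_fee = 50_000, 1_500

before_oa = overhead_now + oa_fee    # 51,500: OA costs more today
before_closed = overhead_now + 0     # 50,000: closed is cheaper today

overhead_new = overhead_now - oa_fee  # overhead reduced by the tax revenue
after_oa = overhead_new + oa_fee      # 50,000: open access charge
after_closed = overhead_new + oa_fee  # 50,000: tax equals the OA fee

assert after_oa == after_closed == before_closed
print(before_oa, before_closed, after_oa, after_closed)
```

With the tax set equal to the OA article charge, the grant pays $50,000 either way, so the venue decision carries no financial bias.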
I know my arithmetic probably doesn’t come out right, but I hope you get the idea.
I’m sure someone has already thought about this, but this is a blog so I get to write things like the above without bothering to do background research.