Why is Open Tree not publishing RDF?

Question raised on an Open Tree discussion group:

I’m wondering why you are not using RDF as the underlying graph data model and OWL annotations (and other existing ontologies) to create a semantic graph and therefore following the current best practices to build knowledge graphs.

Good question. Partly it’s that only one person on the project knows anything about RDF. But I think this is mainly a matter of cognitive space and time among the developers, and priorities. If we felt a need to do it given the goals that we have, we would probably do it. But we haven’t felt any need.

Converting to RDF and OWL is easy to do poorly (and perhaps adequately for many purposes). One of the first things I did on the project was to convert the taxonomy to Turtle so I could load it into a triple store. (I was on the RDF bandwagon for many years.) Anyone could do this; it’s a trivial script. Also, the NeXML format that we use subsumes RDFa Core, so it can be converted easily – in a sense we *do* publish RDF for the study database.
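That script really is trivial. Here is a minimal sketch of the idea, assuming a made-up table of (id, parent id, name) rows and a made-up URI prefix – not Open Tree’s actual dump format:

```python
# Convert a toy taxonomy table to Turtle triples.
# The input layout, URI scheme, and ex:parent property are all invented.

def taxonomy_to_turtle(rows):
    """rows: iterable of (taxon_id, parent_id, name) tuples."""
    lines = [
        "@prefix ex: <http://example.org/taxon/> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    ]
    for taxon_id, parent_id, name in rows:
        lines.append(f'ex:{taxon_id} rdfs:label "{name}" .')
        if parent_id is not None:
            # Child-to-parent edges, using a made-up property.
            lines.append(f"ex:{taxon_id} ex:parent ex:{parent_id} .")
    return "\n".join(lines)

rows = [(1, None, "life"), (2, 1, "Fungi"), (3, 1, "Metazoa")]
print(taxonomy_to_turtle(rows))
```

Loading the output into a triple store is equally mechanical; none of this touches the hard coordination and semantics work.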

Doing RDF/OWL well is much harder, and would require cooperation with other groups such as OBO (IAO, VTO, …) and TDWG, choice of and support for persistent URLs, good term definitions and documentation, SPARQL endpoint, and so on. These coordination activities are extremely time consuming. Of course doing so would be lovely in the abstract, but there has been no reason for us to make this a priority.

In my experience, format conversion is by far the easiest activity in data ecology, so mere conversion to RDF has little value. The hard parts are marshalling the data in the first place, and then using it wisely. Due to the vagueness of most vocabulary term definitions, the best-laid RDF usually requires as much reverse engineering and postprocessing as data in any other format when doing data integration and analysis. So it is semantics, not syntax, where the effort is best spent. (RDF is a syntactic play; it helps with semantics no better than any other data format, in spite of the buzzword “semantic web”. OWL helps semantics a little, but only with inference, not with ground truth, which is what really matters.)

The feedback captured in the feedback system (in github) has a little structure, and we could probably do better in obtaining more.

The thing that would tip the balance would be a real funded collaboration with another project where there was good reason to use RDF or OWL for communication between the collaborators. Publishing RDF/OWL merely for the sake of doing so is not in my opinion the best use of resources – especially given that all the information is open and anyone else could do such a conversion for us. I read a lot about the size of the linked data cloud, but very little about its utility. I bet there are legitimate uses of RDF-published data, but from what I’ve seen people mostly publish RDF just so that they can say that they did, not because they know that someone needs it. (Would love to be shown otherwise.)

How would having RDF for open tree make a difference to you, personally?

Categories: Uncategorized

Direction of tree growth

As someone with computer science degrees who is working on an evolutionary biology project, I have to be constantly vigilant about tree-growth direction confusions. Just now I found the following sentence in an article in Algorithmica:

For v, w nodes in T, we say that v lies below w if the path from v to the root of T passes through w.
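In code, this definition is just an ancestor check. A minimal sketch, with a made-up parent-pointer representation (not the article’s):

```python
def lies_below(v, w, parent):
    """True if the path from v to the root passes through w.

    parent maps each node to its parent; the root maps to None.
    Note that the quoted definition, read literally, makes every
    node lie below itself, since the path from v starts at v.
    """
    node = v
    while node is not None:
        if node == w:
            return True
        node = parent[node]
    return False

# A tiny tree: root r, with child a, which has child b.
parent = {"r": None, "a": "r", "b": "a"}
```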

Now real trees are oriented with their root(s) at the bottom, the trunk in the middle, then the branches, and the leaves (or needles) at the very top. If v is a leaf or branch, how can it lie below something that’s on the path from v to the root?

Maybe we should picture a hook-shaped or umbrella-shaped tree, with its trunk shooting up and all of its branches and leaves hanging down from the top of the trunk. There are trees like that, I think. Or, a hanging vine or epiphyte, growing downward from the spot where it’s planted. Then v could be below w with w on the path from v to the root. (Hmm, I don’t think an epiphyte would grow down; the whole point of their plant-on-tree adaptation is to obtain sunlight, which of course comes from above.)

Drawing trees sideways is a neutral solution to make life equally difficult for both cultures, and you see a lot of phylogenetic trees drawn this way in the literature.

The phylogenetics folks on the project speak of one node being ‘deeper’ than another. It took me a while to figure this out but their usage is in agreement with real trees if you imagine them submerged, as you’d see in the forests near the mouth of the Amazon, the ones that have frugivorous fish. Of course this is contrary to the way ‘depth’ is used in computer science. When computer scientists talk about depth-first search, they mean to start at the root and push toward the leaves as quickly as possible.

How did trees get flipped upside down like this? I think it comes from sentence diagramming, where by convention all the trees are drawn upside down. I would guess the custom found its way from sentence diagramming to computer science via Chomsky, who was very influential in the early days of CS, probably more so than, say, Ernst Mayr (see figure here to see how he drew them).

Added 2015-07-30:

1. In lattice theory one speaks of lower and upper bounds, and top and bottom elements. One interpretation of a lattice is as a family of sets, and when this is done usually the bigger sets go toward the top and the smaller ones toward the bottom. This is reflected in the usual v-like symbol for least upper bound or “join”, which reminds me of the u-like set union symbol, and greatest lower bound or “meet”, which looks like intersection. (By duality you could turn everything upside down and the theory would still work.) If you think of taxa being set-like, this puts the small taxa at the bottom and the large ones at the top. This is the opposite of what the biologists would prefer.

People who work on the mathematics of phylogenetic trees often appeal to the theory of upper semilattices, which being a flavor of lattice theory puts the root of the tree at the top, so they will have at least as much disorientation risk as I do.

2. In traditional taxonomies there is the notion of ‘higher’ and ‘lower’ taxonomic rank. The ‘higher’ ones, like kingdom, are the ones closer to the root of the taxonomic tree, and the ‘lower’ ones like genus are closer to the tips. This inverted orientation comes from applying a different metaphor, one incompatible with trees. The image this conjures for me is medieval power structures where the more powerful you are the higher your elevation. The higher you are, the better you can be heard (to command), the further you can see (for intelligence gathering), and the better positioned you are for waging war. So even within biology there is no consistency.

[Added 2016-03-12: good discussion of tree orientation on Tufte’s site; study on effect of tree layout on comprehension. Thanks Jim Allman!]

Categories: Uncategorized

When does x refer to y?

I have been concerned about the situation where a claim of the form ‘x refers to y’ is to be tested, perhaps because it is a requirement of a specification and one wants to see whether an engineered artifact (specifically a language-using agent) conforms to the specification. Claims of reference appear, on the surface, to require introspection, which is not generally something you do in an engineering context. What experiments or analysis do you perform (on an agent) to see whether the claim might hold, or not? Recognizing of course that in engineering, as in science, there is no proof, only absence of disproof.

Knowledge representation naysayers and semantic web pooh-poohers are in effect saying that talk of meaning and reference is not objective – it does not belong in science or engineering. I wonder if the failings of KR and semweb are not because they are inherently ill-founded, feeble, or intractable, but rather are due to inadequate understanding of meaning and reference, and consequent poor execution.

The question – how do you tell whether x refers to y? – was central to my puzzlement over W3C TAG issue httpRange-14 when I was involved with the TAG. Any answer to the question would seem to put a requirement on whether and how a URI refers.

I’ve argued here and in other posts (I repeat a lot) that it is possible to test claims of the form ‘s means p’ where s is a sentence and p is a proposition. This is because, in contrast to referring phrases (x above), there is an observable connection between the sentence being said, and certain states of affairs in the world. (Imperatives such as ‘complying with s leads to p’ work the same way.) Put briefly, s means p, if {s might be said} if and only if p.

I tried saying that x can refer to any y that has the property that every sentence of the form k∙x means the proposition p(y), where p is the meaning of the predicate phrase k. This is ugly and creates a circularity, since it would seem that assaying the meaning of x would require assaying the meanings of various k’s, which would require assaying the meanings of various x’s, etc. One might use this formulation to look for relative meaning of referring phrases and predicate phrases, but not for any independent statement of meaning of phrases (of the sort one can make for sentences). I acknowledge that relative meaning is more or less what model theory advances, but it seems counterintuitive to me. We argue about what a word means; we don’t seem to argue about what one word means relative to others.

(I write k∙x to denote the sentence composed from predicate phrase k and referring phrase x.)

What I recently noticed is that to test reference you don’t need to know what predicate phrases mean, only what sentences that contain them mean. I propose the following:

   x refers to y, if every sentence k∙x means a proposition that is about only y.

This proposal has a gazillion qualifiers.

  • ‘k∙x is about only y’ means that the truth of k∙x is affected only by (the state of) y; a change to something else that doesn’t affect y can’t change the truth of k∙x.

  • Not all sentences mean, so I’d want to change “every sentence k∙x” to “every meaningful sentence k∙x”. I left the word out to avoid clutter.

  • If a sentence has two referring phrases x and x′, then the proposition that the sentence means is ‘about only’ a combination of the two things that x and x′ refer to.

  • Sentences can mean propositions whose truth value is affected by variables not referenced in the sentence. ‘Grue’ is the classical example, but ‘highly rated’ is similar (it is not said who is doing the rating). As a patch I would say that the languages under analysis would have to forbid such predicates, or else would have to be translated into some second language lacking them.

  • It is possible that two distinct subjects / entities / referents change their state exactly in tandem, in which case looking for patterns of change would not be enough to tell them apart. One example might be the two propositions p and not p. I suspect there are others, but there are enough cases where a subject is adequately determined by its state space that I don’t consider this a fatal flaw.

  • The proposal may fail to uniquely ‘identify’ some intended y as the referent, in that applying all possible predicate phrases k to x could yield propositions all of which are about only some y’ that has ‘fewer’ states than y (i.e. the state space of y, considered as a partition of the world state space, might be a refinement of that of y’). That is, distinctions between certain states of y cannot be expressed in the language under consideration. – If this is the case, ways out would include: to consider the language to be deficient; to consider y to be a pathological or disallowed subject; to take the proposal to be a definition of reference; or to argue that the distinction between y and y’ cannot make a difference to whether an agent meets any specification.

  • The proposal may also fail to uniquely determine y if candidate referents can differ in ways other than in what doesn’t matter to them, i.e. other than in how their state spaces partition the world state space. After Yablo, I find the idea that subjects (or subject matters) are iso-ontic with their world-state-space partitions to be appealing, and while there are a few things about it that I don’t completely get, I’m sticking with it for the time being.

  • Deciding whether any given change to the world constitutes a change to some given y is by no means a science. This would be a negotiation between what is meant (at the meta-level) by the world state space, and what is meant by y.

  • Indefinite reference will require additional machinery or handwaving.

  • To broaden applicability we can interpret ‘change’ (i.e., differences between points in the world-state-space) broadly: not just as change in the physical world through time, but ‘motion’ through any kind of state-like set, such as possible contents of a document, possible identities, possible worlds, and so on. Not that I suggest a free for all, but that I don’t want to lose the framework on account of it appearing to be too narrow or rigid.

  • Obviously all the richness of human language is being put aside.
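The first qualifier can be given a concrete state-space reading: take a proposition to be the set of world states in which it is true, and take y’s state space to be a partition of the world states; then k∙x is about only y when its truth never splits a partition block. A toy sketch (the two-component world and all names are my own illustration):

```python
from itertools import product

# A toy world with two independent components: the state of y and
# the state of some unrelated z.  All of this is invented.
WORLD = set(product(["y0", "y1"], ["z0", "z1"]))

def partition_by_y(world):
    """Partition the world states by the state of y; states within a
    block differ only in things other than y."""
    blocks = {}
    for state in world:
        blocks.setdefault(state[0], set()).add(state)
    return list(blocks.values())

def about_only(proposition, blocks):
    """proposition: the set of world states in which it is true.
    It is 'about only y' iff it never splits a block, i.e. its truth
    cannot be changed by a change that leaves y alone."""
    return all(block <= proposition or block.isdisjoint(proposition)
               for block in blocks)

blocks = partition_by_y(WORLD)
p_y = {s for s in WORLD if s[0] == "y1"}  # 'y is in state y1'
p_z = {s for s in WORLD if s[1] == "z1"}  # depends on z, not on y
```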

With apologies to Leibniz, Yablo, and the usual cast of characters (you know who you are).

More to come, I hope – this idea will require testing and elaboration.

Categories: Uncategorized


That last post had a bug in it, so I withdrew it. Sorry. Email me if you want to see it. I don’t think the flaw is fatal, and I hope to have a revised version up later.

[Later that day: Reinstated under new title.]

Categories: Uncategorized

Taxonomy and speech acts

Random thoughts on taxa. Written more for my non-biologist friends interested in semantics, but I’d be interested in critiques from taxonomists.


We classify in order to generate hypotheses by induction (i.e. prejudice). If members of a class C generally have property P, and an individual a is classed in C, then it might be a good bet that a has property P.

(I’m using ‘class’ in the sense of formal logic, not in the sense of the Linnaean rank.)

The problem of classification seems straightforward. You articulate a set of classes, each of which has a rule for determining membership. When a new individual comes along and you want to classify it, you test its properties using each class’s membership rule, and the result is a set of classes to which it belongs.
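That straightforward picture can be sketched directly: classes are named membership rules, and classifying an individual means testing each rule. The classes and properties below are invented for illustration:

```python
# Classification as testing membership rules.  Classes and the
# property names ('wings', 'legs') are hypothetical.

classes = {
    "winged": lambda ind: ind.get("wings", 0) > 0,
    "hexapod": lambda ind: ind.get("legs") == 6,
    "legless": lambda ind: ind.get("legs") == 0,
}

def classify(individual):
    """Return the set of classes whose membership rule the
    individual satisfies."""
    return {name for name, rule in classes.items() if rule(individual)}

fly = {"wings": 2, "legs": 6}
```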

When classification is to become a community practice, names or phrases must be used to identify classes (so you and I can try to talk about the same class) and language has to be used to communicate membership rules. To come up with the same answers to a given membership question, or to be able to engage in evidence-based arguments, you and I have to have compatible interpretations of the tokens we use to communicate. By ‘compatible’ I mean in the sense of non-linguistic consequences, e.g. what observations or experiments we do in order to test some sentence under consideration (examining the properties of an individual to be classified, and so on).

A membership rule could be in terms of directly testable physical properties. But sometimes membership rules are not directly testable, such as rules involving descent. If the rule is that x is in class C if it descends from something having a given physical property, or that x is in C if x descends from the most recent common ancestor of y and z, then testing membership of a given x in C can be tricky. One has to evaluate potentially competing theories of descent, and the answer could be difficult to resolve.
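Given one assumed theory of descent, the most-recent-common-ancestor rule itself is mechanical; what the code can’t settle is which tree to trust. A sketch with a hypothetical tree:

```python
def ancestors(node, parent):
    """Path from node up to the root, inclusive.
    parent maps each node to its parent; the root maps to None."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def mrca(y, z, parent):
    """Most recent common ancestor of y and z under one theory of descent."""
    seen = set(ancestors(y, parent))
    for node in ancestors(z, parent):
        if node in seen:
            return node
    return None

def in_clade(x, y, z, parent):
    """Membership rule: x is in C if x descends from mrca(y, z)."""
    return mrca(y, z, parent) in ancestors(x, parent)

# One hypothetical theory of descent; a competing theory would be
# a different parent map, possibly giving different answers.
parent = {"root": None, "A": "root", "B": "A", "C": "A", "D": "root"}
```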

The specification in language of membership rules can easily be incomplete, vague, or ambiguous, so that different people might judge membership differently depending on how they interpret the language of the rule. More likely, someone will detect the difficulty with a rule and refuse to interpret it one way or the other in those cases where it is unclear.


In biological taxonomy, a class’s membership rule is called a ‘circumscription’, and a class is called a ‘taxon concept’. However, often people speak of a taxon rather than a class. It turns out that taxa and classes are very different from one another.

It is unclear what kind of thing a taxon is. It seems to be agreed that a taxon can change over time, i.e. a taxon can be connected with class c1 at time t1, and class c2 at time t2. Taxa are like houses, which change over time by being repaired, painted, and so on.

For example, the name Hyla denotes a taxon that formerly included the little frogs known as spring peepers, but no longer does.

It is not enough to say that a taxonomic name is interpreted to denote different classes at different times or in different contexts. This is because a taxon might be known by two names, or by different names at different times. The changing taxon really seems to be independent of its name, and is connected to a sequence of classes over time.

So taxa are neither physical nor “abstract”, under the usual idea that abstract entities are eternal and don’t change.

There are similarly strange entities in other domains. Of digital documents we say that Alice changed ‘the document’ and sent it to Bob, who made another change. Alice’s ‘copy of the document’ no longer reflects the current state of the document after Bob’s change. A group of people editing a digital document somehow agree on what the contents of ‘the document’ are at any given time, because they are aware of how authority over the document is transferred, delegated, partitioned, and so on. But the document has no fixed physical location and may not even have a fixed name.

The bylaws of a corporation are similar. They are amended over time, and we don’t say that the corporation has new bylaws, we say that the bylaws have changed. We say that the bylaws used to say X, but now they say Y.

I suggest that taxa and digital documents, like promises, marriages, bylaws, contractual agreements, and so on, are the products of speech acts: the truth of their creation and their state changes are effected socially, through special kinds of communication. An unusual feature of taxa, compared to most speech-act products such as promises and marriages, is that they are referred to by name. The association of a name to a taxon is itself established by a speech act, similar to a christening. Promises and marriages can be created and changed and referred to, but they are not generally given names.

I’m sort of glossing over a complication, namely that the circumscription can stay the same, e.g. “conspecific with specimen X”, while theories about equivalent physical-property-based circumscriptions change over time, and in addition theories about what “conspecific” means change over time. We often see new publications for species that do not change the underlying definition “conspecific with specimen X” but do put forth some new theory of how to identify things that are. The pragmatic effect of this is that the taxon has changed, since the secondary or predictive circumscription has changed, although in some deeper sense it hasn’t.


What benefit does the community get through this level of indirection (name -> changing taxon -> class vs. name -> class)? Why not use different names for different classes?

In fact there are situations where taxonomists are careful to do so. References to classes take the form taxon-name sensu authority-reference, where the authority-reference (usually author + year) refers to a particular publication that lays out a particular circumscription. Taxa are therefore left out of the picture.

Taxonomists are responsible for taxon change – they perform “nomenclatural acts” which are similar in nature to the speech acts that create promises and marriages. Some acts create taxa, some change them for various reasons, and others reflect reclassification (the same taxon being placed in a different higher group). A taxon changes from association with c1 to c2 when an author judges c2 to be “better” than c1. Goodness might be judged, for example, according to whether c2 is thought more likely than c1 to be a clade (i.e. united by common descent), or considered better delimited than c1. Better delimitation could come either from more precise description or from being better biology, e.g. a better match to character discontinuities in natural populations.

At any given time a taxon has an associated class, and it also has a unique associated name. Nomenclatural speech acts change the name of a taxon. Some taxa are given a dozen or more names over their lifetime. (I don’t want to go into the rules of taxonomic nomenclature, as they’re explained elsewhere, e.g. here.)

I can only speculate on what’s going on here.

  1. Taxon names are shorter and easier to remember than class names. Most of the time the circumscription doesn’t change very much as the taxon changes, and people don’t get into much trouble by failing to specify which circumscription they mean. (Sometimes they do get into trouble, of course.) Most biologists consider the circumscription to be noise and leave it out, and they are often justified because many taxa have so far only had a single circumscription.
  2. Taxon names are good search keys when looking for biological information on an individual or population. What we know about individuals in class c1 may also apply to individuals in class c2, if they have both been associated with the same taxon.
  3. Replacing c1 with c2 for a taxon t with name n robs class c1 of its association with n (via t), since c2 has taken over name n. This sends a strong message to the community not to use class c1 any more in classification.

These reasons may have made sense historically, but I don’t think they make for good science today. Now that we have an Internet, anyone ought to be able to look up a circumscription given a reference to it and figure out whether a given specimen satisfies it.

The advent of DOIs, and the increasing number of authority publications that have them, make DOIs ideal as authority references.

The technical problems of using class references instead of taxon references are easy enough to solve. The hard part, obviously, is overcoming inertia and getting any sort of support for reform from biologists and publishers.

Real change

Membership in a class does change over time as a result of births and deaths. An individual that might belong to C at time t would probably not be said to belong to C before it was born or after it died. But because the class has a circumscription, we do not generally say that the class itself has changed, only that its membership has.

We can also speak of changing populations. Suppose that C is the class of members of population P. We might establish that at time t, an individual x is a member of C if and only if it is a member of a class D with a circumscription based on physical features. However, the population P can evolve, so that at some later time t’, an individual x’ might be in C but not in D.

Populations are physical entities, and like houses, they can change. The properties of members of C (in general) may have changed over time, but C itself (its membership rule) has not, because it is based on a changeable physical entity.

These are completely different kinds of change than taxon change. What’s in common is that the truth of propositions that involve a class or taxon can change over time. But while class membership changes because the biological world changes, taxon membership changes because of speech acts.


I’m parodying, a little bit, a world view in which we have lots of separate things and changes are localized to things in an orderly way. I’ve tried to explain taxa by following the logical consequences of the way biologists talk. I’m not satisfied that this is right or that there isn’t a better way to understand fictitious entities like taxa. One is tempted either to eliminate such entities, and thereby to remove speech acts from one’s understanding of the world (perhaps moving the locus of change inside ourselves), or to look for speech-act-nature in all ‘things’ and consider that there may be authority structures in all discourse.

This post draws on conversations with Brian Cantwell Smith, Allen Renear, and Henry S. Thompson.

Categories: Uncategorized

No frameworks please

Response to Peter Wayner’s recent article at cio.com. I’m not responding on the site because login to the site demands way too much in the way of rights to my social media accounts (e.g. it wants to be able to post to Twitter under my name). The article was brought to my attention via the dspace-tech list and I knew the author way back when.

Hi Peter, long time no see. You say that proliferation of programming languages was a bad thing; that frameworks are the modern equivalent of programming languages; and that the proliferation of frameworks is a good thing. How does this make any sense?

I agree that a lot of the creative juice that used to go into programming languages now goes into frameworks, and that once you get past syntax they are basically the same. The problem with frameworks, as with programming languages, is that they don’t combine in any meaningful way. You have to commit to one of them within any given address space. To combine them you have to resort to interprocess calls, which these days usually means HTTP requests, often with a huge performance hit. Frameworks mean lock-in and rigidity.

The right approach to software architecture is not frameworks, but libraries and toolkits. These interoperate nicely with one another and allow you to compose new artifacts from existing ones, recursively. Interoperation and composability are supposed to be the dividend we reap by standardizing on a programming language; frameworks throw that dividend away.

Categories: Uncategorized

Dual licensing and exclusive rights

Ross Mounce asks:

How does dual licencing work when one license flatly contradicts the other e.g. CC-BY vs Elseviers “exclusive right to publish & distribute”

Here’s the problem he’s referring to:

“Elsevier is granted the following rights:

1. The exclusive right to publish and distribute an article, and to grant rights to others, including for commercial purposes. …”

This is indeed very confusing. When the work is first created, the author has (by copyright law) the exclusive right to publish, distribute, and grant rights. Then the author makes an agreement with Elsevier that (a) grants Elsevier exclusive rights to do these things – which I take to mean that they, the author, are excluded from doing these things (and everyone else is too, but they were already excluded by copyright law). But the author and Elsevier also made an OA agreement that (b) requires Elsevier to distribute the work under a CC license. Once Elsevier does (b), as required, it no longer has an exclusive right to publish and distribute, because it has granted those rights to others (which it was able to do since the author granted the right to grant rights). (The grant is conditional on attribution, etc.) The wording “grant the exclusive right to publish” is very confusing, since Elsevier by the OA agreement is required to turn right around and relinquish some of that exclusivity.

Elsevier might retain the right to publish and distribute in ways other than what they licensed, e.g. without attribution, or without a statement of the CC license. That is, it is free to dual license, while the author is excluded (by contract) from dual licensing. But that in no way negates the CC license, which is irrevocable.

But I don’t know what the OA part of the agreement looks like. If it doesn’t say that every copy that Elsevier makes must carry the agreed CC license, it’s not worth very much as an OA agreement, since the door would be open (legally) to dubious practices such as the one you observed: they can meet the OA agreement by publishing with the CC license for a year, say, and then removing the license statement. Then people who don’t know about the CC license will be tricked into paying them money, even though they don’t need to. (The absence of a license notice does not imply the absence of a license.)

To answer your question: (1) The statement about exclusive rights is not a license; it’s part of a contract that the author has with the publisher, where the author agrees to give up rights (be excluded) in exchange for something else. It has no bearing on users of the material; there is no dual licensing here. Exclusivity comes from copyright law, not from any proclamation the publisher makes.

(2) Even if this were a dual license, licenses cannot take rights away, so there is no way that any license can contradict or modify any other license. If you can do A because of license X, and you can do B because of license Y, then you can do A and B, no matter what the licenses may pretend to say about prohibitions on A and B. Prohibitions in a license can only be conditions on the exercise of rights: ‘you can do A if you do P’ (e.g. you can copy if you attribute) does not mean ‘if you do A you have to do P’, because you can perfectly well do A without P if a different license lets you. ‘You can do A only if you do P’ or ‘Joe has exclusive rights’ would be a prohibition, and a license, no matter what it claims, cannot globally prohibit anything that was not already prohibited (by copyright law).

It is possible for a contract to prohibit a party to the contract from doing something. This is why libraries are prohibited from doing things with journal articles (like text mining) that would otherwise be permitted under copyright law or a CC license.

(Also note that when you pay for access to an article, that has nothing to do with copyright. The ‘license’ granted to you in exchange for payment is for access, not the ability to copy. And a CC license does not require anyone to make the material accessible.)


Categories: Uncategorized

How can you tell whether a robot is referring?

In brief: I still don’t know.

I keep saying I’m going to work up to an examination of reference and objects. I’m not ready for this yet, but I wanted to put down a few thoughts.

Recall by way of motivation that on first encountering the so-called ‘semantic web’ and its dogma of ‘identification’ I felt that it didn’t belong in an engineering context without further explanation. When I expressed my discomfort to one of the principals, he challenged me to fix it.

I’ve claimed that propositions and the semantics (or pragmatics) of complete messages can be put on a foundation solid enough for scientific and engineering analysis. The question then is whether we can do something similar for reference. I put forth two explanations for this: one in terms of state spaces, and the other in terms of engineering specifications.

The state space explanation says that a proposition is a bipartition of the state space of a system (or world) into a block of states in which the proposition is true, and a block in which the proposition is false. State spaces are familiar in all kinds of systems analysis and are well within the comfort zone of engineers and mathematicians. Propositions can be related to one another, and whether an agent generates a message is itself a proposition. So this seems tidy.
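This picture is easy to make concrete: take a finite state space, represent each proposition by the block of states in which it is true (its complement being the block in which it is false), and relations between propositions become set operations. A toy sketch with invented states:

```python
# Propositions as bipartitions of a finite state space.
# The states themselves are invented for illustration.
STATES = {0, 1, 2, 3, 4, 5}

def proposition(pred):
    """The block of states in which pred holds; the complement in
    STATES is the block in which the proposition is false."""
    return {s for s in STATES if pred(s)}

def entails(p, q):
    """p entails q iff every state making p true makes q true."""
    return p <= q

even = proposition(lambda s: s % 2 == 0)
small = proposition(lambda s: s < 4)
even_and_small = even & small  # conjunction is intersection
```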

The engineering-specification explanation says that a proposition is something that can be tested. We can say that a message means a proposition if, roughly speaking, the message is generated when, and only when, the proposition is true. This kind of condition is fine as a specification if we can determine when the proposition holds, and many propositions are amenable to such a test – and the ones that aren’t are ones we probably don’t or shouldn’t care about.

So if someone claims that by message M, agent A means proposition P, we have ways to test whether it does – we don’t have to take such a claim on faith, and we don’t need to introspect to get the answer. It just does or doesn’t, and this can be determined experimentally.

The problem with reference (or the meaning of a noun phrase; I’m not going to bother yet with Frege’s sense/reference distinction) is how to get a comfortable corresponding story around claims of the form: by generating a message that has message part Z, agent A is referring to X. Suppose there were to be a dispute over the claim that by Z, A refers to X. How would it be settled? Not by repeating the claim, and not by introspection or projection, I hope.

This is an especially severe question when A is an engineered artifact (what I’ve been calling a ‘robot’) that is doing the putative referring (i.e. is sending the message that has the putatively referring part Z). How can you tell whether a robot is referring?

I take as given that robots *can* refer, since I believe that humans as language-speaking agents are different only in degree, not kind, from robots. There is no secret sauce that only humans have that lets them refer.

My homework: make another assault on On the Origin of Objects by Brian Cantwell Smith.

Pointless logics

I happened on a passage in Wikipedia calling out a connection between description logic and propositional dynamic logic (PDL). Both of these formal systems are pointless; they don’t have reference in the usual form. This is certainly appealing for someone trying to eliminate and rebuild reference. For any proposition, there is an implicit subject, an ‘individual’ in the DL case and a ‘world state’ in the PDL case. One can ‘talk about’ a new subject by saying what operator to apply to get from the current subject to the new one. You don’t refer to a subject, you give a path to access a new subject from an old one.

Objects as spindles

Here is something I keep picturing. Sentences that share a common subject (phrase) mean propositions that are all about the same thing. (Modulo homonyms that is, but please let me ignore those.) So we might say, as a way to eliminate referents from the account, that an object is just what a particular collection of related propositions is about. Think of the object as corresponding to a spindle, and the propositions about it are all impaled on the same spindle. The purpose of objects might be to organize propositions.


I like inferentialism, and was interested to hear the inferentialist take on reference. Jeremy Wanderer’s book on Brandom talks about ‘the challenge of subsentential structure’ – that nails the problem. But it then goes on to repeat Quine’s idea (in Use and its place in meaning) that substitution of one phrase for another, or coreference, is the best one can do by way of explanation. I find this very unsatisfying. If in a given language there were only one phrase that could refer to X, then we would have no account at all of the meaning of that phrase, which is absurd.


Cite, citation

  1. The officer cited Bob for speeding.
  2. The officer issued Bob a citation.
  3. Alice cited Bob in her argument.
  4. Alice cited Bob’s paper in her argument.
  5. Alice’s paper had a citation to Bob’s paper.

Could these usages all be instances of a common pattern? I claim that they are. I cite the OED as my first witness. Here is what it gives as the first definition of cite:

To summon officially to appear in a court of law, whether as principal or witness.

The etymology given is Latin citāre to move, excite, or summon.

In example 1, Bob is the principal in a case. It’s not the officer’s job to declare guilt or mete out punishment, so he’s telling Bob to show up in court for trial and judgement on the accusation of speeding. Of course we usually default on these citations, thus implicitly pleading guilty, and pay a fine without showing up in court.

In example 2, there is a citation, that is, an act of citing: The officer is citing Bob (demanding that he show up in court). This would ordinarily mean that there is a piece of paper that records the fact that the officer has cited Bob, and that piece of paper is called a citation, and the piece of paper issues from the officer. This follows a common pattern where a proposition (claim, accusation, agreement, etc.) is confused with an expression of it (piece of writing, audio recording, etc.). I suspect there is a word for this kind of metaphorical extension, something akin to metonymy, and if there isn’t there should be.

In example 3, we’re not necessarily talking about a court of law, but a metaphorical court, that of scholarly debate. Alice is not literally asking Bob to show up as witness, but is doing so metaphorically, and if the stakes were high – if the argument escalated to a court case – that is what she would do.

In example 4, Bob has written a paper making some claim, and that claim, Alice says, supports her argument – so she wants him as a witness. Papers are much easier to summon than people, especially when e.g. the person is dead or incapacitated, so Alice cites the paper as a substitute for Bob himself. The paper is analogous to a legal deposition. In the era of the Web, we access sources very easily, by following a hyperlink, so a hyperlink can very usefully serve to satisfy a citation (with the usual provisos about bit rot, digital attacks, and so on).

In scholarly writing, as elsewhere, citation is an act of citing, or (metaphorically) a physical record of such an act. Those little parenthetical or superscript numbers you see in academic writing act as citations only if you know who (or what) is being summoned and why. The why usually comes from the preceding sentence – i.e. the sentence makes a claim and the superscript acts to cite (summon) support for the claim. The what or who comes from the footnote or endnote.

To say, as Wikipedia does today, that the little superscript is the definition of citation is ridiculous. It is just a participant in an act of citation. Even saying that a citation is a kind of reference is I think quite wrong. The expression of a citation makes a reference – to the entity being summoned. But there are many references that are not for the purpose of citation, and a citation is not a kind of reference. A reference can imply a citation just as shouting the name “Bob” can imply a request for Bob’s presence, but the reference to Bob and the request that he come are two completely different things.

As usual I’m going to say my peevish thing about the dilution of language robbing us of useful and deep means of expression, and shifting focus to the silliness of mechanics. Making claims and defending them are the heart of scholarship, and citation is a close partner. Written articles and their superscripts and lists are just mechanics and are in a sense irrelevant – if there were another way to accomplish the claiming and defending, that would be fine, and would not require us to stop using the word “citation”.

Thanks to Ross Mounce and Ed Summers for forcing me to write this down.


Question-answering robot

2014-12-15

Continuing my exploration of meaning and reference. Some of the following repeats what I’ve already written in this blog, sorry. I’m using the blog as a way to work out the exposition, which is still too long and too abstract.

TL;DR You can’t independently specify or test the meanings of sentence parts. Specifying and testing meaning always reduces to specifying and testing the meanings of complete sentences.

Propositions again

When an agent A sends a message to an agent B, one or both of the following should hold:

  • there is a correlation between what holds (what the world is like) before generating the message and A’s choice of message to generate,
  • there is a correlation between the message that B receives and what holds after interpreting the message.

Without at least one of these, nothing has been communicated.

By “correlate” I mean that the message and what holds vary together. E.g. as the color of a disk varies between green, red, and blue, the message might vary between “the disk is green”, “the disk is red”, and “the disk is blue”.

I’ll call what holds before generating a precondition (of message generation), and what holds after interpreting a postcondition (of message interpretation). When I use these words I really mean least inclusive precondition and most inclusive postcondition, since otherwise the terms are not helpful.

The precondition case covers messages that are simple declarative sentences (“it is raining”). I’ll call such messages “p-messages”. (Being a p-message is not an inherent property of a message. To classify a message as a p-message you have to know something of the sender’s and receiver’s behavior.)

We can experimentally test any candidate proposition for whether it is the precondition of some message being sent. Just vary the agent’s circumstances (i.e. state) and watch what messages the agent sends. If the given message is sent if and only if the proposition holds, then the proposition is the precondition of sending that message.
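This experimental test can be sketched directly. The following is a toy illustration only – the agent, the candidate proposition, and the state space are all hypothetical names invented for the example, not anything from an actual system:

```python
# Hypothetical sketch: test whether a candidate proposition is the
# precondition of an agent sending a given message, by sweeping the
# agent's state space and checking the biconditional.

def agent_message(state):
    """A toy agent that reports the disk's color as a message."""
    return "the disk is " + state["disk_color"]

def candidate_proposition(state):
    """Candidate proposition: 'the disk is green'."""
    return state["disk_color"] == "green"

def is_precondition(agent, prop, message, states):
    """prop is the precondition of `message` iff, across all sampled
    states, the agent sends `message` exactly when prop holds."""
    return all((agent(s) == message) == prop(s) for s in states)

states = [{"disk_color": c} for c in ("green", "red", "blue")]
print(is_precondition(agent_message, candidate_proposition,
                      "the disk is green", states))  # True
```

The test is only as strong as the sampling: if the state space is sampled incompletely, several candidate propositions may pass.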

Imperative sentences (“please close the window”) can be treated in a dual manner; one might call them s-messages and say that the postcondition of interpretation is a specification. Again, the claim that a particular proposition is the postcondition can be tested.

You might ask: Well, what if the precondition of the generated message isn’t met (the system “lies”), or the postcondition of interpretation isn’t met (it “betrays” us)? How can you call something a postcondition of interpretation, when success is not guaranteed? You could say a specification isn’t met, or that a theory is wrong. But in any engineered system success is never guaranteed. Things go wrong. Perhaps some constituent part does not live up to its specification, or the system is operating outside of its specified operating zone. You could put the qualifier “unless something goes wrong” in front of everything we say about the system, but that would not be very helpful.

Questions and answers

It’s less clear what to say about interrogative sentences (“what color is the disk?”) and responses to them (“green”).

For the sake of neutrality, and to emphasize their syntactic nature, I’ll call interrogative sentences “q-messages” and their responses “a-messages”, q and a being mnemonic for “question” and “answer” respectively.

Consider a scenario in which a q-message is sent to a question-answering robot, and an a-message is sent in response. To apply the pre- and post-condition framework given above, we need to consider the postcondition of interpreting the q-message, and the precondition of generating the a-message. (I’ll only consider what it takes to specify or describe the question-answering robot, not the agent that communicates with it, to which I grant total freedom.)

What needs to be the case after the q-message is received? Well, an a-message must be sent; but not just any a-message. As with p-messages, for any a-message, a certain precondition must be met. But crucially, the precondition, and therefore the choice of a-message, depends on what the q-message is. The question is: what is the precondition of sending a-message ax, given that the preceding q-message was qx? (‘x’ is for ‘syntax’)

If we’re trying to specify the behavior of the robot, we need to specify, for each q-message, what the allowable a-messages are, as a function of the agent’s current state. The robot can choose among these a-messages.

One way to do this is by brute force enumeration. For each qx, write down the function (perhaps nondeterministic) from circumstances to answers ax. The size of the specification is going to be proportional to m*n where m is the number of possible q-messages qx and n is the number of possible a-messages ax.

A better way is to exploit the structure of the robot’s world. When we ask what the color of the disk is, and what the color of the square is, we’re asking similar questions. Each color-inquiring q-message can be associated with a ‘spot’ in the world that can have its color sensed. When the q-message is received, the color state of the spot that it designates can be sensed and an appropriate a-message can be chosen.
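A minimal sketch of this structured approach, with a hypothetical two-spot world and an invented q-message syntax (nothing here comes from a real robot – it only illustrates how the specification grows with the number of spots rather than with the number of q-message/a-message pairs):

```python
# Hypothetical sketch: each color-inquiring q-message designates a 'spot'
# in the world; the robot senses that spot's color and emits the
# corresponding a-message.

world = {"disk": "green", "square": "red"}   # spot -> sensed color

def answer(q_message):
    """Answer a q-message of the form 'what color is the <spot>?'."""
    spot = q_message.split()[-1].rstrip("?")  # parse the designated spot
    return world[spot]                        # a-message for the sensed color

print(answer("what color is the disk?"))    # green
print(answer("what color is the square?"))  # red
```

The brute-force alternative would enumerate every (q-message, state, a-message) triple; here one short rule covers every spot.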


It is natural to interpret q-messages as questions, and a-messages as answers, just as p-messages can be interpreted as propositions. This may be difficult or impossible for a particularly perverse robot, but if we are designing one ourselves, our ability to interpret messages is something we can control.

The proposition corresponding to a p-message can be inferred by studying the conditions under which the p-message is sent. Things are trickier regarding interpretation of q-messages and a-messages. For a q-message, we can look at how the resulting a-message varies with aspects of the world. If we can find a variable in the world that varies along with the a-message (correlates with it), and doesn’t vary otherwise [except within spans in which a single a-message covers many values – think about this], then we can say that the question is the one that asks what the value of that variable is.

Similarly, we can interpret an a-message as an answer: it is the answer that says that the variable that the preceding question (whatever it is) asks about takes on a value that can elicit the a-message, given that the preceding q-message is interpreted to be that question.


There is a tidy way to look at questions and answers using a simple formal veneer.

Any proposition p induces a function pf from world states to {true, false}, defined so that pf yields true when p holds in that world state, and false otherwise. (To spice things up I sometimes say “what holds” or “circumstances” instead of “world state.”) Call such a function a “p-function”.

Similarly, a question q induces a “q-function” qf from world states to values, and an answer a induces an “a-function” af from values to {true, false}. qf determines the value corresponding to a world state, and af tells whether an answer a is acceptable for a given value.

Consider the proposition that q has a as an answer. Call this proposition z. Let qf be the function induced by q, af be the function induced by a, and zf be the function induced by z. Then the following holds:

  zf = af o qf

that is, for any world state ws,

  zf(ws) = af(qf(ws))

Interpreting this, it says a question/answer pair is (like) a factorization of a proposition.
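The factorization can be written out directly. A toy sketch using the disk-color world from earlier (all names are illustrative):

```python
# Hypothetical sketch of the factorization zf = af o qf.

def qf(world_state):
    """q-function for 'what color is the disk?': world state -> value."""
    return world_state["disk_color"]

def af(value):
    """a-function for the answer 'green': value -> true/false."""
    return value == "green"

def zf(world_state):
    """p-function for the proposition 'the disk is green':
    the composition of af with qf."""
    return af(qf(world_state))

print(zf({"disk_color": "green"}))  # True
print(zf({"disk_color": "red"}))    # False
```

Different answers to the same question share qf and vary only in af, which is where the compression in a question/answer factorization comes from.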

Any formalism is likely to drop some of the richness of what it models. Real propositions, questions, and answers probably have more structure to them than functions do. Whether enough structure is captured depends on how the formalism is applied. In this context we’re concerned with specification and prediction, and functions may work fine.


Specifying and testing

It makes sense to specify that a p-message MUST “mean” a particular proposition p – you are just saying that the robot must generate the p-message if and only if p. We can test to see whether the robot satisfies this condition.

Suppose we tried to specify that a q-message MUST “mean” a particular question q. A specification must be testable. How would a claim that qx means q (when the robot interprets qx) be tested? We’d have to see what a-messages were generated in response to qx – they would have to be the ones that “mean” correct answers to q. But to say this, we need to specify that a set of a-messages MUST “mean” a corresponding set of answers. Then, to test whether an a-message “means” a particular answer a, you’d have to send a bunch of q-messages, and for each one, check whether the a-message that comes back is or is not generated, depending on whether the answer a is an answer to the question that q-message “means”. But then you’d have to specify what each q-message “means”. This is circular.

This is therefore not the way to specify the behavior of a question-answering robot. What you have to do is to define a correspondence between q-messages and questions, and a second correspondence between a-messages and answers. Because we’re writing the specification we can simply do so by fiat, by way of exposition, just as in a specification for motor oil you might say ‘define v = 0.0114’ and then use ‘v’ elsewhere in the specification. Simply defining correspondences does not by itself say anything about what the robot has to do. Then, we specify that when a q-message is received, the a-message generated MUST be one with the property that the corresponding answer is an answer to the question corresponding to the q-message that was received.

An alternative, fully equivalent approach would be to specify the behavior of the robot using the formalism. You could define a correspondence between q-messages and q-functions, and between a-messages and a-functions, and say that the generated a-message MUST be one that makes the composition of the q-function and the a-function evaluate to true when applied to the world state. These correspondences give an interpretation of the q- and a-messages that is just as effective as the interpretation where they are questions and answers.

Going in the other direction, when we reverse engineer a question-answering robot, we have to come up with a theory that explains the data. The data consists of q-message/a-message pairs. As we develop our theory, the correspondences of q-messages and a-messages to meaning-like entities (question/answers or q-functions/a-functions) have to be hypothesized and tested in tandem; we cannot understand q-messages in isolation, or a-messages in isolation.

Compositional languages

Given an understanding of question answering, it is very easy to imagine, or design, a language of p-messages that have two parts, one part being a q-message and the other an a-message. (Perhaps some punctuation or other trivial change sneaks in there, but that’s not to the point.) The meaning of the p-message – i.e. the precondition (proposition) that holds when it’s generated – is that the a-message is a correct response to the q-message. The analysis works exactly as it does for question answering.

This particular compositional message formation is an instance of the principle of compositionality, which holds when the meaning of a compound phrase (such as a p-message) is nontrivially determined by the meanings of its parts (in this case a q-message and an a-message). I say “nontrivially” because in any language where phrases have parts you can always come up with some trivial definition of part meaning and composition – essentially a table lookup – that makes the phrase meaning the same as the composed meaning. Compositionality means that there is some compression going on; you’re not just quadratically listing all the cases.

Example: q-message “Which way is Lassie running?” + a-message “South.” => p-message “Lassie is running south.”
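The Lassie example can be sketched with the q-function/a-function formalism. The tables and state variable below are hypothetical, invented for illustration:

```python
# Hypothetical sketch: the meaning of a compound p-message composed
# from a q-message part and an a-message part.

q_functions = {  # q-message -> q-function (world state -> value)
    "Which way is Lassie running?": lambda ws: ws["lassie_heading"],
}
a_functions = {  # a-message -> a-function (value -> true/false)
    "South.": lambda v: v == "south",
}

def p_meaning(q_message, a_message):
    """The proposition meant by the compound p-message: the a-message
    is a correct response to the q-message in the given world state."""
    qf, af = q_functions[q_message], a_functions[a_message]
    return lambda ws: af(qf(ws))

lassie_south = p_meaning("Which way is Lassie running?", "South.")
print(lassie_south({"lassie_heading": "south"}))  # True
print(lassie_south({"lassie_heading": "north"}))  # False
```

The compression is visible in the tables: m q-messages and n a-messages yield m*n composable p-messages from only m+n entries.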

See also

  • Horwich, Truth Meaning Reality, of course…
  • Yablo, Aboutness, of course…
  • Carpenter, Type-Logical Semantics
  • Wittgenstein, Tractatus Logico-Philosophicus
  • Jeffrey King, The Nature and Structure of Content
  • Gopnik and Meltzoff, Words, Thoughts, and Theories

Afterthought: When reverse engineering there are always multiple theories (or should I say ‘models’, like the logicians) that are consistent with the data, even when you account for isomorphisms. This is certainly true when the world state space is incompletely sampled, as it would be if it were continuous. But I think this holds even when everything is known about the robot’s world and behavior. It is customary, if you have your hands on multiple theories, to choose the simplest one for making predictions (Occam’s razor). (At this point I want to point you at the work of Noah Goodman…)

Afterthought: There’s a book that argues that young children are scientists developing and testing theories of the world. When I remember the name I’ll add it to the list.
