Yablo Aboutness

“Aboutness” – that is, the question of whether, for given X and Y, X is about Y – is interesting in its own right, and is of interest technically, for example in understanding the foundations of web architecture and the semantic web, and in the engineering of tools such as the information artifact ontology. Stephen Yablo’s book on the subject is a delight to read. He takes quirky examples from his personal life and from literature, and he avoids unnecessary jargon. The book also provides plenty of useful insight into the question.

Here is how I understand his model:

The world changes, i.e. there are many different conditions or states it might be in. Borrowing language from dynamical system theory we consider a world state space, whose points are all the potential states of the world. As time advances, the actual world traces out some path through this space.

The notions of subject matter, aboutness, and parthood can be modeled using a lattice of partitions of the world state space. Consider some object X. Ignoring all of the world other than X, X has its own states, in its own state space. X is part of the world, though, so its states are determined by the states of the world – a sort of simplification or projection. Recall that a partition of a set S is a set of nonempty sets (called ‘blocks’) such that (a) distinct blocks are disjoint and (b) the union of all the blocks is S. We can take X’s state space to be a partition of the world state space, and its states to be blocks, as follows: Two world states are in the same X-block (they are X-equivalent) iff they differ only in ways that make no difference as far as X is concerned. When X changes, the world moves from one X-block to another, and when X doesn’t change, the world stays in its current X-block.

To help grasp the formalism I like to think of the simple case where the world state space is R3. The world state traces out a path in R3. We may sometimes care only about one of the coordinates, say y but not x or z. y is an ‘object’ with a state space isomorphic to R, but we model it as the partition of R3 with one block of world states (points in R3) for each possible state of y. That is, each y-block is a plane parallel to the xz-plane.
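The R3 picture can be made concrete on a small finite grid. The following is only a sketch under stated assumptions: a 3×3×3 grid stands in for R3, and the helper name `partition_by` is invented for illustration rather than taken from Yablo.

```python
from itertools import product

# Assumption: a tiny finite grid stands in for R3 so the blocks can be listed.
AXIS = range(3)
WORLD_STATES = list(product(AXIS, AXIS, AXIS))  # points (x, y, z)

def partition_by(states, observe):
    """Group states into blocks: two states share a block iff observe() agrees on them."""
    blocks = {}
    for s in states:
        blocks.setdefault(observe(s), set()).add(s)
    return {frozenset(b) for b in blocks.values()}

# The 'object' y: world states are y-equivalent iff they have the same y coordinate.
y_partition = partition_by(WORLD_STATES, lambda s: s[1])

# Each block is the discrete analogue of a plane parallel to the xz-plane.
for block in sorted(y_partition, key=min):
    print(sorted(block)[:3], "...", len(block), "states")
```

Each block collects every world state that agrees on y, so moving within a block changes x or z but leaves the 'object' y unchanged.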

The partitions of the world state space form a lattice, so we can speak of the ordering of partitions (called finer-than or coarser-than depending on which direction it’s written), and of meets and joins and all the usual lattice theoretic stuff. For every entity there is a partition, and if X is part of Y, then X’s partition is coarser than Y’s partition. (Intuitively: smaller things have smaller state spaces / bigger blocks.) So coarser-than models parthood. Coarser-than also models “inherence” of a quality in the thing that has that quality: that Fido’s weight “inheres” in Fido means that Fido’s weight’s partition is coarser than Fido’s partition. (I’m using ‘quality’ in the BFO sense, although I probably really mean ‘dependent continuant’.) Similarly, observe that any proposition (e.g. “Fido weighs 10 pounds”) partitions the world into two blocks: one consisting of states in which the proposition is true, and the other those in which it is false. When a proposition is “about” an entity, its partition is coarser than the entity’s.
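To see the coarser-than ordering at work, here is a rough sketch on a made-up toy world. The three state variables, their values, and the function names are all assumptions chosen for illustration. The check treats partition p as coarser than (or equal to) q when every block of q sits inside some block of p.

```python
from itertools import product

def partition_by(states, observe):
    """Two states share a block iff observe() gives the same value for both."""
    blocks = {}
    for s in states:
        blocks.setdefault(observe(s), set()).add(s)
    return {frozenset(b) for b in blocks.values()}

def is_coarser_or_equal(p, q):
    """True iff every block of q lies inside some block of p."""
    return all(any(qb <= pb for pb in p) for qb in q)

# Hypothetical toy world: (fido_weight_lb, tail_wagging, weather)
WORLD = list(product([9, 10, 11], [False, True], ["sun", "rain"]))

fido = partition_by(WORLD, lambda w: (w[0], w[1]))       # everything about Fido
fido_tail = partition_by(WORLD, lambda w: w[1])          # a part of Fido
fido_weight = partition_by(WORLD, lambda w: w[0])        # a quality of Fido
weighs_ten = partition_by(WORLD, lambda w: w[0] == 10)   # a proposition about Fido
raining = partition_by(WORLD, lambda w: w[2] == "rain")  # a proposition not about Fido

for name, p in [("tail", fido_tail), ("weight", fido_weight),
                ("weighs ten", weighs_ten), ("raining", raining)]:
    print(name, "coarser than Fido:", is_coarser_or_equal(p, fido))
```

In this toy world the part, the quality, and the proposition about Fido all come out coarser than Fido's partition, while the weather proposition does not, which matches the intended reading of parthood, inherence, and aboutness.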

I find this uniform treatment of objects, parts, qualities, and propositions to be appealing. It helps explain my discomfort with conventional ontologies like SUO. Consider the following four entities:

  1. Fido
  2. Fido’s tail
  3. Fido’s weight
  4. That Fido weighs ten pounds

The SUO top level would say that 1 and 2 are Physical, that 4 is Abstract (because it’s a Proposition), and that 3 doesn’t exist. To me they are all the same kind of thing, just some more “part-like” than others. They ought to be either all Abstract or all Physical. By Yablo’s programme they are just entities with partitions of varying fineness.

Although I’m not always a fan of BFO, it is closer to a uniform treatment. 1, 2, 3 are all continuants. BFO has no propositions (4) but it is not difficult to imagine adding them, and it is pretty clear where they would fit (they would be a particularly “atomic” kind of dependent continuant).

Categories: Uncategorized

bit interleaving

Find the pattern.

   0   2   8  10  32  34  40  42 128 130 136 138 160 162 168 170 
   1   3   9  11  33  35  41  43 129 131 137 139 161 163 169 171 
   4   6  12  14  36  38  44  46 132 134 140 142 164 166 172 174 
   5   7  13  15  37  39  45  47 133 135 141 143 165 167 173 175 
  16  18  24  26  48  50  56  58 144 146 152 154 176 178 184 186 
  17  19  25  27  49  51  57  59 145 147 153 155 177 179 185 187 
  20  22  28  30  52  54  60  62 148 150 156 158 180 182 188 190 
  21  23  29  31  53  55  61  63 149 151 157 159 181 183 189 191 
  64  66  72  74  96  98 104 106 192 194 200 202 224 226 232 234 
  65  67  73  75  97  99 105 107 193 195 201 203 225 227 233 235 
  68  70  76  78 100 102 108 110 196 198 204 206 228 230 236 238 
  69  71  77  79 101 103 109 111 197 199 205 207 229 231 237 239 
  80  82  88  90 112 114 120 122 208 210 216 218 240 242 248 250 
  81  83  89  91 113 115 121 123 209 211 217 219 241 243 249 251 
  84  86  92  94 116 118 124 126 212 214 220 222 244 246 252 254 
  85  87  93  95 117 119 125 127 213 215 221 223 245 247 253 255 

The entry for the mth row and nth column (zero-based) is given by:

(define (pair m n)
  (if (and (= m 0) (= n 0))
      0
      (+ (remainder m 2)
         (* 2 (pair n (quotient m 2))))))
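For readers who want to check the table, here is a Python transliteration of the recurrence together with an inverse. The inverse (`unpair`) is my own addition for illustration and relies on the observation that the even-numbered bits of an entry come from the row and the odd-numbered bits from the column.

```python
def pair(m, n):
    """Transliteration of the Scheme recurrence: entry at row m, column n."""
    if m == 0 and n == 0:
        return 0
    return (m % 2) + 2 * pair(n, m // 2)

def unpair(k):
    """Inverse: bits 0, 2, 4, ... of k form the row; bits 1, 3, 5, ... form the column."""
    m = n = 0
    shift = 0
    while k:
        m |= (k & 1) << shift
        n |= ((k >> 1) & 1) << shift
        k >>= 2
        shift += 1
    return m, n

# Print the top-left 4x4 corner of the table.
for m in range(4):
    print(" ".join(f"{pair(m, n):3d}" for n in range(4)))
```

Running `unpair(pair(m, n))` over the 16×16 table should give back `(m, n)` each time, which is one way to convince yourself that the map is a bijection on that range.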

Of course the isomorphism (of natural numbers with pairs of natural numbers) won’t be new to anyone (cf. Cantor); I’m just having a bit of fun with the demonstration.

Categories: Uncategorized

On the proposed data-linking 2NN status code

Eric Prud’hommeaux, apparently on behalf of Tim Berners-Lee (see here), is spearheading a drive for a puzzling and peculiar extension to the HTTP protocol (“2NN”). This is the latest move in the agonized httpRange-14 thread started by Tim twelve or so years ago.

This is all about how we are to read, write, and deploy linked data using RDF. (The issue pretends to go beyond RDF but I don’t see any consequential sense in which it does.) Remember that RDF is a language of URIs (I sometimes call it URI-ese), and there are three main kinds of URIs in RDF: those with # in them, those without # but deployed using GET/200, and those without # but deployed using GET/303. (Forget about URNs and so on for now.) It’s agreed that the use of hash and 303 URIs is unconstrained in RDF. The question is whether GET/200 behavior imposes (or should impose) constraints on interpretation of 200 URIs.

If there is debate over whether GET/200 imposes a constraint, and debate over what the constraint would be, and debate over whether the constraint is at all useful – and all three debates are real – the prudent RDF practitioner will simply stay away from GET/200 URIs altogether. And that is what many of them do.

But this is not a very happy situation, because the community seems to hate both hash URIs and 303 URIs. Hash URIs are ugly, and 303 URIs are inefficient (extra HTTP round trip to get the “data”) and hard to deploy. What’s the point of reserving 200 URIs? Why not just forget about GET/200 as a constraint, and use 200 URIs the same way hash and 303 URIs are used?

The W3C Technical Architecture Group, of which Tim is a member, spent many expensive hours of discussion time on this issue over many years. I personally made many attempts to formulate and advocate for what I thought was a reasonable position, and failed. (This is not to say it’s impossible to do, just that I didn’t.) In the end a subcommittee (after discussion with Tim) settled on a version of a proposal by Sandro Hawke, which Jeni Tennison rendered as URIs in Data. The new memo met with general approval (or perhaps resignation) within the TAG and received very little comment outside the TAG. The memo makes peace between the two sides in the debate by saying it’s all a matter of interpretation, and that you can shift between vocabulary interpretations without a change in overall meaning. It’s only the overall meaning that matters. You can use 200 URIs in RDF pretty much the way you use hash and 303 URIs, and here are some suggestions on how to express the way in which your use of them relates to GET/200. No awkward deployment, no extra round trip, no conflict with any prior GET/200 constraint. Problem solved.

I thought the matter was done and the RDF community was then going to do whatever it wanted to, with or without the solution offered by the memo. Then Tim writes that no, the memo’s proposal to use GET/200 to avoid the extra round trip is no good after all. To avoid the extra round trip we need a new HTTP status code that blends a GET/303 with the subsequent GET/200 for the data, but is not 200. It’s still hard to deploy, so not as good as the GET/200 solution from Jeni’s memo in that sense, but at least it fixes the extra round trip problem.

In other words, Tim has decided that the memo’s proposal does not work. It would be interesting to know why. My guess is that it is too subtle.

There is no rational way to end this argument. It hinges on the question of finding meaning in GET/200 exchanges. We know that in the wild, GET/200 means nothing at all, at least on the web, other than that some kind of request for some kind of information was satisfied in some way by the provision of some information (as opposed to, say, GET/404, which does not mean that). There is no evidence that URIs “mean” anything on their own to the community of people deploying and using web servers over HTTP. – I’m talking about as a general convention. GET requests for particular URIs certainly do have meaning in many cases to those who make and satisfy them, but the meaning is bespoke to each URI. To find general constraints, you have to look elsewhere.

One place to look would be practice “in the wild” in the RDF community. I have not done a careful study – that would take more time than I have – but my impression is that 200 URIs are used in two ways: as hyperlinks, with their “meaning” or constraint specific to the particular link type (e.g. rdfs:seeAlso, owl:imports); and, by certain parties, the same as hash and 303 URIs, i.e. unconstrained, with ‘meaning’ suggested by RDF ‘data’ retrieved from somewhere. So – with regard to Eric’s proposal, we strike out again, since we find no general rule from this source.

A third place to look would be specifications that have some standing in the community. One is the HTTP specification: it says GET/200 means that “an entity corresponding to the requested resource is sent in the response”; an entity is “the information transferred as the payload of a request”…  The requested resource is the one identified by the URI, a resource is anything at all, what a URI identifies is up for grabs. This is a rathole with a dead end. I’ve read this stuff dozens of times and I can only conclude that it is vacuous. There is no use of the HTTP protocol that can possibly fail to conform with all this resource-information-entity-identifies business.

One could look at well-known advisory documents such as Architecture of the World Wide Web and REST. These say that if the server infrastructure is built according to certain principles, then the designers of its various parts will have ‘identified’ one ‘resource’ for each URI. That is, for each URI, there will be something, whose properties are known to someone involved in the design, that the URI ‘identifies’, and furthermore, that there is some (unexplained) relationship between the state of that thing and its ‘representations’ (what you get when you GET). But: how would you know whether a site is designed this way? And even if it were, how would you learn any of the properties of the thing the designers ‘identified’ by any URI? Who would you ask, and how, and in what terms would they reply? – So this is not such a useful constraint, in my opinion. It may be that the site is designed beautifully, with nice intuitive URI ‘identification’, but that there is no way a client can, in general, know anything about what is ‘identified’, much less actually know what is ‘identified’.

In any case this doesn’t match how GET/200 URIs are used in RDF. Usually you do some GET requests and look at what comes back, and then do some reverse engineering or apply intuition to guess the properties of whatever responses might come back in the future. You then impute those properties to what the URI refers to in some bit of RDF. This has little to do with what sort of software architecture is employed by those providing the web site – it is about the behavior of the site.

(Tabulator does its own brand of reverse engineering: it assumes that the identified ‘resource’ is the information you get from GET/200 – or at least something with the same properties. This is a useful heuristic and works for static documents, but is unsound in general.)

OK. So some people A believe that some other people B depend on a constraint around GET/200, i.e. that if a 200 URI is used that will have certain implications for B. But there is no general agreement on what that constraint is, so it can’t be exploited by someone writing a 200 URI, e.g. A. The prudent RDF programmer will therefore program defensively, and just avoid generating RDF containing 200 URIs altogether. Similarly, someone reading RDF won’t read anything into a 200 URI. If these careful people hate the hash syntax, and can’t tolerate the extra 303 round trip, then I guess 2NN is for them. That’s a lot of ‘ifs’. It all seems a bit silly to me. Good luck getting the IETF reviewers to comprehend all this.

I think that if I ever have to write code that makes use of content deployed using GET/2NN, it will probably treat GET/200 and GET/2NN the same way, and not look too hard at the status code. After all nobody has told me what the GET/200 constraint is, and there may be useful data in the response. That way I’ll also be able to use content deployed using GET/200 by people who don’t think there is any GET/200 constraint. Of course, if everyone thought like this, then what would be the point of the 2NN response – why not just use 200? But I guess not everyone thinks like this.
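A minimal sketch of what such status-agnostic client code might look like. The function name and the `codes_2nn` parameter are hypothetical, and since no number has been assigned to 2NN, the value 299 below is an arbitrary stand-in and not part of any proposal.

```python
# Assumption: 2NN has no assigned number, so the caller supplies whatever
# code(s) a deployment actually uses; nothing here is defined by HTTP.
def usable_payload(status, body, codes_2nn=frozenset()):
    """Return the body if the response is GET/200 or a (hypothetical) GET/2NN, else None."""
    if status == 200 or status in codes_2nn:
        return body
    return None

PLACEHOLDER_2NN = 299  # made-up number for illustration only; no code has been assigned

print(usable_payload(200, b"<rdf/>"))
print(usable_payload(PLACEHOLDER_2NN, b"<rdf/>", codes_2nn={PLACEHOLDER_2NN}))
print(usable_payload(303, b""))
```

The point of the sketch is only that the two success cases fall through to the same branch: the client looks at the payload, not at which of the two codes delivered it.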

Appendix: How do you refer, in RDF, to documents, if not using a 200 URI? Well, you can’t reliably use a 200 URI that yields the document, because nobody will be sure which document you mean. Maybe you intend the document you got with some GET of the URI, but a different document might be served on the next GET request; maybe you meant to refer to a “changing document”, maybe you didn’t, there is no way for your interlocutor to tell. So use a hash or 303 URI with metadata – that is, a citation – to refer to the document. As part of the citation you can provide a URI saying, in the usual way, that the document was retrieved using that URI on such and such a date. If there isn’t already an RDF vocabulary (BIBO??) that lets you express a citation to a level of detail that you’re comfortable with, then one can be invented. When you give the retrieved-from URI, provide it as a string literal, not as an RDF URI reference, since what the URI reference refers to is unclear – as I say earlier in this paragraph.  If you really mean “whatever happens to be the latest document retrieved using this URL” then express what you mean, again using a suitable vocabulary. Don’t leave it to chance.
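As a rough illustration of the citation idea, the sketch below emits N-Triples-style lines for a document named by a hash URI. Everything in it is assumed for the example: the `http://example.org/cite#` vocabulary is invented (it is not BIBO or any published ontology), as are the property names and the sample values. The one point it is meant to show is that the retrieved-from URL goes in as a quoted string literal, not as a URI reference.

```python
from datetime import date

# Everything under EX is a made-up vocabulary for illustration; it is not BIBO
# or any published ontology.
EX = "http://example.org/cite#"

def citation_triples(doc_node, title, retrieved_from, retrieved_on):
    """Describe a document via a citation; the retrieval URL is a plain string value."""
    return [
        (doc_node, EX + "title", title),
        (doc_node, EX + "retrievedFrom", retrieved_from),  # a string, not a URI reference
        (doc_node, EX + "retrievedOn", retrieved_on.isoformat()),
    ]

def to_ntriples(triples):
    """Render (subject, predicate, string value) triples with literal objects."""
    lines = []
    for s, p, value in triples:
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)

triples = citation_triples(
    "http://example.org/refs#doc1",        # hypothetical hash URI naming the document
    "Some page title",
    "http://example.org/page",             # hypothetical retrieval URL
    date(2014, 7, 4),
)
output = to_ntriples(triples)
print(output)
```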

Categories: Uncategorized

Serializational State Transfer?

There is this thing called REST, which its authors describe as a software “architecture style” that is a “model of how Web applications work”. The early description of the Web, where ‘universal document identifiers’ name documents, became inaccurate pretty quickly after the Web started, since HTTP didn’t prevent anyone from making web sites where the same UDI could yield different documents at different times. This was extremely useful but led to considerable confusion over the founding theory: what a ‘document’ was and which one was ‘identified’ if you could get different ones at different times, when authenticated differently, etc. So to replace the foundational theory Fielding looked for a better description of how Web applications work, and in particular how they use URIs. It sounds as if he surveyed a number of Web applications, and found something common among them, and he called this common whatever REST.

Under REST a UDI (now called a URI) is no longer associated with a document, but rather with a ‘resource’ or ‘network resource’ or ‘object’. This isn’t really helpful in itself since it just shifts the ontological confusion from one word to another. The only thing we observe in HTTP is message payloads, so the only question to be answered in analyzing communication is, how do message payloads (particularly for GET) relate to these resource-things, and give us information about them (properties)? If there is no relationship, then we know nothing at all about resource-things.

The party line is that the payload is a ‘representation’ of the resource-thing, which is again not at all helpful since it only changes something we didn’t understand and don’t have a name for, to something we don’t understand and do have a name for. The word is evocative, to be sure, but extremely ambiguous. So what, really, was the REST insight, based on empirical study of real Web applications?

I’ve mulled this over for a long time and here is what I’ve come up with. The ‘resources’ or ‘objects’ of REST are data structures or data objects that are stewarded and controlled by computational devices, and the message payloads are serializations of the current state of those data objects. That is, if you read about REST and substitute ‘serialization’ for ‘representation’, it will make a lot more sense.

So the empirical claim, as I interpret it, is that a common pattern among Web applications in the mid 1990s is that payloads (web page instantiations) are associated with single data objects in the memory space (disk or otherwise) of the application. The object could be a database record or table, or a map or image or etc. The data object (that is, its contents, or state) can change over time, in which case you get different serializations at different times, but it’s always the same object being serialized, for each URI.
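Here is a small sketch of the pattern as I read it: one data object per URI, with each payload a serialization of that object's current state. The class, the routing table, the path, and the field values are all made up for illustration, and nothing here touches a real HTTP library.

```python
import json

class DataObject:
    """A hypothetical application-side record whose state can change over time."""
    def __init__(self, **fields):
        self.fields = dict(fields)

    def serialize(self):
        # The payload is a serialization of the object's *current* state.
        return json.dumps(self.fields, sort_keys=True)

# Hypothetical routing table: each URI path is tied to exactly one data object.
RESOURCES = {"/weather/boston": DataObject(temp_f=61, sky="cloudy")}

def get(path):
    """Stand-in for a GET handler: status code plus serialized current state."""
    return 200, RESOURCES[path].serialize()

obj_before = RESOURCES["/weather/boston"]
_, first = get("/weather/boston")
RESOURCES["/weather/boston"].fields["temp_f"] = 64  # the object's state changes
_, second = get("/weather/boston")
obj_after = RESOURCES["/weather/boston"]

print(first)
print(second)
```

The two payloads differ, but the thing being serialized is the same object both times, which is the sense of ‘resource’ this reading of REST suggests.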

I don’t know whether this claim is true; I didn’t do a survey of the kind Fielding did. It would be nice to see evidence. My feeling, based on looking at lots of web pages, is that payloads/pages are probably assembled from bits and pieces of many data objects in the application, in addition to simply providing information that’s not stored in a data object, but is just the output of some process such as a calculator, chat, random number generator, webcam, etc. I would be surprised to learn that REST is true of even 25% of web sites or URIs out there.

In addition to the empirical claim there is advice, based on the experience of the authors, that if you create a web site using the REST style, it will be a better site than if you hadn’t. This prescriptive angle is amplified in the W3C Web Architecture recommendation. This is a different claim, and may well be true – certainly if you follow REST style you’ll get URIs that work nicely and intuitively as bookmarks and hyperlink targets, and this is both a social good and should be in the self-interest of site creators.

The thing that I and other people have found confusing is the use of REST language in the HTTP specification. REST is a statement about how the organization of an application relates URIs, message payloads, and its internal data objects, and HTTP does not govern that kind of thing, it only governs the payloads themselves. A web application can be very non-REST and still conform to HTTP. So the spec’s REST language has no normative force. Instead REST has been used as a guide to the design of the HTTP specs; applications that are REST are, by design, going to be a more natural fit to HTTP than other applications (such as streaming video).

But the statement in HTTP that the payload of a 200 response to a GET is a representation of the identified resource, is not true in general, it is just wishful thinking, a vain hope that every web application will use REST style. But it’s there in the spec, so it sounds like it has normative force. The only way I can keep my head from exploding when reading the spec is to tell myself to relax and read something else instead.

Then there’s the awful situation with URIs in RDF. In RDF you want to coordinate agreements over what the properties of the things ‘identified’ by URIs are, so that you can use those URIs in communication. To me it’s clear that if the URI is associated with a REST data object, the REST style advice would be that the URI ‘identifies’ the data object whose serializations we see when we do GETs, so the properties of the thing identified (in RDF) by the URI are the properties of the data object inside the application (field values, size in bits, data type, last modified date/time, current serialization(s), that kind of thing). But this doesn’t help coordinate understanding of the data object by others, since only people with knowledge about how the application operates internally can know any properties of these data objects; there’s no standard channel through which this knowledge can be communicated, and nobody would be interested in saying anything on it, if there were. Of course an outside observer doesn’t even know whether it’s a REST URI at all. And if the URI is not associated with a REST data object, anyone can say anything about what it ‘identifies’ – there may sometimes be agreement, or understanding, by luck, but there is no standard or widely used best practice to help out (other than the ‘hash URI’ and ‘303’ cases, where there is coordination, local to the RDF world). Most of the time the ‘I’ in ‘URI’ is mere wishful thinking, not a matter of fact as a naive reading of the acronym would suggest.

To me the REST theory feels like a post hoc attempt to justify calling those strings that start ‘http://…’ ‘identifiers’. In order to be identifiers they have to identify something. There would be no reason to say what the identified thing was, or even that there was such a thing, if all we cared about was a communication and a smoothly working Web; the ‘identified thing’ would have no reason to exist at all. But if you are committed to seeing ‘http://…’ things as identifiers, you are forced to come up with a story about what they identify.

(RDF is a different story, since it works in the other direction: we want to express a statement that X has property Y, so what name should we use for X? Hey, maybe we can use a URI.)

I could go on and on… I haven’t even touched on the context dependence of ‘identification’: it only makes sense within some particular language that lets you express claims or commands that have agreed consequences…

Categories: Uncategorized

Error in Apache license boilerplate

It says:

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.

As a putative statement of fact, the second part is false. Noncompliant use is sometimes ruled out by copyright law, but is only categorically ruled out if you have done something, like sign a contract, to relinquish your rights. Using or copying a file does not in itself constitute signing a contract. There are ways you can legally use the file that are not in compliance with the license, such as:

  1. If an exception to copyright restrictions applies, such as fair use
  2. If the period of copyright protection has ended
  3. If you obtain a different license from the rightsholder that grants rights beyond what the Apache License grants (dual licensing)

A license is not a contract. It can only grant an exception to a background prohibition (think hunting license, driver’s license); it can’t have the effect of establishing a prohibition that wasn’t there already. Where there is no prohibition, as in the case of fair use, a license has nothing to say. Unconditional “may not” language is not appropriate in contexts like the above.

Granted, this sentence is not in the license text itself, but rather in the boilerplate. But it is misleading nonetheless.

Frequently I see people confusing license and contract. The confusion is natural and I didn’t get it until I worked at Creative Commons. One source of confusion is that the two are often linked. When entering into a contract, you might agree to do something like pay money or give up rights, in exchange for which you might be granted a license. Libraries, for example, sometimes give up rights like text mining (which is not restricted by copyright law) in exchange for access to journals. But the relinquishing of rights is a term of the contract, not the license. And you’re only bound by a contract (the contract only exists) if you agree to it.

There is another problem with the Apache statement, which is that copyright law only restricts copying (performance, translation, etc.), not all “use”. It doesn’t make sense to license use when use isn’t prohibited.

IANAL TINLA.

Categories: Uncategorized

Aboutness, objects, propositions

Another brain dump, continuing my ongoing effort to (a) make sense of Brian Cantwell Smith and (b) be of some use to OBO Foundry, which in my opinion is in a crisis regarding information (see IAO_0000136). Don’t take this too seriously – I’m a dilettante when it comes to philosophy. Take this as a homework exercise for someone who’s trying to figure it out. Please provide corrections. [edited 2014-07-04 to change 'OBO' to 'OBO Foundry']

Alan Ruttenberg once gave me the following definition of ‘about’: a sentence (or other utterance) X is about an object Y iff a [syntactic] part of X refers to Y.

At a gut level I don’t like this at all, but the following is the best alternative I’ve come up with:

A proposition X is about an object Y if the truth of X depends on the state of Y.

This seems better because it is semantic instead of syntactic. It doesn’t depend on how the proposition is expressed / coded, or on any understanding of reference, which is almost as mysterious as aboutness.

My alternative relies on an understanding of ‘depends on’. To nail this you have to rule out any changes to the truth of X caused by factors other than changes to the state of Y. [Added: That's badly said, what I mean is that to prove the change to Y is responsible for the change in the truth of X, one would want to come up with a situation where there's nothing else to attribute it to. See next sentence.] That would lead to the following: The truth of X depends on the state of Y, if there are two possible world states w1 and w2 such that w1 and w2 differ only in the state of Y, but in which X has opposite truth values.

The above is independent of your choice of modal logic and world states, which could just be temporal (BFO is effectively a temporal theory).

(Maybe there are other ways than this to depend, but I don’t want to get distracted by causation.)

Both definitions of ‘about’ depend on ‘object’ (which I take to be akin to BFO ‘continuant’). I take an object to be a part of the world, so a state of an object is part of a state of the world, and the state space of the world (the space of possible world states) is some kind of product of the object’s state space and the state space of everything that’s not that object.
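The ‘depends on’ test can be tried out on a finite toy world. This is only a sketch under loud assumptions: the world is taken to be a plain product of two independent object state spaces, and the objects, their values, and the function names are invented for the example.

```python
from itertools import product

# Assumption: the world is a product of independent object state spaces.
OBJECTS = {
    "fido_weight": [9, 10, 11],
    "weather": ["sun", "rain"],
}
NAMES = list(OBJECTS)
WORLD_STATES = [dict(zip(NAMES, values)) for values in product(*OBJECTS.values())]

def differ_only_in(w1, w2, obj):
    """True iff w1 and w2 disagree on obj and agree on everything else."""
    return w1[obj] != w2[obj] and all(w1[k] == w2[k] for k in NAMES if k != obj)

def is_about(proposition, obj):
    """True iff some pair of world states differing only in obj gives opposite truth values."""
    return any(
        differ_only_in(w1, w2, obj) and proposition(w1) != proposition(w2)
        for w1 in WORLD_STATES
        for w2 in WORLD_STATES
    )

def fido_weighs_ten(w):
    """A proposition: a predicate over world states."""
    return w["fido_weight"] == 10

print(is_about(fido_weighs_ten, "fido_weight"))
print(is_about(fido_weighs_ten, "weather"))
```

Under these assumptions the proposition comes out as being about Fido's weight and not about the weather. The product structure is doing real work here: without it, "differ only in the state of Y" would have no obvious meaning.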

And all this relies on some understanding of the integrity or identity or continuity of an object, such that if you pick out an object in world state w1 and then try to pick out the same object in world state w2, you’ll have some way to decide whether you’ve done so correctly.

I have been reluctant to grant legitimacy to ‘objects’ (or ‘continuants’) – I’ve been wondering whether they are primarily syntactic or logical or social constructs, as opposed to something with some objective clout. Maybe the question here is: if you have two candidate identity criteria for an object that coincide in some world states but not in others, is there some principled way to choose between them? Maybe: the parts of an object are more closely coupled to one another, both spatially and temporally, than they are to parts of things that aren’t that object. This is a bit mushy but seems to have potential.

In this formulation entities (such as mathematical ones) aren’t objects unless they can be said to have variable state. Does the state of the number 7 change through time? Is 7 an object? Hard to say, but I think it would be a pretty unnatural world view that would say it does / is. (But you can have a book about the number pi… hmm… maybe this particular ‘about’ is a term of art.)

There is also the question of what a proposition is, but I don’t see that as hard; a proposition is a 0-ary predicate, which in nonmodal logic is true or false depending on your choice of model / interpretation, and in modal logic is true or false depending on which world state you’re in. I.e. a proposition is a predicate over, or set of, world states, like an ‘event’ in probability theory.

How propositions fit into BFO, I’m not sure. In some ways they resemble universals, while in others they resemble particulars (maybe they’re so-called ‘qualities’ of the world).

[Added: I admit the above account only handles particular kinds of propositions. To be complete I ought to provide 'aboutness' accounts of propositions about the past and future, whose truth doesn't vary over time; and of universal and conditional propositions, such as "where there's smoke there's fire".]

Categories: Uncategorized

thank you WordPress

Categories: Uncategorized