
FRBR and the Web

I’m going to assume some familiarity with FRBR, so if you want to read the below and don’t know FRBR, you might want to consult the FRBR specification.

The idea of FRBR is that when organizing bibliographic information into database records it’s useful to group the kinds of things you would typically say into four groups.  If you have a physical book, you might talk about (1) attributes specific to that physical copy, such as where it is and what condition it’s in; (2) attributes shared by other copies in its print run or edition; (3) attributes shared with other editions, such as title, date, and author; and (4) attributes shared with translations and adaptations, such as intended audience or form (e.g. novel vs. poem).  If you organize your database this way you will have four different bibliographic records applicable to the book.  Overall, the system’s records will be arranged in a forest, so that for each more-specific record there is exactly one less-specific record.
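
The record arrangement just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not anything defined by the FRBR specification: the `Record` class, the level names as plain strings, and the sample book’s attributes are all hypothetical. Each record points to exactly one less-specific record, so the records form a forest rooted at Works, and the four records applicable to a copy are found by walking up its chain.

```python
from dataclasses import dataclass, field
from typing import Optional

# FRBR Group 1 levels, ordered from least to most specific.
LEVELS = ["Work", "Expression", "Manifestation", "Item"]


@dataclass
class Record:
    level: str
    attributes: dict = field(default_factory=dict)
    parent: Optional["Record"] = None  # the one less-specific record

    def __post_init__(self):
        depth = LEVELS.index(self.level)  # raises ValueError for an unknown level
        if depth == 0:
            if self.parent is not None:
                raise ValueError("a Work record is a root of the forest")
        elif self.parent is None or self.parent.level != LEVELS[depth - 1]:
            raise ValueError(f"{self.level} needs one {LEVELS[depth - 1]} parent")

    def chain(self):
        """The records applicable to this one, most specific first."""
        node, out = self, []
        while node is not None:
            out.append(node)
            node = node.parent
        return out

    def all_attributes(self):
        """Everything said about this record at any level of its chain."""
        merged = {}
        for rec in reversed(self.chain()):  # least specific first
            merged.update(rec.attributes)
        return merged


# A hypothetical physical book, described at all four levels.
work = Record("Work", {"form": "novel", "audience": "adult"})
expression = Record("Expression", {"title": "An Example Novel", "language": "en"}, work)
manifestation = Record("Manifestation", {"edition": "2nd", "publisher": "Example Press"}, expression)
book_copy = Record("Item", {"location": "shelf 4B", "condition": "worn"}, manifestation)

print([r.level for r in book_copy.chain()])  # ['Item', 'Manifestation', 'Expression', 'Work']
```

The constructor check is what enforces the forest shape: an Item can only hang off a Manifestation, a Manifestation off an Expression, and so on.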

Now the description of FRBR says that each record “is about” or “describes” some “entity”, an ontologically dubious proposition.  I would say FRBR entities are theoretical inventions, abstractions, or fictions, depending on my mood, and from an ontological point of view it’s not clear to me that saying the records are about things other than physical items is the most helpful way to think about the enterprise. But it’s not particularly harmful either.

For someone familiar with description logic (DL) a different framework presents itself based on these “functional requirements”.  You could say that your domain of discourse consists of FRBR Items, physical carriers of information.  If you state a set of properties (such as a number of properties at a single FRBR level) you have defined a class of Items having those properties, so any of the three other kinds of record is in effect a class definition.  An “embodies” or “realizes” relation between FRBR “entities” is really a subclass relation between classes of Items.
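
A minimal sketch of this description-logic reading, assuming Items are modeled as bags of (property, value) pairs: a record becomes a class of Items, and a more-specific record, having more defining properties, picks out a subclass. `Item`, `ItemClass`, `refine`, `is_subclass`, and the sample essay are hypothetical names chosen for illustration; a real DL system would be considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A physical carrier: the only kind of individual in this domain."""
    name: str
    props: frozenset  # (property, value) pairs true of this copy


def make_item(name, **props):
    return Item(name, frozenset(props.items()))


class ItemClass:
    """A record read as a class definition: all Items having its properties."""

    def __init__(self, **props):
        self.props = frozenset(props.items())

    def __contains__(self, item):
        return self.props <= item.props

    def extension(self, domain):
        return {item for item in domain if item in self}

    def refine(self, **more):
        """A more-specific record: the same properties plus some new ones."""
        merged = dict(self.props)
        for key, value in more.items():
            if key in merged and merged[key] != value:
                raise ValueError(f"{key} conflicts with the broader class")
            merged[key] = value
        return ItemClass(**merged)


def is_subclass(sub, sup):
    # Every property required by sup is also required by sub, so every Item
    # in sub is in sup: "embodies"/"realizes" read as subclass-of.
    return sup.props <= sub.props


expression = ItemClass(title="An Example Essay", language="en")
html = expression.refine(media_type="text/html")
domain = {
    make_item("copy on disk", title="An Example Essay", language="en", media_type="text/html"),
    make_item("copy in memory", title="An Example Essay", language="en", media_type="text/html"),
    make_item("plain-text copy", title="An Example Essay", language="en", media_type="text/plain"),
}
print(len(expression.extension(domain)), len(html.extension(domain)))  # 3 2
```

Because the subclass test only compares defining properties, it agrees with the extensional picture on any domain: the Items in the narrower class are always a subset of those in the broader one.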

However, the two formulations are interdefinable.  Forgive me if I switch back and forth.

This brings me to the Web.  If I write an article in my word processor, and save it to disk, there are two Items, the copy in main memory and the copy on the disk.  Put it on a Web server and many more Items (copies) can be made, but (in FRBR terms) they all “exemplify” a single Manifestation.  In DL terms you’d say they share Manifestation-level properties, and all things that have those properties form the class corresponding to the Manifestation.  In the HTTP protocol many copies of the source Item are made as the information passes through network buffers, proxies, caches, and application libraries, but they all still exemplify the same Manifestation.  So it seems safe to say that any single GET/200 exchange is associated with a single Manifestation, at least when the response has any resemblance to something you’d be concerned with in a “bibliographic record”.

Now if you do multiple HTTP GETs of the same URI, you may always see the same Manifestation, in which case you might say there’s a special relationship between that URI and the Manifestation.  If, on the other hand, there is variation between the Manifestations you get but they all embody a single Expression (for example, you get the same words encoded differently – say text/plain and text/html), then there would seem to be a special relationship between that URI and the Expression.  Similarly for Work – content negotiation could get you Expressions in different languages.
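
The reasoning about repeated GETs can be sketched as a function that finds the most specific level shared by a set of observed responses. This assumes, as a stand-in for outside knowledge, that each response is already labeled with its Work and Expression, since HTTP does not report those; Manifestation identity is approximated as identical media type plus identical bytes. `Response` and `cozy_level` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One GET/200 exchange. The work/expression labels are an assumption:
    HTTP does not supply them, so they stand in for outside knowledge."""
    work: str
    expression: str
    media_type: str
    body: bytes

    def manifestation_key(self):
        # Same media type and same bytes: treated as the same Manifestation.
        return (self.media_type, hashlib.sha256(self.body).hexdigest())


def cozy_level(responses):
    """Most specific FRBR level shared by every observed response, else None."""
    if not responses:
        raise ValueError("need at least one response")
    if len({r.manifestation_key() for r in responses}) == 1:
        return "Manifestation"
    if len({(r.work, r.expression) for r in responses}) == 1:
        return "Expression"
    if len({r.work for r in responses}) == 1:
        return "Work"
    return None  # no shared level, e.g. responses from different Works


plain = Response("w1", "w1-en", "text/plain", b"Hello, world")
html = Response("w1", "w1-en", "text/html", b"<p>Hello, world</p>")
french = Response("w1", "w1-fr", "text/plain", b"Bonjour, le monde")

print(cozy_level([plain, plain]))         # Manifestation
print(cozy_level([plain, html]))          # Expression
print(cozy_level([plain, html, french]))  # Work
```

The answer only describes the responses seen so far; a later response can always push it to a less specific level or to None.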

Whether there is response variation, and how widely it extends, is impossible to determine using HTTP GET alone (even with the help of Vary: headers), so the truth value of these ‘cozy’ relationships is unknowable for Popperesque reasons.  I may issue a thousand requests and always get the same Manifestation, but get a different one on request number 1001.  So if I say that a URI is cozy with a Manifestation, I had better either have inside knowledge of how that web server is configured (e.g. I could be the one running it), or else I need to be prepared to be proven wrong.
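
The Popperian point can be made concrete with a hypothesis that is only ever “not yet refuted”. In this sketch `CozyHypothesis` and `hypothetical_server` are invented for illustration, the simulated server changes its response on request 1001, and Manifestation identity is again approximated by media type plus a SHA-256 hash of the body.

```python
import hashlib


def fingerprint(media_type, body):
    # Assumption: same media type and same bytes => same Manifestation.
    return (media_type, hashlib.sha256(body).hexdigest())


class CozyHypothesis:
    """'This URI always yields one Manifestation', held only provisionally."""

    def __init__(self):
        self.expected = None
        self.requests = 0
        self.refuted_at = None  # request number of the first counterexample

    def observe(self, media_type, body):
        self.requests += 1
        fp = fingerprint(media_type, body)
        if self.expected is None:
            self.expected = fp
        elif fp != self.expected and self.refuted_at is None:
            self.refuted_at = self.requests
        return self.refuted_at is None  # "not yet refuted", never "proven"


def hypothetical_server(n):
    # Stand-in for a server whose configuration we cannot see.
    return ("text/html", b"old page") if n <= 1000 else ("text/html", b"new page")


hypothesis = CozyHypothesis()
history = [hypothesis.observe(*hypothetical_server(n)) for n in range(1, 1002)]
print(all(history[:1000]), hypothesis.refuted_at)  # True 1001
```

A thousand agreeing responses leave the hypothesis standing but unproven; the single disagreeing response settles it.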

For some URIs there is no such cozy relationship of the URI to any FRBR entity.  There is always the aggregate collection of all the received Manifestations (or Expressions or Works), but the relationship between the Manifestations you get and the aggregate is different, part-of rather than embodies (etc.).  For example, different blog posts can come from the same URI at different times, but they are parts of the blog, not embodiments (realizations, etc.) of it.  FRBR would describe the blog posts and the blog very differently.

The Manifestation “cloud” around some URIs seems to fall outside of FRBR entirely.  Consider a page inside a bank’s web site for current account balances.  From a single URI, different users get different pages depending on their session authentication.  This is not really a serial publication, as pages might be issued simultaneously, and it is not really a collection since the pages aren’t collected together in one place.  So the URI does not seem to be ‘cozy’ with any FRBR entity.  Hoping that FRBR will be helpful in understanding all Web URIs is probably too much to ask.

We can connect some of this to web architecture rhetoric.   The webarch theory holds that there is something that the URI “identifies” and that it is an “information resource” that has “representations”.  It seems consistent to say that a “representation” is close to what FRBR calls a Manifestation, and that FRBR Manifestations, Expressions, and Works can all be “information resources”, since any of the three can cozy up to a URI.  (Compare TimBL’s “Generic Resources” note.)  Undoubtedly there are other “information resources” as well, but they may either correspond to FRBR aggregates, with “part of” taking the place of “exemplifies” (and so on) in the way “representation of” works, or they may be undescribable using FRBR.

[2/13: Alan R points out that according to FRBR not all Expressions have Manifestations.]

In the “semantic web” world, where a URI is used not just with HTTP but as a name that refers to something, one could use a Web URI to refer to the Manifestation you get, its Expression, or the Expression’s Work, depending on the range of Manifestations that you get from GET/200 (and given the limitations described above).  This is where the DL Item-class approach wins, because it lets you ascribe properties to “information resources” without having to commit to any particular level of the FRBR hierarchy, and thus without having to be conscious of the FRBR record modularity.  You can write down an Expression-level property, and then if you later find it’s better to use the URI to refer to a Manifestation that’s not a problem.  Of course the other direction doesn’t work so well. (I’m pulling a fast one here – I need to write another blog post on generic individuals.)
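
A rough sketch of why the Item-class approach tolerates a later change of mind, assuming classes are represented by their defining properties and assertions are attached to classes. The property names and values are hypothetical, and `entailed` simply collects what follows from class membership, which is far less than a real DL reasoner would do.

```python
# Item classes written as dicts of defining properties (hypothetical values).
expression = {"title": "An Example Essay", "language": "en"}
manifestation = {**expression, "media_type": "text/html"}


def is_subclass(sub, sup):
    # sub requires everything sup requires, so its Items are all in sup.
    return all(sub.get(key) == value for key, value in sup.items())


def entailed(cls, assertions):
    """Properties that hold of every Item in cls: its own definition plus
    whatever was asserted of any class it is a subclass of."""
    result = dict(cls)
    for target, props in assertions:
        if is_subclass(cls, target):
            result.update(props)
    return result


# Written down while the URI was read as naming the Expression...
assertions = [(expression, {"subject": "FRBR and the Web"})]
# ...and still valid after deciding the URI really names the Manifestation.
print(entailed(manifestation, assertions)["subject"])  # FRBR and the Web

# The other direction: a Manifestation-level assertion says nothing about
# the Expression, whose Items may have other media types.
assertions.append((manifestation, {"encoding": "UTF-8"}))
print("encoding" in entailed(expression, assertions))  # False
```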

For a more sophisticated picture of how the Web relates to documents, see Henry Thompson’s article on URIs.

I’m surely not the first to think about all this, but I got tired of research after two Google searches.

Thanks to Allen Renear for prompting these thoughts and being interested, Tim Danford for “cryptic class”, and Alan Ruttenberg for null hypotheses.

  1. 2011-02-13 at 11:00

    This is an interesting piece of writing on a currently hot topic (at least from my point of view). I have been intensively engaged with this topic for two or three months now, since I started wondering deeply about the vocabulary and the definitions of the terms I used (at least for myself) for describing the resource description levels and their relations. I tried to express my thoughts on that topic in a blog post, which you can find at http://infoserviceonto.smiy.org/2010/11/25/on-resources-information-resources-and-documents/ . This article includes many references to related work.
    However, I recently stumbled upon the relationship between resource identifier and resource name (cf. http://chatlogs.planetrdf.com/swig/2011-02-04.html#T12-02-50):

    “What would you consider as a name of a document? For example, can http://example.com/text.html be the name of a HTML document, or would you prefer to use the title of this document as its name? You can ask yourself also the same question with a book as resource.
    I guess it is even easier when thinking of a name of a person? For example, http://foaf.me/holger#me can be used as a resource identifier for a person whose name is Holger, or?
    So, resource identifiers aren’t always equal to resource names. In the HTML document example from above, one might argue. However I guess, in the case of a person or a book it might be a bit clearer, or?”

    What do you think about this issue? The resource identifier is in a strong relationship with its related information resource. However, the resource that is the subject of that information resource isn’t always clearly identifiable/describable, or?

    Regarding FRBR terms and their relation to the resource description levels, I would suggest the following mapping (based at least on my personal definitions of the terms, which hopefully have some similarities with yours and with those of other people who have also defined them 😉 ):

    frbr:Work ~ information resource (abstract description)
    frbr:Expression ~ concrete description (e.g. a semantic graph)
    frbr:Manifestation ~ representation
    frbr:Item ~ a single copy of a representation

    Would you agree with that?

    PS: The mappings may still be a bit vague; I wouldn’t really want to propagate them. Still, one can draw some interesting parallels. However, it might be quite difficult to fit the concept of resource into that hierarchy, or? I would currently prefer not to put it on the frbr:Work level 😉

    PPS: I will also add your article to the reference list in the article mentioned above 😉

    • 2011-02-13 at 14:51

      One of the tragedies of web architecture is the failure to get community consensus on use of its core terminology. “Resource” and “representation” are each used in at least four distinct ways. Attempts to influence how others use the words seem to be doomed to failure. Since my goal was to connect FRBR with TimBL’s take on webarch, especially the Generic Resources note, I used the words specifically the way he does, as best I can figure out how. So you have to ignore the Fielding and AWWW usages (for example) to make sense of what I said.

      As for what is a name of a document – well, you choose a name for something so that it (the name) will be understood by whoever needs to understand it. If it works, it’s a name for the thing, and if it doesn’t, it isn’t. If there are competing or unclear uses of a name you’re considering using, you either steer clear of it or steer your readers in the right direction through additional instruction or persuasion. In any case this is a very different question from the one I wanted to address here, which is how operational GET behavior and Generic Resources relate to FRBR. The notational engineering question of how to refer to these entities or others – that’s completely different.

      I think it is clear from my note I don’t think your Work mapping holds, at least not in the Generic Resources view, as different “information resources” correspond to classes or FRBR entities at differing levels of the ladder. As for Expression, it is the bibliographic records that are (partial) descriptions, not the entities themselves, and that is true at all levels. If a FRBR entity happens to be a description that’s either coincidence or an unusual use of the word “description”.

      • 2011-02-13 at 18:59

        Okay, at least the information resource must be on the most abstract level of resource description (btw, I also described the relation between information resource (at least as I define it) and timbl’s Generic Resource in my referenced blog post [1]). From my understanding, my definition of information resource also aligns quite well with Harry Halpin’s, as he explains it in his PhD thesis. I also more or less agree that this term is often used in practice to refer to both abstract and concrete descriptions. However, to be honest, it can really only be the abstract description, because everything is derived from it. Everything below that abstract description level is an exemplification/instantiation of things on that level. They are concrete descriptions, e.g., a semantic graph or a natural language expression.

        Generally, I guess we might all feel more comfortable using definitions of the affected terms as they have already been specified by philosophers (okay, we may have to substitute ‘information resource’ with ‘abstract description’). That is not really one of the problems the Web Science community should be solving. We should accept the common understanding of these terms and not try to “reinvent the world”.

        As I already stated in my last comment, the resource itself should be at the top of that hierarchy. Seeing the world from a Representationalist view, one would deal all the time only with “representations” of things; that is, with how someone perceives an entity. However, I don’t feel quite comfortable with a mapping frbr:Work ~ resource. I remember a discussion of this issue introduced by Ian Davis (AFAIK). An issue that is perhaps difficult to imagine is the belief that “everything is an information resource”, i.e. I am an exemplification of myself – a self-description/exemplification (cf. [1]; “everything can have an information resource” is, on the other hand, hopefully quite easy to grasp). I think that this view makes sense when following Representationalism. However, I’m not really a philosopher, and a philosopher could better judge this statement.

        PS: From my understanding of Roy T. Fielding’s work, I have the feeling that he has somehow forgotten/suppressed the description level. I would often substitute ‘resource’ with ‘information resource’ in his PhD thesis. The most important point where his argument defining ‘resource’ breaks down (from my point of view) is where he states at the beginning that “any concept that might be the target of an author’s hypertext reference must fit within the definition of resource” (persons etc.; I fully agree with that statement) and then switches to his technical definition of resource as a “temporally varying membership function MR(t)” (which perhaps fits information resource better*). For example, it’s hard to imagine that someone is a “temporally varying membership function”. That said, “target of reference” can be interpreted as the “subject of description”. Hence, this subject is the resource, and what we can at least get when dereferencing a resource URI that is used to name that resource is a realization of an information resource that describes this resource.

        *) One can maybe also derive ‘information resource’ from the statement “the key abstraction of information in REST is a resource”

        [1] http://infoserviceonto.smiy.org/2010/11/25/on-resources-information-resources-and-documents/

  2. 2011-02-21 at 21:28

    Yes, some ‘information resources’ might correspond to some Works, but not all of them do. I think I would sooner say IR ~= {Work union Expression union Manifestation}. (Some of these would be aggregates, either temporal or not.) But there are probably many ways to model the correspondence.

    I think there are plenty of information resources that are not descriptions – e.g. symphonies – so I get sort of confused reading your comment. In this context the descriptions of interest are the bibliographic records, not the entities they describe.

    If a philosopher thinks that all documents or similar information-things are descriptions then I will probably be confused by that philosopher.

    Tim B-L and Roy disagreed deeply in the 2000-2005 period on how to talk about the web, and I have to admit Tim B-L’s version makes much more sense to me. His version is also much more compatible with FRBR, in part because it’s document-centric and therefore nicely avoids making use/mention confusions. There are things. Some things are IRs, some aren’t. Documents are IRs, and other IRs are similar to documents. Some IRs describe things, some don’t (this is where we differ). Some things are described, some aren’t. Bibliographic records are IRs (documents actually) that happen to describe IRs. Maybe some IRs describe non-IRs, but those would be out of scope in a discussion of FRBR. That’s about it.

  3. 2011-02-22 at 08:34

    “I think there are plenty of information resources that are not descriptions – e.g. symphonies – so I get sort of confused reading your comment.”

    Well, that is perhaps the crux: I don’t have mutual exclusions in my model[1], i.e. a resource can be an information resource and so on. Please also consider that I tried to outline my definition of information resource as primarily an abstract description that has to be realized by at least one concrete description to be perceivable.
    I’m not sure whether the term ‘description’ really covers the whole range that I’m thinking of (I’m not a native English speaker). However, after much reading of Pat Hayes’, Harry Halpin’s and other philosophers’ thoughts on that issue (description/representation), I came to the conclusion that it might fit that part of the (/my) model[1] very well.
    Please also note that this (from my understanding) reflects a Representationalist view; that is, everything you perceive is a description/representation of that thing. So, everything is described for you or described (perceived) by you.

    [1] http://infoserviceonto.smiy.org/2010/11/25/on-resources-information-resources-and-documents/

    • 2011-02-23 at 13:51

      This does seem to be a problem of language, not a real dispute. Suppose that you and I met in a bar, and one of us brought a printed list of random words. If we showed the list to the bartender and then asked the bartender whether the list was a description, I bet he or she would look at us funny and say “of course not, are you batty?”. Similarly if you were to indicate a stool (one ‘perceived’ by all of us) and ask the bartender whether the stool was a description, he/she might suggest that we had had enough to drink. Generally speaking I am with the bartender on these questions. As you may have noticed from the rest of this blog I am ornery about terminology, and I prefer to use a common word like “description” in one of three ways: (1) as T. C. Mits would use it, (2) as a modest and well-articulated generalization or specialization of that sense, or (3) in a manner so outrageously distinct that no one would ever get confused. (An example of #3 would be “ring” as used in mathematics.) A definition of ‘description’ that implies that a random word list is a description does not qualify as a modest generalization of the sense that T. C. Mits would impute to the word.

  4. Erik Hetzner
    2011-03-08 at 23:08

    For what it’s worth, I always imagined that in a FRBR model of the web (or is that a web model of FRBR?) an item would be something at a particular URL (at a point in time, I suppose).

    Otherwise I am not sure how to model, e.g., a PDF file of a novel which is located at two URLs, http://example.org/book and http://example.com/book. Clearly they are the same Work, Expression, and Manifestation (because the bytes are the same). So I thought they would be different Items.

    A system I wrote used 302 Found to redirect from Work to Expression to Manifestation to Item, prompting the user at each point if multiple manifestations were available (e.g. a PDF and HTML).
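
    A minimal sketch of a redirect chain like the one described above, written as a pure function rather than a running server. The paths, the `CHILDREN` catalogue, and the use of 300 Multiple Choices to stand for the user prompt are all assumptions for illustration; the comment does not say how that system was built.

```python
from http import HTTPStatus

# Hypothetical catalogue: each path maps to its more-specific children.
# Item paths have no children; they are where bytes are actually served.
CHILDREN = {
    "/work/w1": ["/expression/w1-en"],
    "/expression/w1-en": ["/manifestation/w1-en-pdf", "/manifestation/w1-en-html"],
    "/manifestation/w1-en-pdf": ["/item/w1-en-pdf"],
    "/manifestation/w1-en-html": ["/item/w1-en-html"],
    "/item/w1-en-pdf": [],
    "/item/w1-en-html": [],
}


def resolve(path):
    """Return (status, headers, body) for one request to path."""
    children = CHILDREN.get(path)
    if children is None:
        return HTTPStatus.NOT_FOUND, {}, None
    if not children:
        return HTTPStatus.OK, {}, f"<bytes stored for {path}>"
    if len(children) == 1:
        return HTTPStatus.FOUND, {"Location": children[0]}, None
    # Several choices: modelling the user prompt as 300 Multiple Choices
    # is an assumption of this sketch.
    return HTTPStatus.MULTIPLE_CHOICES, {}, children


def follow(path):
    """Follow 302s until something other than a redirect comes back."""
    hops = [path]
    status, headers, body = resolve(path)
    while status == HTTPStatus.FOUND:
        path = headers["Location"]
        hops.append(path)
        status, headers, body = resolve(path)
    return status, hops, body


print(int(follow("/work/w1")[0]))             # 300
print(follow("/manifestation/w1-en-pdf")[1])  # ['/manifestation/w1-en-pdf', '/item/w1-en-pdf']
```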

    • 2011-03-10 at 21:32

      That makes sense, I think, as long as you have reason to believe that the two URIs are backed by physically distinct hardware. If example.org and example.com were virtual hosts on the same physical server, with the URIs both mapped to the same file on disk, then there would only be one Item, not two. And if domain name example.org had multiple A records leading to a redundantly stored web site, there could be many Items accessed via a single URI.
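
      The Item counting in these scenarios can be sketched by mapping each URI to the physical copies behind it. The host names, file paths, and the three deployment maps are hypothetical; the sketch only illustrates that the number of Items depends on the backing storage, not on the number of URIs.

```python
# Hypothetical deployment maps: the physical copies (host, file) behind each URI.
# Items are physical, so each distinct copy counts as one Item.
VIRTUAL_HOSTS = {
    "http://example.org/book": {("server-a", "/srv/book.pdf")},
    "http://example.com/book": {("server-a", "/srv/book.pdf")},  # same file on disk
}
SEPARATE_DISKS = {
    "http://example.org/book": {("server-a", "/srv/book.pdf")},
    "http://example.com/book": {("server-b", "/srv/book.pdf")},
}
REPLICATED = {
    # One URI whose host name has several A records, each host with its own copy.
    "http://example.org/book": {("mirror-1", "/srv/book.pdf"),
                                ("mirror-2", "/srv/book.pdf"),
                                ("mirror-3", "/srv/book.pdf")},
}


def items_behind(uris, backing):
    """The set of distinct physical copies reachable through the given URIs."""
    return set().union(*(backing[uri] for uri in uris))


both = ["http://example.org/book", "http://example.com/book"]
print(len(items_behind(both, VIRTUAL_HOSTS)))                      # 1
print(len(items_behind(both, SEPARATE_DISKS)))                     # 2
print(len(items_behind(["http://example.org/book"], REPLICATED)))  # 3
```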

      • Erik
        2011-03-10 at 22:08

        Why should it matter if they are backed by the same file, if the intellectual content is identical?

        To quote from the FRBR doc:

        > Where the production process involves a publisher, producer, distributor, etc.,
        > and there are changes signaled in the product that are related to publication,
        > marketing, etc. (e.g., a change in publisher, repackaging, etc.), the resulting
        > product may be considered a new manifestation.

        It’s not clear to me whether this means moving a copy of a document from one server to another constitutes another manifestation or another item.

        But I don’t see the utility of worrying about the number of items present as a web page is cached, copied around, etc. What does this gain us?

      • 2011-03-10 at 22:35

        It’s very clear from the FRBR spec that Items are physical. The number, location, and ownership of physical copies is very important if you’re concerned about preservation, disaster recovery, latency, and so on. Your consideration of caching is exactly about the number and location of several Items of one Manifestation. Libraries are certainly very concerned about their Items (a.k.a. holdings) as it’s their responsibility to keep stuff safe. So whether example.com and example.org share a disk, or have separate disks, is quite consequential.
        If you don’t care about Items – and I agree, usually you don’t talk about them – then talk about Manifestations or higher levels instead.

  5. Erik
    2011-03-17 at 04:37

    To me the essence of the “item” in FRBR has been that it manifests the same intellectual content as originally produced as any other item from the same manifestation. The physicality is incidental. Anyhow, I don’t imagine it matters too much on the Web. Thanks for the analysis.

  6. 2011-03-18 at 14:03