Archive for September, 2015

Why is Open Tree not publishing RDF?

Question raised on an Open Tree discussion group:

I’m wondering why you are not using RDF as the underlying graph data model and OWL annotations (and other existing ontologies) to create a semantic graph and therefore following the current best practices to build knowledge graphs.

Good question. Partly it’s that only one person on the project knows anything about RDF. But I think this is mainly a matter of cognitive space and time among the developers, and priorities. If we felt a need to do it given the goals that we have, we would probably do it. But we haven’t felt any need.

Converting to RDF and OWL is easy to do poorly (and perhaps adequately for many purposes). One of the first things I did on the project was to convert the taxonomy to Turtle so I could load it into a triple store. (I was on the RDF bandwagon for many years.) Anyone could do this; it’s a trivial script. Also, the NeXML format that we use subsumes RDFa Core, so it can be converted easily; in a sense we *do* publish RDF for the study database.
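To give a sense of how small such a script can be, here is a minimal sketch. It is not the script I actually wrote. The `example.org` namespace, the `t` prefix on identifiers, the choice of `rdfs:subClassOf` for the parent link, and the function names are all assumptions made for illustration. It also assumes identifiers are plain alphanumerics and names are single-line strings.

```python
"""Sketch: turn a small taxonomy table into Turtle. All names and URIs are illustrative."""

# Hypothetical namespace; choosing real persistent URLs is a separate problem.
BASE = "http://example.org/taxon/"

PREFIXES = (
    "@prefix tax: <" + BASE + "> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def taxonomy_to_turtle(rows):
    """Serialize (taxon_id, parent_id_or_None, name) tuples as Turtle text."""
    lines = [PREFIXES]
    for taxon_id, parent_id, name in rows:
        subject = "tax:t%s" % taxon_id
        lines.append('%s rdfs:label "%s" .' % (subject, escape_literal(name)))
        if parent_id is not None:
            # Assumption: the parent link is modelled as rdfs:subClassOf.
            lines.append("%s rdfs:subClassOf tax:t%s ." % (subject, parent_id))
    return "\n".join(lines) + "\n"
```

Calling `taxonomy_to_turtle([(1, None, "root"), (2, 1, "Aus")])` gives one label triple per taxon plus a parent triple for every taxon that has a parent. Notice that the only real decisions in it (which namespace, which predicate) are exactly the ones the script cannot make for you.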

Doing RDF/OWL well is much harder. It would require cooperation with other groups such as OBO (IAO, VTO, …) and TDWG, choosing and supporting persistent URLs, good term definitions and documentation, a SPARQL endpoint, and so on. These coordination activities are extremely time-consuming. Of course doing so would be lovely in the abstract, but there has been no reason for us to make this a priority.

In my experience, format conversion is by far the easiest activity in data ecology, so mere conversion to RDF has little value. The hard parts are marshalling the data in the first place, and then using it wisely. Due to the vagueness of most vocabulary term definitions, the best-laid RDF usually requires as much reverse engineering and postprocessing as data in any other format when doing data integration and analysis. So it is semantics, not syntax, where the effort is best spent. (RDF is a syntactic play, and it helps with semantics no better than any other data format, in spite of the buzzword “semantic web”. OWL helps semantics a little, but only with inference, not with ground truth, which is what really matters.)

The feedback captured in the feedback system (in GitHub) has a little structure, and we could probably do better at obtaining more.

The thing that would tip the balance would be a real funded collaboration with another project where there was good reason to use RDF or OWL for communication between the collaborators. Publishing RDF/OWL merely for the sake of doing so is not, in my opinion, the best use of resources, especially given that all the information is open and anyone else could do such a conversion for us. I read a lot about the size of the linked data cloud, but very little about its utility. I bet there are legitimate uses of RDF-published data, but from what I’ve seen people mostly publish RDF just so that they can say that they did, not because they know that someone needs it. (Would love to be shown otherwise.)

How would having RDF for Open Tree make a difference to you, personally?