(Update June 8, 2012. This paper was not accepted at first submission (see sad story), but can be accessed as Chapter 4 in my dissertation. I've had enough interest in the concepts it contains that it is probably worth resubmitting it somewhere, someday, somehow...)
As noted in previous posts labeled with the tag OntoLoki, I've been working off and on for a few years now (yikes) on a program for automatic ontology evaluation. Now we are getting ready to submit our first paper on the subject and would like to open things up for comments. I labeled it as a technical report in hopes of starting a tradition of such things in our laboratory. It seems like a good way to keep the locals on track and to get another chance for reviews before things go out into the scary world of official peer review. I suppose I could drop this into Nature Precedings again, but I'm tempted to wait until it's gone through more revision cycles first, since that is likely to form a more permanent record than I am really ready to commit to.
Objective: The aim of this study is to identify, implement, and test a new method for automatic, data-driven ontology evaluation that is suitable for the evaluation of ontologies with no formally defined restrictions on class membership. The method should quantify the consistency of instance classification within such an ontology based on patterns of properties found to be associated with the instances of particular classes.
Design: We constructed a program that takes as its input an OWL/RDF knowledge base containing an ontology, instances associated with each of the classes in the ontology, and properties of those instances. For each class, it outputs: 1) a rule for determining class membership based on the properties of the instances and 2) a quantitative score for the class that reflects the ability of the identified rule to correctly predict class membership for the instances in the knowledge base. To test the proposed method, we constructed a series of knowledge bases that varied from perfectly consistent through to completely random and evaluated each one using the implementation. In addition to this artificial control study, two well-known biological ontologies were evaluated using public data to provide indications of the behavior of the system in realistic contexts.
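To make the design a little more concrete, here is a minimal Python sketch of the general idea; it is not the actual implementation. The abstract does not name the rule learner or the scoring measure, so everything below is an assumption made for illustration: the rule is restricted to "has property P implies member of the class", the score is plain accuracy, and the names `best_rule`, `shuffle_labels`, and `make_consistent_kb`, along with the toy properties `p_a`, `p_b`, and `p_shared`, are hypothetical.

```python
import random

def best_rule(instances, target):
    """Find the best rule of the form 'has property P => member of target'.

    instances: list of (class_label, frozenset_of_properties) pairs.
    Returns (property, accuracy). Single-property rules and plain accuracy
    are assumptions of this sketch, not the method from the report.
    """
    properties = sorted({p for _, props in instances for p in props})
    best = (None, 0.0)
    for prop in properties:
        # A prediction is correct when property presence matches membership.
        correct = sum((prop in props) == (cls == target)
                      for cls, props in instances)
        accuracy = correct / len(instances)
        if accuracy > best[1]:
            best = (prop, accuracy)
    return best

def shuffle_labels(instances, fraction, rng):
    """Return a copy in which the class labels of a random fraction of the
    instances are shuffled among themselves (0.0 = untouched, 1.0 = all)."""
    chosen = rng.sample(range(len(instances)), int(fraction * len(instances)))
    labels = [instances[i][0] for i in chosen]
    rng.shuffle(labels)
    noisy = list(instances)
    for i, label in zip(chosen, labels):
        noisy[i] = (label, instances[i][1])
    return noisy

def make_consistent_kb(n_per_class=50):
    """Toy knowledge base: each class has one distinguishing property
    plus one property shared by every instance."""
    kb = [("A", frozenset({"p_a", "p_shared"})) for _ in range(n_per_class)]
    kb += [("B", frozenset({"p_b", "p_shared"})) for _ in range(n_per_class)]
    return kb

rng = random.Random(0)
consistent = make_consistent_kb()
random_kb = shuffle_labels(consistent, 1.0, rng)

rule_consistent, score_consistent = best_rule(consistent, "A")
rule_random, score_random = best_rule(random_kb, "A")
print("consistent:", rule_consistent, score_consistent)
print("random:    ", rule_random, score_random)
```

Calling `shuffle_labels` with fractions between 0.0 and 1.0 gives a graded series of knowledge bases in the spirit of the control study. Plain accuracy is sensitive to class imbalance, so a chance-corrected measure would likely be a better choice outside of a toy like this one.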
Results: In the first experiment, the method produced quantitative assessments of the different versions of the knowledge bases that correlated directly with the known level of consistency of instance assignment for each knowledge base. The evaluations of the biological ontologies indicated that the method was successful at detecting relevant patterns associated with the instances of classes in real biological ontologies based on publicly available data.
Conclusion: The results indicate that the suggested method can be used to conduct objective, automatic, data-driven evaluations of biological ontologies without formal class definitions with regard to the property-based consistency of instance assignment. This inductive method complements existing, purely deductive approaches to automatic consistency checking, offering the potential to help not just in the ontology engineering process but also in the knowledge discovery process.