Wednesday, May 23, 2007

Consequences of instance-based ontology evaluation

As I mentioned before, one of my main projects these days is the production of an "instance-based" program for ontology evaluation. The fundamental idea is that, when evaluated in the context of an operational system, ontological classes should represent some consistent and unique pattern of features associated with the instances in the system. For example, if the class is a good one, the proteins (instances) annotated as extracellular (class) should display some common motif (e.g. a signal peptide) or combination of motifs.
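To make the idea concrete, here is a minimal sketch in Python. It is not our actual program: the function name `class_quality`, the toy protein records, and the choice of an F1 score over single features are all assumptions made for illustration. It scores a class by asking how well the best single feature separates its members from everything else.

```python
def class_quality(instances, target_class):
    """Return (best_feature, f1) for the single feature that best
    predicts membership in target_class. Hypothetical scoring scheme."""
    members = [i for i in instances if target_class in i["classes"]]
    features = set().union(*(i["features"] for i in instances))
    best = (None, 0.0)
    for feature in sorted(features):
        carriers = [i for i in instances if feature in i["features"]]
        hits = sum(1 for i in carriers if target_class in i["classes"])
        if hits == 0:
            continue  # feature never co-occurs with the class
        precision = hits / len(carriers)  # how unique the feature is to the class
        recall = hits / len(members)      # how consistent it is among members
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[1]:
            best = (feature, f1)
    return best


# Toy annotation data, invented for this example.
proteins = [
    {"id": "P1", "features": {"signal_peptide"}, "classes": {"extracellular"}},
    {"id": "P2", "features": {"signal_peptide", "kinase_domain"}, "classes": {"extracellular"}},
    {"id": "P3", "features": {"kinase_domain"}, "classes": {"cytoplasmic"}},
    {"id": "P4", "features": {"nuclear_localization_signal"}, "classes": {"nuclear"}},
]

print(class_quality(proteins, "extracellular"))
```

In this toy data the signal peptide is both consistent (every extracellular protein has it) and unique (nothing else does), so the extracellular class scores well. A class whose members share no distinguishing feature would score low.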

So far, our program stands on the assumption that all of the data that goes into it (the annotation of proteins, for example) is generally complete and valid. If the assignment of instances to classes or the assignment of other property values to instances is flawed, then the quality estimates for the classes in the ontology will be low. Thus, when we evaluate the quality of a particular class, we are also implicitly evaluating the quality (and the quantity) of the entire annotation system in which the class is embedded.

Is this good, bad or ugly?

Personally, I like it... and this is one reason why:

The other consequence, alluded to earlier, of this form of ontology evaluation is that it may help to expose the causal mechanisms underlying associations between instances and their annotation. To illustrate, if classes in ontologies were evaluated using this method on a periodic basis, we should observe trends in performance on this test that indicate both the quality of the idea represented by the class and the reasons that instances should be assigned to the class.

At the beginning of an ontology-driven annotation project, few instances will be present and thus all classes will score poorly. Over time, more instances and more annotations of those instances will be added to the system. If the classes (concepts, etc.), the assignment of instances to those classes, and the other annotations are good, then the scores should improve accordingly. Ideally, the scores would improve to the point that the patterns extracted during the evaluations could be used to automatically assign new instances to classes, à la DL reasoning or another mechanized classification system.

This pattern seems to fit the scientific method well. First, instances are observed and described. For example, experiments are conducted to find out where proteins are located in cells. After many instances are observed and described, theories are proposed and tested that explain why those instances fit in those classes. For example, proteins seem to fit in the class extracellular because they contain a particular pattern of amino acids; thus any protein that displays the same pattern should, according to this principle, also fit into this class.
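The observe-then-generalize loop can be sketched in a few lines of Python. This is a deliberately crude stand-in for real induction, assuming a "pattern" is simply the set of features shared by every known member of a class. The names `induce_pattern` and `fits`, and the toy records, are hypothetical.

```python
def induce_pattern(members):
    """Propose a 'theory': the features common to all observed members."""
    feature_sets = [set(m["features"]) for m in members]
    return set.intersection(*feature_sets) if feature_sets else set()


def fits(instance, pattern):
    """Test the theory on a new instance: does it display the whole pattern?
    An empty pattern explains nothing, so it never matches."""
    return bool(pattern) and pattern <= set(instance["features"])


# Observed and described instances (invented data).
known_extracellular = [
    {"id": "P1", "features": {"signal_peptide", "glycosylation_site"}},
    {"id": "P2", "features": {"signal_peptide"}},
]

pattern = induce_pattern(known_extracellular)

# New, unannotated instances.
candidate_a = {"id": "P5", "features": {"signal_peptide", "kinase_domain"}}
candidate_b = {"id": "P6", "features": {"kinase_domain"}}

print(pattern, fits(candidate_a, pattern), fits(candidate_b, pattern))
```

A real system would need something far more tolerant of noise and missing annotations than a strict intersection, but the shape of the process is the same: describe instances, extract a pattern, then use the pattern to classify what comes next.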

This all seems to suggest that perhaps we should be building applications that embed machine induction as a fundamental part of a continuous process of knowledge evolution. Oh wait... people already did that, and more!!!