I was playing with RDF gathered from UniProt this morning and stumbled on something that made me a little sad. Apparently molecules like the rigor mortis (rig) protein from Drosophila (depicted at right) have been isolated from "a young sporophyte contained within a seed". This seems a little strange for a fruit fly protein.
In the RDF, I see triple 1: (Protein Q86BY9, isolated from, UniProt tissue 229 (Tissue Embryo))
and triple 2: (UniProt tissue 229, is the same as (owl:sameAs), Plant Ontology term PO:0009009 (embryo))
Even if the sacrificial fruit flies had been found inside some poor young sporophyte, this probably still would not make sense. I hate bashing UniProt, because the fact that they have bothered to produce RDF versions of their records is a useful and unusual trait for a major bioinformatics institute and has enabled a lot of my work, but RDF, OWL and their relatives are much more useful when some attention is paid to the definitions associated with their constructs. owl:sameAs means that 'two URI references actually refer to the same thing: the individuals have the same "identity"'. Anything said about one of those URIs therefore holds for the other, so combining the two triples lets software conclude that the fly protein was isolated from PO:0009009, "a young sporophyte contained within a seed". Having live statements like that is probably not such a great thing.
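To make the leak concrete, here is a minimal Python sketch of what an owl:sameAs-aware consumer does with those two triples: it merges URIs declared sameAs into one equivalence class and copies every statement across that class. The short identifiers (`uniprot:Q86BY9`, `tissue:229`, `PO:0009009`) and the `isolatedFrom`/`definition` predicates are simplified, hypothetical stand-ins for the real URIs, not UniProt's actual vocabulary, and only subject and object positions are handled.

```python
# Minimal sketch of owl:sameAs-style inference over plain tuples.
# Identifiers and predicate names are simplified, hypothetical stand-ins
# for the real UniProt and Plant Ontology URIs.
SAME_AS = "owl:sameAs"

triples = {
    ("uniprot:Q86BY9", "isolatedFrom", "tissue:229"),
    ("tissue:229", SAME_AS, "PO:0009009"),
    ("PO:0009009", "definition", "a young sporophyte contained within a seed"),
}


def same_as_classes(triples):
    """Map each URI mentioned in an owl:sameAs triple to its equivalence
    class (sameAs is reflexive, symmetric and transitive)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)  # union the two classes

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return {x: classes[find(x)] for x in list(parent)}


def infer(triples):
    """Copy every non-sameAs statement to all subject/object aliases."""
    eq = same_as_classes(triples)
    inferred = set(triples)
    for s, p, o in triples:
        if p == SAME_AS:
            continue
        for s2 in eq.get(s, {s}):
            for o2 in eq.get(o, {o}):
                inferred.add((s2, p, o2))
    return inferred


facts = infer(triples)
# What was the fly protein "isolated from", according to the merged graph?
definitions = {
    text
    for s, p, tissue in facts
    if s == "uniprot:Q86BY9" and p == "isolatedFrom"
    for t, q, text in facts
    if t == tissue and q == "definition"
}
print(definitions)  # {'a young sporophyte contained within a seed'}
```

Nothing in the sketch is specific to biology: the plant definition reaches the fly protein purely because identity is treated as substitutable, which is exactly what owl:sameAs licenses.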
Perhaps we would be better off if the W3C introduced a relationship like "owl:kindOfSimilar". It's likely that RDF/OWL providers like UniProt use owl:sameAs in these situations because it's the built-in predicate that comes closest to expressing what they mean. Maybe if we gave them something else to use in the many cases where similarity exists but is not absolute, things on the semantic Web would make more sense. (Perhaps we would also be rewarded with standardized ways to quantify the confidence or degree of such relationships.)
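The proposal can be sketched as well. Below, `KindOfSimilar` is a hypothetical graded link (not an existing W3C or OWL term) carrying a made-up confidence value. Instead of silently copying statements the way owl:sameAs does, the consumer decides how much similarity is enough before following the link. The identifiers, the 0-to-1 scale and the 0.3 score are illustrative assumptions only.

```python
from dataclasses import dataclass


# Hypothetical graded link standing in for the proposed "kindOfSimilar";
# it is NOT an existing W3C/OWL term.
@dataclass(frozen=True)
class KindOfSimilar:
    source: str
    target: str
    confidence: float  # assumed scale: 0.0 unrelated .. 1.0 identical


# Simplified, hypothetical identifiers; the 0.3 score is made up.
triples = {
    ("uniprot:Q86BY9", "isolatedFrom", "tissue:229"),
    ("PO:0009009", "definition", "a young sporophyte contained within a seed"),
}
links = [KindOfSimilar("tissue:229", "PO:0009009", confidence=0.3)]


def definitions_for(uri, triples, links, min_confidence):
    """Collect definitions of `uri`, following similarity links only
    when their confidence meets the caller's threshold. Unlike
    owl:sameAs, no statements are copied between the two resources."""
    reachable = {uri} | {
        link.target
        for link in links
        if link.source == uri and link.confidence >= min_confidence
    }
    return {o for s, p, o in triples if s in reachable and p == "definition"}


# A strict consumer refuses the weak link; a lenient one follows it.
print(definitions_for("tissue:229", triples, links, min_confidence=0.9))  # set()
print(definitions_for("tissue:229", triples, links, min_confidence=0.2))
# {'a young sporophyte contained within a seed'}
```

The design point is that the similarity stays an explicit, inspectable edge with a degree attached, so the "sporophyte" reading only appears for a consumer that has chosen to accept a weak match.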
Note to UniProt: say what you mean!
Note to semantic Web people: make it easier for them to say 'squishy' things like "sort of similar to"! Reality is analogue; deal with it!