The full text of my dissertation, "Strategies for amassing, characterizing, and applying third-party metadata in bioinformatics", is now available via UBC's information repository. It is "manuscript-based", so each of the chapters except the introduction and the conclusion can be read and understood independently. (So there is really no reason for anyone ever to try to read the whole thing.)
Bioinformatics resources on the Web are proliferating rapidly. For biomedical researchers, the vital data they contain is often difficult to locate and to integrate. The semantic Web initiative is an emerging collection of standards for sharing and integrating distributed information resources via the World Wide Web. In particular, these standards define languages for the provision of the metadata that facilitates both discovery and integration of distributed resources. This metadata takes the form of ontologies used to annotate information resources on the Web. Bioinformatics researchers are now considering how to apply these standards to enable a new generation of applications that will provide more effective ways to make use of increasingly diverse and distributed biological information. While the basic standards appear ready, the path to achieving the potential they entail is muddy. How are we to create all of the needed ontologies? How are we to use them to annotate increasingly large bodies of information? How are we to judge the quality of these ontologies and these proliferating annotations? As new metadata generating systems emerge on the Web, how are we to compare these to previous systems? The research conducted for this dissertation seeks new answers to these questions. Specifically, it investigates strategies for amassing, characterizing, and applying metadata (the substance of the semantic Web) in the context of bioinformatics. The strategies for amassing metadata orient around the design of systems that motivate and guide the actions of many individual, third-party contributors in the formation of collective metadata resources. The strategies for characterizing metadata focus on the derivation of fully automated protocols for evaluating and comparing ontologies and related metadata structures. New applications demonstrate how distributed information sources can be dynamically integrated to facilitate both information visualization and analysis. 
Brought together, these different lines of research converge towards the genesis of systems that will allow the biomedical research community to both create and maintain a semantic Web for the life sciences and to make use of the new capabilities for knowledge sharing and discovery that it will enable.
About the title: I ended up using the generic "metadata" rather than something more specific because I needed a way to concisely capture things that range from Del.icio.us tags to classes in the Foundational Model of Anatomy. "Metadata" seemed to do the job, but it remains a little vague and therefore unsatisfying. Similarly, "third-party metadata" is broader than necessary. I don't touch institutionally generated third-party metadata - just what I guess you would probably call "socially generated metadata".