Thursday, January 17, 2008

Extreme Writing for ISMB

Over the last 48 hours or so I participated in a Google Docs-enabled Extreme Writing session as the lead (English-speaking) author of an article submitted to ISMB. The article describes a new area of the Freebase knowledge space meant, in addition to quite a number of other things, to capture knowledge about the interconnectivity of different biological databases. (See the temporary development version in the Sandbox or the eventually more permanent URI.) The idea for this project, which has evolved a bit from the description in the Wiki available today, and the actual work of building this resource came almost entirely from Francois Belleau of the Bio2RDF project. He invited me to help with the writing and to add my two cents about knowledge gardening on the Web, a subject I am somehow supposed to be becoming an expert in.
Here are a couple of snippets from the paper:
"... a grander vision of integrative bioinformatics. In this vision, researchers or their computational agents not only discover and access all the databases that they require, but also clearly understand how each entity in each database relates to the entities in all the others. The information required to realize this vision can be conceptualized as a map that, rather than describing the interconnectivity of points in physical space, describes the interconnectivity of biological entities in the context of the Web..."
"...Right now, this meta-database makes it possible to answer queries about the connectivity of the various data sources but does not yet enable queries of the connectivity of their individual components. Before it is possible to 'zoom in' on the lowest level entities in the global map of bioinformatics data, it is first necessary to establish a consistent strategy for their identification. The bioinformatics meta-database presented here provides a centralized, community-governed repository of public namespaces that we propose might serve as the foundation of a global unique identifier system for bioinformatics on the Web. This identifier system is based on principles outlined in the Banff Manifesto (BM) instigated at the 2007 World Wide Web conference and presented here for the first time..."
It was an exciting, intriguing, sometimes frustrating experience simultaneously co-authoring this document with 5 other people (Francois, Michel Dumontier, Mark Wilkinson, Marc-Alexandre, Jamie Taylor). Overall, Google Docs handled the experience surprisingly well. A quick look indicates that it captured 5925 revisions over the past three days. My only major complaint was that the screen tended to jump all over the place when lots of people were working at the same time. I guess this may be one mechanism they use to keep people from editing in the same text area before it's had a chance to sync everyone up for that area (otherwise it wouldn't know what to display). To avoid this problem to some extent, and to make the work a bit more efficient, we evolved strategies for social locking of different sections of the document. It would be great if support for this could be added to Google Docs - for example, a user could select a segment of text and request a lock on it that was enforced by the Google system.
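
To make the locking idea a bit more concrete, here is a minimal sketch in Python of what a system-enforced section lock might look like. Everything in it is an assumption made for illustration: the `SectionLocks` class, its method names, and the use of plain character offsets are hypothetical, and none of it reflects how Google Docs actually works. It also ignores the hard part, which is shifting the locked ranges as text is inserted or deleted elsewhere in the document.

```python
from dataclasses import dataclass, field


@dataclass
class SectionLocks:
    """Hypothetical lock table for one document (not a real Google Docs API)."""
    # Each entry is (start, end, user); end is exclusive.
    locks: list = field(default_factory=list)

    def request(self, user, start, end):
        """Grant a lock on [start, end) unless it overlaps someone else's range."""
        if start >= end:
            raise ValueError("empty or inverted range")
        for s, e, owner in self.locks:
            # Two half-open ranges overlap when each starts before the other ends.
            if owner != user and start < e and s < end:
                return False
        self.locks.append((start, end, user))
        return True

    def can_edit(self, user, position):
        """A position is editable if it is unlocked or locked by this same user."""
        return all(owner == user or not (s <= position < e)
                   for s, e, owner in self.locks)

    def release(self, user):
        """Drop every lock held by the given user."""
        self.locks = [lock for lock in self.locks if lock[2] != user]
```

With something like this on the server side, an author could claim the Methods section, everyone else's edits inside that range would be refused until the lock was released, and the rest of the document would stay open to all.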

In the end, the paper really could have been much better with some more time to put it together. That being said, it does have some interesting ideas in it and so might just squeak into the conference. If it does, Francois and co. will certainly produce a very exciting presentation by the time the conference finally comes to pass. If not, we'll try, try again.