Saturday, November 1, 2008

dullhunk PLoS Bio Article

Duncan Hull and company have just published a thorough (210 references..) review of the current state of scientific digital libraries. Anyone interested in the changing face of publishing or of the Web in general would likely find the article interesting. Out of the many ideas discussed, these two caught my attention:

"As we move in biology from a focus on hypothesis-driven to data-driven science, it is increasingly recognized that databases, software models, and instrumentation are the scientific output, rather than the conventional and more discursive descriptions of experiments and their results."
and
"We suggest that the main obstacles to warmer libraries are primarily social rather than technical in nature. Identity, trust, and privacy are all potential stumbling blocks to better libraries in the future."
To the first quote, I will say simply, hear, hear! The idea that the units with which scientific progress is published and thus measured should correspond more directly to discrete, integratable chunks of knowledge and to sharable processes for knowledge generation, rather than to (often unparsable) stories, is one whose time has clearly come.

The second one stuck out because it is so similar to something I played a part in writing a while back. In another review article (with a paltry 93 references) we suggested that ".. the primary hindrances to the creation of the SWLS [Semantic Web for Life Sciences] may be social rather than technological in nature ..". In a sense, the SWLS that we were thinking about way back then could be said to subsume the digital libraries of Dr. Hull et al.'s article. But... looking more closely at the first quote above, I realize that the relation isn't subsumption, but equivalency. Though they are writing specifically about 'libraries', they clearly consider databases, software, etc. as parts of the new incarnations of libraries and thus are writing about exactly the same thing that we were writing about.

It's interesting that we came to similar conclusions to some extent: both articles suggest that the main challenges in moving science forward on the Web are in handling problems that have people at their center, not technology.

2 comments:

bgood said...

So a friend recently read this post and mistakenly inferred that I was trying to say "we did it two years before you! :p ... ok, admitted, we lost the pissing game of reference padding, but who cares?!". That is really not what I was trying to convey. The article covers many different topics from ours, is more detailed, better organized, and does a clearer job of delineating the challenges we are going to face as we move forward (namely trust and identity). Neither article is anywhere near the first to point out the importance of considering the requirements for human effort and commitment in the formation of the semantic Web. Peter Mika's excellent article about emergent semantics summed this up well several years ago, saying: "the semantic Web is for machines, but the process of creating it and maintaining it is a social one.". And, of course, Tim Berners-Lee has been talking about the trust level of the layer cake for about two decades...

Duncan Hull said...

Hi Ben, thanks for blogging this. I thought 200+ references was a lot (and it is double the PLoS 100 reference limit). For a review though, it's often not enough (there are quite a few I left out or missed). My boss recently wrote a paper with 2000+ references, which makes 200 refs look like peanuts! It's currently published in arXiv, but I think it will soon appear in the new journal Genome Medicine.