I've recently returned from a trip to the Leiden University Medical Center where I met with Barend Mons, Marco Roos, and Eric Schultes regarding the formation of what Barend is calling the Concept Web Alliance (CWA) (as well as some open positions within his groups at the LUMC and the University of Rotterdam).
In preparing for my presentation I reread the WikiProteins article (from Barend's group) and was struck by how closely the pattern I saw there matches some of my old work and things that are emerging right now at Freebase and elsewhere. There seems to be a very clear uptake of the following basic cycle in the context of building large knowledge bases on the Web:
- Datamining (from text or other sources) generates many, many candidate assertions
- The assertions are presented to (many) people who manually correct/refine them
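The cycle above can be sketched in a few lines. This is a toy illustration, not anything from WikiProteins or Freebase: the miner, vote format, and aggregation rule are all hypothetical stand-ins for real pipelines.

```python
import re
from collections import Counter

def mine_candidates(text):
    """Naive pattern-based miner: treat 'X is a Y' sentences as
    candidate (subject, 'is_a', object) assertions. A hypothetical
    stand-in for a real text-mining pipeline."""
    pattern = re.compile(r"(\w+) is an? (\w+)")
    return [(s, "is_a", o) for s, o in pattern.findall(text)]

def accepted(candidate, votes):
    """Keep a candidate only if a strict majority of reviewers
    accepted it (a simple aggregation rule for illustration)."""
    tally = Counter(votes)
    return tally["accept"] > tally["reject"]

text = "Insulin is a hormone. A tomato is a vegetable. Hemoglobin is a protein."
candidates = mine_candidates(text)

# Simulated curator votes on each candidate assertion; in practice
# these would come from many reviewers, e.g. via Mechanical Turk.
votes = {
    ("Insulin", "is_a", "hormone"): ["accept", "accept", "accept"],
    ("tomato", "is_a", "vegetable"): ["accept", "reject", "reject"],
    ("Hemoglobin", "is_a", "protein"): ["accept", "accept", "reject"],
}

curated = [c for c in candidates if accepted(c, votes.get(c, []))]
```

The point of the sketch is the division of labor: the miner is allowed to be noisy and high-recall, because the human-review stage filters out assertions like "a tomato is a vegetable" that pattern matching alone cannot catch.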
A somewhat more hidden indicator of this trend is that Dolores Labs - a company that seems to be devoted entirely to applications of Amazon's Mechanical Turk - is now hiring. It's hard to tell what they are doing internally, but it would be surprising if the interplay between datamining and the wisdom of crowds didn't play a strong role.
These trends will place increasing pressure on the scientific and technical communities to come to a better understanding of the processes involved in motivating, coordinating, and aggregating the knowledge of many millions of minds into structured form.