Tuesday, March 17, 2009

tightening the knowledge cycle

I've recently returned from a trip to the Leiden University Medical Center where I met with Barend Mons, Marco Roos, and Eric Schultes regarding the formation of what Barend is calling the Concept Web Alliance (CWA) (as well as some open positions within his groups at the LUMC and the University of Rotterdam).

In preparing for my presentation I read the WikiProteins article again (from Barend's group) and was struck by how closely the pattern I saw there resembles some of my old work and things that are emerging right now at Freebase and elsewhere. There seems to be a very clear uptake of the following basic cycle in the context of building large knowledge bases on the Web.

  1. Datamining (from text or other sources) generates many, many candidate assertions
  2. The assertions are presented to (many) people who manually correct/refine them

These two basic steps are being united everywhere right now. In WikiProteins we see natural language processing techniques seeding a semantic protein Wiki where people (presumably scientists) can correct/extend the predictions. In Google search we see the familiar products of datamining in the ranked result lists but these are now coupled with interfaces that allow users to 'vote-up' specific results. Freebase, which makes extensive use of automated knowledge acquisition techniques, is now unveiling a series of 'games with a purpose' like TypeWriter and Genderizer that are being used to validate the predictions of their algorithms (see a nice post about the Freebase process).
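
To make the two-step cycle concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Assertion` class, the toy three-word "miner", and the vote thresholds are hypothetical and do not describe how WikiProteins, Google, or Freebase actually work.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Assertion:
    """A candidate subject-predicate-object statement plus its human votes (hypothetical model)."""
    subject: str
    predicate: str
    obj: str
    votes: Counter = field(default_factory=Counter)


def mine_candidates(sentences):
    """Step 1, toy stand-in for datamining: treat every three-word sentence as a candidate triple."""
    candidates = []
    for sentence in sentences:
        parts = sentence.split()
        if len(parts) == 3:
            candidates.append(Assertion(*parts))
    return candidates


def record_vote(assertion, accept):
    """Step 2: one person confirms or rejects one candidate."""
    assertion.votes["accept" if accept else "reject"] += 1


def accepted(assertions, min_votes=3, threshold=0.66):
    """Keep candidates with enough votes and a high enough share of 'accept' votes.

    Both cut-offs are arbitrary illustrative values.
    """
    kept = []
    for a in assertions:
        total = a.votes["accept"] + a.votes["reject"]
        if total >= min_votes and a.votes["accept"] / total >= threshold:
            kept.append(a)
    return kept
```

A real system would replace the toy miner with actual text mining and would probably weight voters by reliability rather than counting every vote equally; the point here is only the shape of the loop, in which machines propose and people dispose.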

A somewhat more hidden indicator of this trend is that Dolores Labs - a company that seems to be entirely devoted to the application of Amazon's Mechanical Turk - is now hiring. It's hard to tell what they are doing internally, but it would be surprising if the interplay between datamining and the wisdom of crowds didn't play a strong role.

These trends will place increasing pressure on the scientific/tech community to come to a better understanding of the processes involved in motivating, coordinating, and aggregating the knowledge of many millions of minds in the formation of structured knowledge.
