Wednesday, January 23, 2008

centralized content and decentralized control

No time for depth of thought here; I just wanted to quickly jot down some current observations and recent links to ideas related to the future of the semantic Web.

In chronological order:
  1. Early January 2008, the annual NAR database issue comes out listing more than 1000 distinct databases in molecular biology.
  2. Shortly after, Duncan Hull complains that essentially none of the "dark data" residing in these databases will ever be used because of the near-complete lack of both syntactic and semantic interoperability between these isolated, decentralized silos.
  3. I participate in writing up a paper about the Banff Manifesto, which, among other things, seeks to improve cross-database interoperability through the introduction of a single, open, centralized resource for defining public namespaces for use in the construction of unique identifiers.
  4. I am forwarded a link to a Wired blog post about Google base - Google providing a centralized repository for scientific data.
  5. The Freebase dev blog quotes an article about wine-tasting websites that indicates that Freebase is an example of the semantic Web (which they also refer to as Web 3.0) and that everyone who creates a community-directed database should be using Freebase to do so - because of the dramatic advantages of centralization.
Notice anything interesting here?

It seems that many people think that success in achieving the goals of the semantic Web is really all about centralization of content. This, of course, seems to run against the decentralized roots of the first Web and is, in that way, rather disappointing. On the other hand, it also seems that success on the semantic Web is really all about decentralization of content control - which was, IMHO, the fundamental characteristic of the first Web that made it such a success (providing it with the coveted capital letter).

Is it possible to achieve a useful giant global graph without centralizing content?

Not as far as I can tell right now.

In fact, the current Web would hardly be useful without massive efforts to centralize its content. How useful would the Web be if Google and others stopped downloading it and indexing its content on their servers, thus depriving us all of our God-given right to find out anything about anything in milliseconds?