Monday, May 11, 2009

CWA at the YMCA

Somewhere high in the air between New York and Minneapolis, my first stop on my way home, I feel compelled to explain a few things to myself.  Why on Earth have I just spent the last several nights living in the YMCA in Flushing, New York?  Why did I decide to go on my first self-funded professional excursion at a time when I have no income and very little savings?  What did I hope to get and what did the trip deliver? 

The inspiration for this minor adventure was the inaugural meeting of the Concept Web Alliance (CWA) at the New York Hall of Science.  The mission statement of the CWA (written partly at this meeting) is as follows:

To enable an open, collaborative environment to jointly address the challenges associated with high volume scholarly and professional data production, storage, interoperability, and analyses for knowledge discovery

The idea is to form an alliance of like-minded researchers and science publishers interested in sharing knowledge in a computationally accessible fashion (i.e. not as plain text, and in such a way that information from multiple sources can easily be integrated and interacted with).  The basic building block envisioned for these efforts is the ‘triple’: a Concept-Relation-Concept structure.  (The word ‘triple’ and the interesting new verb ‘triplification’, meaning to convert some non-triple structure such as text into a set of triples, were almost certainly the most commonly uttered words in the presentations at the meeting.)
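To make the Concept-Relation-Concept idea a little more concrete, here is a minimal sketch in Python.  Nothing in it comes from the CWA itself: the `Triple` type, the `ex:` identifiers, and the two "sources" are all hypothetical placeholders I made up for illustration.  The point is only that once everything is a triple, integrating two sources can be as simple as taking a set union.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One Concept-Relation-Concept statement (names are illustrative only)."""
    subject: str   # identifier of the first concept
    relation: str  # identifier of the relation
    obj: str       # identifier of the second concept


# Two hypothetical sources that each publish a few triples.
source_a = {
    Triple("ex:GeneX", "ex:associated_with", "ex:DiseaseY"),
    Triple("ex:GeneX", "ex:is_a", "ex:Gene"),
}
source_b = {
    Triple("ex:GeneX", "ex:is_a", "ex:Gene"),        # same statement as in source_a
    Triple("ex:DiseaseY", "ex:is_a", "ex:Disease"),
}

# Because the sources share identifiers, integration is a set union;
# the duplicated statement collapses into a single triple.
merged = source_a | source_b


def about(concept, triples):
    """Return every triple that mentions the concept on either side."""
    return sorted(t for t in triples if concept in (t.subject, t.obj))


for triple in about("ex:DiseaseY", merged):
    print(triple)
```

Real systems would use full URIs rather than these short made-up names, but the shape of the data is the same.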

For those familiar with semantic Web standards such as RDF (a generic triple-based language for representing and sharing information) and OWL (a set of languages for representing knowledge in the form of ontologies), it is perhaps most interesting to consider what is not present in a CWA triple and what was not discussed at all in the public portions of the meeting.  The following words never came up: ‘description logic’, ‘axiom’, ‘class’, ‘reality’.

The intended materialization of the Concept Web vision - at the moment - thus seems to be an open collection of informal (non logic-based) concept representations, identified by URIs retrievable on the Web, that can be linked together to form semantic networks.  This graph-structure could be queried (e.g. using SPARQL) for the ‘facts’ that it would contain where each such fact would be linked to extensive information about where it came from (who (or what algorithm) suggested it, when, and with what confidence).  Interestingly, this is very similar in its flexibility, lack of built-in reasoning, and its strong notion of provenance tracking to the Freebase model. 
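As a rough illustration of what a provenance-carrying ‘fact’ might look like, here is a small Python sketch.  It is an assumption-laden toy, not the CWA's design: the `Fact` fields, the `match` helper, the `ex:` identifiers, the source names, and the confidence numbers are all invented, and the pattern matching (with `None` as a wildcard) only stands in for what a real SPARQL query would do.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """A triple plus the provenance described above (hypothetical fields)."""
    subject: str
    relation: str
    obj: str
    source: str         # who, or what algorithm, suggested it
    asserted_on: date   # when it was suggested
    confidence: float   # how strongly the source believes it, 0.0 to 1.0


def match(facts, subject=None, relation=None, obj=None, min_confidence=0.0):
    """Return facts fitting the pattern; None acts as a wildcard."""
    def fits(wanted, actual):
        return wanted is None or wanted == actual

    return [
        f for f in facts
        if fits(subject, f.subject)
        and fits(relation, f.relation)
        and fits(obj, f.obj)
        and f.confidence >= min_confidence
    ]


# Made-up facts from a human curator and from a text-mining algorithm.
facts = [
    Fact("ex:GeneX", "ex:associated_with", "ex:DiseaseY",
         "curator:alice", date(2009, 5, 1), 0.9),
    Fact("ex:GeneX", "ex:associated_with", "ex:DiseaseZ",
         "algorithm:text-miner", date(2009, 5, 8), 0.4),
    Fact("ex:DiseaseY", "ex:is_a", "ex:Disease",
         "curator:alice", date(2009, 5, 1), 0.95),
]

# Everything anyone has said about GeneX, then only the confident claims.
everything = match(facts, subject="ex:GeneX")
confident = match(facts, subject="ex:GeneX", min_confidence=0.5)

for fact in confident:
    print(fact.obj, "according to", fact.source, "on", fact.asserted_on)
```

Note that nothing is thrown away at publishing time: the low-confidence claim stays in the collection, and it is up to whoever queries it to decide how much noise they want to see.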

While some of you who like to work with reasoners and OWL, or who think that it is better to talk about ‘universals’ and ‘particulars’ than it is to talk about ‘concepts’, may find this lack of formality a little disappointing, I am growing more and more enthusiastic about it because it fits the publish-then-filter nature of the Web perfectly.  We see again and again that once information is out there on the Web, its value increases tremendously.  (In fact, many very smart people seem to think that once there is enough text and other unstructured data online, that is all we will really need to solve most of our information problems.)  By providing a very low barrier for entry and then focusing computer science efforts on handling the noise, complexity, and the conflicts that will inevitably arise (the filter part), I think this triple-publishing approach has great potential to push research in a productive direction.  In particular, I think it will push people to spend more time working on other, more flexible modes of inference that don’t just die when a logical conflict is detected - all that squishy probability stuff that the semantic Web has managed to ignore for so long and that happens to be the stuff that makes almost all interesting AI-like technology work now.  Furthermore, the triple-focus absolutely does not stop groups that participate in the CWA from making use of approaches grounded in formal logics in their own development.

While there may or may not be good reasons to take one philosophical stance over another when creating knowledge bases, the fact will always remain that there will be conflicts of opinion about this.  When dealing with a small group, it may be possible to convince or force acceptance of a particular world view, but it is not IMHO going to be possible to enforce something as arguable as the philosophy of the representation of the nature of being on the scale of the Web.  By focusing on the smallest possible units, the triples, and leaving the more precise formalizations and the philosophy out of the vision as much as possible, the CWA might make it possible for a diverse, interoperable ecology of knowledge bases to emerge and co-exist.  Ideally, those who wish to make use of, for example, description logic reasoning should be able to benefit from the pool of URIs in the Concept Web, if for nothing else than the many multi-lingual labels and textual definitions that will be associated with each of them.

It is still very early days for the CWA, probably far too early to speculate much about the consequences of its basic technological approach, as even this approach is still very much up for debate.  Still...  I’m not sure exactly how to express this, but the meeting smelled good.  There were enough capable, powerful, enthusiastic people together in that room, with enough of a shared vision, that I think something of that vision is very likely to come to life.

So, was it worth it?  I think that it was in the end.  I got an early look at something that might provide solutions to many of the problems that I’ve spent the past several years of my life thinking about (the social construction of a biosemantic web).  I got to reconnect with old friends and make some new ones.  I had a chance to see New York for the first time (the scale of which blew my mind).  And, last but not least, it just might be the last such academic event I get to take part in.  Depending on the choices I make and the dictates of the wheels of fate, I may be in the process of losing the privilege of working in the ivory tower.  If it was indeed my goodbye to the community of scholars, it was a good one.

So yes, it was a worthwhile trip and it remains an exciting time to be thinking about the Web - concept or otherwise.  (And the YMCA wasn’t really so bad in the end ;)

You can follow - and perhaps influence - the evolution of the Concept Web on their blog.
