Tuesday, April 6, 2010

Jerzy Lewak at SDSW Meetup


I just got back from another interesting San Diego Semantic Web Meetup; here are my notes before I forget.

The presenter this evening was Jerzy Lewak, a former theoretical physicist, a professor emeritus at UCSD, and cofounder of (at least) Nisus Software and SpeedTrack Inc. (Once again, I continue to be impressed at the level of the speakers that are recruited for these events!) Jerzy presented the history and current applications of a novel human interface for databases that he calls GIA (Guided Information Access), which is made possible by the underlying TIE (Technology for Information Engineering) framework.

Jerzy began his talk as a true computer scientist by defining 'The Problem' that originally inspired this work (back in 1991). At that time email was just starting to pick up steam, and already the task of finding old emails was becoming unmanageable. So, he set out to find a better way. (Always nice to be working on solving a problem for yourself.) His search for a solution took a decidedly familiar path:
  1. Hmm, why don't I set up some nice hierarchies of concepts to place my emails into so it will be easier to find them later?
  2. Darn... that really isn't working out very well. The more data I get, the harder it is to organize, and I keep running into the problem that almost every single item in my collection might be placed under more than one category. Perhaps a physical filing cabinet is actually a terrible thing to base a completely virtual information storage and retrieval system on... (Though he didn't bring it up, he was describing exactly the same thing that Clay Shirky got so excited about in the 'Ontology is Overrated' essay in 2005 that - in turn - got me all excited in 2006, except of course Jerzy was thinking about it in 1991. And, as pointed out to me by my LIS friend Joe, this basic problem and the following conclusion were pretty well fleshed out by Ranganathan in the 1930s...)
  3. After experimenting with plain content search (a la Google) he arrived at faceted classification as the most powerful and flexible way to describe and access data.
So far so good, I am paying attention.

Now the problem arises that many combinations of facets actually produce zero results. As he noted, just 200 facets (he uses the word 'selectors') are enough to uniquely describe every particle in the universe. This became the real problem. The breakthrough that got him going, and that has led to SpeedTrack and everything else he presented, was the idea of dynamically limiting the potential facets based on those that are already selected. In the interfaces he demonstrated, he would:
  1. Choose some database field like 'last name' and type in a name like 'Smith'
  2. Show two things - one, the number of results went down and two, the number of possible values for the other fields (e.g. 'first name', 'height', 'date', etc.) would immediately be constrained to only show values for objects linked to Smiths in the database.
His interfaces might be described as an advanced multi-parameter type-ahead. They work by guiding the user (GIA) in the creation of potentially very complex queries that are guaranteed to return results. This is achieved by dynamically exposing the underlying indexes that drive boolean queries.
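
To make that concrete for myself, here is a minimal Python sketch of the general idea: pick a value for one field, then offer only those values of the other fields that still occur in the matching records. This is my own illustration, not SpeedTrack's or TIE's actual implementation; the sample records, field names, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data, invented purely for illustration.
RECORDS = [
    {"last name": "Smith", "first name": "Anna", "city": "San Diego"},
    {"last name": "Smith", "first name": "Bob", "city": "Boston"},
    {"last name": "Jones", "first name": "Anna", "city": "San Diego"},
    {"last name": "Lee", "first name": "Carol", "city": "Austin"},
]


def build_index(records):
    """Build an inverted index: (field, value) -> set of record ids."""
    index = defaultdict(set)
    for rec_id, rec in enumerate(records):
        for field, value in rec.items():
            index[(field, value)].add(rec_id)
    return index


def matching_ids(index, n_records, selections):
    """Boolean AND of the id sets for every selected (field, value) pair."""
    ids = set(range(n_records))
    for field, value in selections.items():
        ids &= index.get((field, value), set())
    return ids


def available_values(records, ids, selections):
    """For each field not yet selected, list the values still present
    among the matching records. Picking any of them cannot yield zero
    results, because each value comes from at least one matching record."""
    remaining = defaultdict(set)
    for rec_id in ids:
        for field, value in records[rec_id].items():
            if field not in selections:
                remaining[field].add(value)
    return {field: sorted(values) for field, values in remaining.items()}


index = build_index(RECORDS)
selections = {"last name": "Smith"}
ids = matching_ids(index, len(RECORDS), selections)
options = available_values(RECORDS, ids, selections)
print(len(ids), "matches; remaining choices:", options)
```

After selecting 'Smith', 'Carol' and 'Austin' are no longer offered, so every further choice the user can make still leads to at least one record. A real system would presumably compute the remaining values from its indexes rather than by rescanning the matching records as this sketch does.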

The system works on both structured and unstructured data (he showed a quick example of newspaper articles) but requires fairly heavy manual labor to get it started. Overall it looked like it would be useful and fun to use and I can imagine many potential directions they could take it.

My only complaint about the talk was that there was absolutely zero Web in it - no mention of any native ability to consume or produce RDF, no mention of OWL, no discussion of scaling possibilities, and the proverbial elephant in the room of large-scale data access was left more or less untouched. I guess that might be one sign of a good talk - I got interested and it left me thirsting for more.


Mark Wilkinson said...

This is very similar to the idea we're chasing now v.v. using OntoLoki to guide ontology development... in fact... almost identical :-)

we must be on the right track!! LOL!