Monday, July 11, 2011

Gene Wiki Rainbow

Last summer I posted an image of the Gene Wiki hyperlink network, aptly titled "the gene wiki hairball". The image was picked up by noted artist/scientist Martin Krzywinski and used as an example of why hairballs are a terrible visualization. Perhaps out of guilt for making an example of us, and perhaps out of interest, he has since helped us substantially improve our thinking about how to visualize networks. Here is a Circos view of the top 100 genes in the Gene Wiki, the editors that created the articles, and the diseases and compounds that the genes are linked to. It will be presented as a guerrilla poster[1] at ISMB this year, so please stop by and have a closer look!

Aside from bringing an artistic aesthetic to scientific illustrations, one of Martin's main contributions (IHOP) is that he understands and uses space to convey meaning effectively. In a hairball, the only consideration of space in the layout algorithm is to reveal as many nodes as possible. In a Circos diagram, position itself can carry meaning. This fundamental idea is used to an even greater extent in Martin's latest layout invention, Hive Plots. Watch out for the Gene Wiki Hive Mind visualization - or better yet, write to us and help us build it!

[1] guer·ril·la post·er /gəˈrilə/ /ˈpōstər/

Noun: An uninvited poster displayed at a scientific conference.


Boghog said...

Fascinating representation of the Gene Wiki! It is striking that some editors are so focused while others are more eclectic.

I am very interested in target/drug/disease connections. Would it be possible to create another representation where the editors are removed and the drugs are segregated from the diseases so that connections are drawn between the two? Also, it appears that targets are listed in alphabetical order. Would it be possible to arrange these targets into families based on sequence similarity?

Benjamin Good said...

Thanks Boghog, glad you like it! It would certainly be possible to make the illustrations you describe - it would just take some work, and it may be a while before we get around to it. If you are keen to see this soon, I would be happy to provide you with the data needed. Circos is open source and, in principle, you could get it running and experiment with different views as you describe. I have to warn you, though, that it is a challenging installation and has a pretty steep learning curve. For this one, Martin made it for us himself...

Anonymous said...

Hi, How are these genes 'the top 100' ones?

Benjamin Good said...

Like any 'top 100' list, it's pretty arbitrary! These are simply the genes with the most outgoing wikilinks to other pages on Wikipedia. Generally speaking, this indicates that they are well-developed articles and, generally speaking, that indicates that they are well-studied genes.
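
For anyone who wants to try a ranking like this on their own link data, here is a minimal sketch of the idea in Python. It is not the script used for the figure: the function name, the (source, target) pair format, and the gene and page names in the sample data are all assumptions made up for illustration, and repeated links to the same page are counted once here, which may differ from how the original list was built.

```python
from collections import Counter


def top_genes_by_outlinks(links, n=100):
    """Rank gene articles by their number of distinct outgoing wikilinks.

    `links` is an iterable of (source_article, target_page) pairs.
    This input format is an assumption made for this sketch.
    """
    counts = Counter()
    seen = set()
    for source, target in links:
        # Count each distinct source -> target link only once.
        if (source, target) not in seen:
            seen.add((source, target))
            counts[source] += 1
    # Sort by descending link count, breaking ties alphabetically
    # so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:n]]


# Made-up sample data, for illustration only.
sample_links = [
    ("TP53", "Cancer"),
    ("TP53", "Apoptosis"),
    ("TP53", "Cancer"),  # duplicate link, counted once
    ("BRCA1", "Cancer"),
    ("INS", "Diabetes"),
    ("INS", "Insulin receptor"),
    ("INS", "Glucose"),
]

top_two = top_genes_by_outlinks(sample_links, n=2)
```

Swapping the sort key for a different measure (incoming links, article length, edit count) would give a different but equally arbitrary 'top 100'.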