Wednesday, December 5, 2012

GSoC recap for Crowdsourcing Biology team at TSRI

(As presented on the Google Open Source Blog)
The Crowdsourcing Biology team at the Scripps Research Institute participated in the Google Summer of Code for the first time this year.  Five students contributed to efforts to harness the power of community intelligence to advance biomedical science.

Maximilian Ludvigsson took the first steps in the creation of Semantic BioGPS.  BioGPS is a user-extensible Web portal that provides easy access to information about genes from hundreds of different websites.  Maximilian produced a tool that allows BioGPS users to annotate regions of gene-centric Web pages to state, in machine-readable form, what different areas of the page ‘mean’.  These semantic annotations enable scripts to extract structured content about genes from these Web pages, paving the way for a new version of BioGPS that provides integrated views across multiple data sources.

Karthik G developed an interactive network visualization for the data linking genes to diseases in the GeneWiki+.  The GeneWiki+ is a Semantic MediaWiki (SMW) installation that dynamically integrates data about human genes from Wikipedia and from SNPedia.  While SMW queries provide a great way for programmers and advanced wiki users to interact with data, the graphical network that Karthik created gives ordinary biologists a new, intuitive, and sometimes beautiful way to explore connections between genes and disease.

Clarence Leung began the development of a new version of the crowdsourcing game Dizeez.  In this new two-player game, players are challenged to get their partner to guess a particular disease by prompting them with related genes.  Following in the tradition of ‘games with a purpose’ such as Foldit and the ESP game, it produces novel, validated gene-disease associations as a result of game play.

Shivansh Srivastava worked on migrating BioGPS’s gene report layout windowing system from ExtJS to both a jQuery windowing environment and an approach based on the Yahoo! User Interface (YUI) library.  This view in BioGPS provides biologists with a customizable environment for accessing gene-centric data from a diverse collection of sources.  Shivansh’s efforts gave BioGPS developers insight into the technical limitations of each solution, as compared to the current BioGPS ExtJS codebase.

Kevin Wu developed a scalable and efficient system for storing and analyzing biologically meaningful sets of genes.  Accessible via a RESTful HTTP interface, the system uses MongoDB for storage and custom code for distributed computing that executes statistical comparisons across thousands of gene sets in parallel.  For any particular gene set, Kevin’s code makes it possible to rapidly identify similar gene sets and to calculate the ‘enrichment’ (a statistical measure of overlap) of that gene set with respect to any other.  This work will soon be integrated into BioGPS to allow users to save their own gene sets and to query for similar gene sets from others.
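The post does not say which statistics Kevin's system uses. As an illustration only, the sketch below assumes two common choices: the Jaccard index for finding similar gene sets, and a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the overlap) for ‘enrichment’. The function names, the example gene sets, and the 20,000-gene universe are hypothetical, and the sketch runs in a single process rather than distributing comparisons across machines or storing sets in MongoDB.

```python
from math import comb

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def enrichment_pvalue(query: set, target: set, universe_size: int) -> float:
    """One-sided hypergeometric p-value: the probability of an overlap at
    least as large as the observed one if `query` were drawn at random from
    a universe of `universe_size` genes that contains all of `target`."""
    k = len(query & target)   # observed overlap
    n = len(query)            # genes drawn (size of the query set)
    K = len(target)           # genes in the universe that belong to `target`
    N = universe_size
    if n > N or K > N:
        raise ValueError("gene sets cannot be larger than the universe")
    # Sum the hypergeometric probabilities for overlaps k, k+1, ..., min(n, K).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)

def most_similar(query: set, collection: dict, top: int = 3) -> list:
    """Rank stored gene sets by Jaccard similarity to `query` (hypothetical helper)."""
    scores = [(name, jaccard(query, genes)) for name, genes in collection.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top]

# Hypothetical example data: a user's gene set and a few stored sets.
query = {"TP53", "BRCA1", "BRCA2", "ATM"}
collection = {
    "dna_repair": {"BRCA1", "BRCA2", "ATM", "RAD51"},
    "apoptosis": {"TP53", "BAX", "CASP3"},
    "metabolism": {"INS", "GCK"},
}

for name, score in most_similar(query, collection):
    p = enrichment_pvalue(query, collection[name], universe_size=20000)
    print(f"{name}: jaccard={score:.3f}, p={p:.3g}")
```

Each comparison is independent of the others, which is what makes scoring one query against thousands of stored sets straightforward to split across workers in a system like the one described above.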

Thanks to all of our excellent students for their great contributions and to Google for sponsoring this unique program.  We are looking forward to participating in the GSoC for many years to come!