Monday, April 23, 2012

Gene Wiki SPARQL endpoint

Thanks to Leyla and Alex Garcia-Castro, from UniProt and Florida State University respectively, we now have a SPARQL endpoint for the data in the Gene Wiki. Access it live here:
http://virtuoso.idiginfo.org/sparql
(Update on 4-28-12: that endpoint is down; a live one is currently available at
http://199.102.237.69:8890/sparql
)

Here is one query you might like to try. It finds gene-disease links that we have mined from the text:

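# rdf: and rdfs: are declared explicitly so the query also runs on endpoints that do not predefine them.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wiki: <http://genewikiplus.org/wiki/Special:URIResolver/>
PREFIX property: <http://genewikiplus.org/wiki/Special:URIResolver/Property-3A>
SELECT ?gene ?disease ?gene_name ?disease_name ?doid
WHERE {
 # The gene is linked to the disease directly and also through one of its SNPs.
 ?gene property:Is_associated_with_disease ?disease .
 ?gene property:HasSNP ?snp .
 ?snp property:Is_associated_with_disease ?disease .
 # Human-readable labels for both ends of the link.
 ?gene rdfs:label ?gene_name .
 ?disease rdfs:label ?disease_name .
 # The disease's category carries the DOID identifier.
 ?disease rdf:type ?disease_cat .
 ?disease_cat property:HasDOID ?doid .
 # Restrict the results to human proteins.
 ?gene rdf:type wiki:Category-3AHuman_proteins .
}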
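If you would rather run the query from a script than from the web form, here is a minimal Python sketch. It assumes the endpoint accepts standard SPARQL Protocol GET requests (a `query` parameter) and can return results in the SPARQL JSON results format. The helper names (`build_request`, `rows_from_json`, `run_query`) are made up for this example, and the endpoint URL is the one given above, which may no longer be reachable.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL copied from the post; assumption: it still answers SPARQL Protocol requests.
ENDPOINT = "http://199.102.237.69:8890/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build (but do not send) a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_json(text):
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts."""
    doc = json.loads(text)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query over the network and return the flattened result rows."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return rows_from_json(response.read().decode("utf-8"))
```

Paste the full query above into a string and pass it to `run_query`. Each returned row is then a plain dict keyed by the SELECT variables (`gene_name`, `disease_name`, `doid`, and so on).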

How it works in brief

  1. Articles from the Gene Wiki and from SNPedia are transferred to genewikiplus.org
  2. As they go in, they are converted into a semi-structured form that enables queries in Semantic MediaWiki.
  3. We dump the entire thing out as one giant RDF file.
  4. Leyla loads the RDF into their Virtuoso server (and performs some enhancements such as linking directly to UniProt RDF).
  5. And voilà!
(More details about the generation of the genewiki+ are available in this soon-to-be-published paper about the SNPedia mashup and this paper about Semantic Wiki Links in Wikipedia.)

Cool next steps

The RDF has owl:sameAs links between all the Gene Wiki entries and their RDF equivalents in DBpedia and in UniProt's RDF representation. It should be possible to explore connections that span these three (four including SNPedia) resources using Linked Data technologies like Virtuoso's Sponger.
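As a starting point for that kind of exploration, here is a small Python sketch: a query that lists owl:sameAs links, plus a helper that tallies the link targets by host so you can see which external resources they point to. The query shape, the `LIMIT`, and the helper name `count_targets_by_host` are assumptions made for illustration, and the sample rows are invented placeholders rather than real Gene Wiki data.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed query shape: list entries together with their owl:sameAs targets.
SAME_AS_QUERY = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?entry ?same
WHERE { ?entry owl:sameAs ?same . }
LIMIT 100
"""


def count_targets_by_host(rows):
    """Count owl:sameAs targets per host name (e.g. DBpedia vs. UniProt)."""
    return Counter(urlparse(row["same"]).netloc for row in rows)


# Invented placeholder rows, shaped like flattened SPARQL results.
example_rows = [
    {"entry": "http://example.org/entry/1", "same": "http://dbpedia.org/resource/Example"},
    {"entry": "http://example.org/entry/1", "same": "http://purl.uniprot.org/uniprot/P00000"},
    {"entry": "http://example.org/entry/2", "same": "http://dbpedia.org/resource/Other"},
]

if __name__ == "__main__":
    for host, count in count_targets_by_host(example_rows).most_common():
        print(host, count)
```

Run `SAME_AS_QUERY` against the endpoint with whatever SPARQL client you prefer, and feed the resulting rows to the helper in place of the placeholder data.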

Go forth! Play with our data!




3 comments:

Rob said...
This comment has been removed by the author.
Rob said...

Blogger strips the PREFIX tags. I know that you included them when you wrote the post, but for anybody reading this post and wondering why the example query doesn't seem to work, try the query at https://gist.github.com/2474895

Benjamin Good said...

Thanks for the catch Rob! I've made the prefixes visible in the post.