Saturday, April 27, 2013

Molecular Predictor Repository? (not gene set repository)

I have a simple question. Say that I have the results from a gene expression analysis done in my laboratory or pulled from a public repository. Say the sample has something to do with cancer (or I think that it might). Say I have read about so-called 'signatures' that have been found to be associated with key phenotypes related to cancer. (Here is a list of 13 signatures like this.)

How do I now test to see which, if any, of these signatures are showing up in my sample?

I have my input (e.g. the Affy CEL file from my experiment). How do I get output that indicates that my sample shows an active wound response, suggests poor outcomes in breast cancer patients, looks like lung-specific metastasis, and so on?

This should be relatively easy, no?  I've got data about human gene expression, these people have made useful predictive models that take human gene expression as input.  Where is the website?

Some people have directed me to useful resources like GeneSigDB that provide curated repositories of "gene signatures". However, these "signatures" are just sets of genes; they are not predictive models. If all that we needed were gene sets, no one would ever need to train a random forest classifier or a support vector machine on the data associated with those gene sets. Sets of phenotypically related genes are great, but I need the full predictive model.
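To make the distinction concrete, here is a minimal Python sketch. Everything in it is hypothetical: the gene names, weights, intercept, and threshold are invented for illustration and do not come from any published signature, and a simple linear score stands in for whatever model (random forest, SVM, etc.) a real signature's authors might have trained. The point is only that a gene set is a list of names, while a predictive model also carries learned parameters and a rule for turning expression values into a call.

```python
# Hypothetical illustration only: gene names, weights, intercept, and
# threshold are made up and are not taken from any published signature.

# What a gene set repository gives you: a list of names, nothing more.
GENE_SET = ["GENE_A", "GENE_B", "GENE_C"]

# What a predictive model adds: parameters learned from training data,
# plus a decision rule. A linear model is used here as a stand-in.
MODEL = {
    "weights": {"GENE_A": 0.8, "GENE_B": -1.2, "GENE_C": 0.5},
    "intercept": -0.3,
    "threshold": 0.0,
}


def score_sample(expression, model):
    """Return intercept + sum(weight * expression) over the model's genes."""
    missing = [g for g in model["weights"] if g not in expression]
    if missing:
        raise KeyError(f"sample lacks expression values for: {missing}")
    return model["intercept"] + sum(
        w * expression[g] for g, w in model["weights"].items()
    )


def predict(expression, model):
    """Call the signature 'active' when the score exceeds the threshold."""
    return score_sample(expression, model) > model["threshold"]


# A made-up, already-normalized expression profile for one sample.
sample = {"GENE_A": 2.0, "GENE_B": 0.5, "GENE_C": 1.0, "GENE_D": 3.0}

if __name__ == "__main__":
    print("score:", score_sample(sample, MODEL))
    print("signature active:", predict(sample, MODEL))
```

With only GENE_SET in hand, there is no way to produce the second line of output; the weights and threshold are exactly the part that a gene set repository leaves out.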

The only system that I know of that seems to have the capacity to answer my question (had the model builders used it) is the Synapse platform.  For example, if you are good at R, you should be able to use Synapse to execute any of the models submitted to the recent breast cancer prognosis challenge.  This is a great step forward for the community (though it recapitulates pretty much everything from the more generic world of scientific workflow systems like Taverna).

But still: (a) comparatively few published predictive models are in Synapse, and (b) should I really have to know R to answer that question?



Vladimir Chupakhin said...

PMML for bioinformatics :)

Benjamin Good said...

Sounds awesome. Never heard of it before. Have you seen anyone using it for bioinformatics applications? (Reading about it now)

Vladimir Chupakhin said...

I come from more of a QSAR background, but I don't really see the difference between chemoinformatics and bioinformatics. In the end, it's still data, its processing, and its modeling.
Take a look over there also