Apparently the FDA wants 23andme to stop selling genetic testing kits. I think this is a really bad thing.
It seems that if they could, the FDA would block access to mirrors because of the detrimental effects they might have on segments of the population that might take the data provided to them there, become unhappy, eat more cheetos and then die at a faster rate than the mirrorless...
There is no doubt that some people might look at the data provided by 23andme and related services and make poor decisions about their health, and it's a huge challenge to translate this kind of data into clear medical advice given what is known now. But (a) it's my f'ing genome and I should be able to look at it if I want to, and (b) they are an information service, not a healthcare service, and they make that distinction very carefully - they don't say "you should take this drug.." they say "you are at greater risk for ..., so you should go talk to your doctor...". Sorry, but there needs to be some accountability on the part of the consumer.
Trying to stop personal genomics companies like this from operating until every bit of information they show has run through the FDA will only improve one thing - the economies of other countries without these kinds of problems. Not to mention the fact that without data collection strategies like this, we will likely never be able to generate the data that would allow these services to get to the point of making a major positive impact on healthcare. For example, here is a proof-of-concept paper from 23andme that has been followed up by many new discoveries made possible by their service: http://www.ncbi.nlm.nih.gov/pubmed/21858135
Tuesday, November 26, 2013
Saturday, April 27, 2013
I have a simple question. Say that I have the results from a gene expression analysis done in my laboratory or pulled from a public repository. Say the sample has something to do with cancer (or I think that it might). Say I read about so-called 'signatures' that have been found to be associated with key phenotypes related to cancer. (Here is a list of 13 signatures like this).
How do I now test to see which, if any, of these signatures are showing up in my sample?
I have my input (e.g. the Affy CEL file from my experiment); how do I get the output that indicates that my sample shows an active wound response, suggests poor outcomes in breast cancer patients, looks like lung-specific metastasis, etc.?
This should be relatively easy, no? I've got data about human gene expression, these people have made useful predictive models that take human gene expression as input. Where is the website?
Some people have directed me to useful resources like GeneSigDB that provide curated repositories of "gene signatures". However, these "signatures" are just sets of genes, they are not predictive models. If all that we needed were gene sets, no one would ever need to train a random forest classifier or a support vector machine on the data associated with those gene sets. Sets of phenotypically related genes are great, but I need the full predictive model.
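To make the distinction concrete, here is a minimal sketch of the difference between a gene set and a predictive model built on that set. Everything in it is made up for illustration: the gene names, the weights, the threshold, and the simple linear form of the model are all assumptions, not taken from any published signature.

```python
# A "signature" as a gene-set repository stores it: membership only.
signature = {"GENE_A", "GENE_B", "GENE_C"}  # hypothetical gene names

# A predictive model over the same genes also needs parameters learned
# from training data. These numbers are invented for the example.
weights = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 0.3}
threshold = 1.0


def predict(expression):
    """Classify one sample from a {gene: expression value} mapping."""
    score = sum(weights[gene] * expression[gene] for gene in signature)
    return "high risk" if score > threshold else "low risk"


sample = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 1.0}
print(predict(sample))
```

The set alone tells me which genes to look at; only the weights and threshold turn a new sample into a prediction, and those are exactly the parts that are missing from a gene-set repository.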
The only system that I know of that seems to have the capacity to answer my question (had the model builders used it) is the Synapse platform. For example, if you are good at R, you should be able to use Synapse to execute any of the models submitted to the recent breast cancer prognosis challenge. This is a great step forward for the community (though it recapitulates pretty much everything from the more generic world of scientific workflow systems like Taverna).
But still.. a) comparatively very few published predictive models are in Synapse and b) should I really have to know R to answer that question?
Wednesday, December 5, 2012
Maximilian Ludvigsson took the first steps in the creation of Semantic BioGPS. BioGPS is a user-extensible Web portal that provides easy access to information about genes from hundreds of different websites. Maximilian produced a tool that allows BioGPS users to annotate regions of gene-centric Web pages to state, computationally, what different areas of the page ‘mean’. These semantic annotations enable scripts to extract structured content about genes from these Web pages, paving the way for a new version of BioGPS that provides integrated views across multiple data sources.
Karthik G developed an interactive network visualization for the data linking genes to diseases in the GeneWiki+. The GeneWiki+ is a Semantic Media Wiki (SMW) installation that dynamically integrates data about human genes from Wikipedia and from SNPedia. While SMW queries provide a great way for programmers and advanced wiki users to interact with data, the graphical network that Karthik created gives ordinary biologists a new, intuitive, and sometimes beautiful way to explore connections between genes and disease.
Clarence Leung began the development of a new version of the crowdsourcing game Dizeez. In this new two-player game, players are challenged to get their partner to guess a particular disease by prompting them with related genes. This game follows in the tradition of ‘games with a purpose’ such as Foldit and the ESP game by producing novel, validated gene-disease associations as a result of game play.
Shivansh Srivastava worked on migrating BioGPS’s gene report layout windowing system from ExtJS to both a jQuery windowing environment and a Yahoo User Interface-based approach. This view in BioGPS provides biologists with a customizable environment for accessing gene-centric data from a diverse collection of sources. Shivansh’s efforts provided BioGPS developers with insight into the technical limitations of each solution, as compared to the current BioGPS ExtJS codebase.
Kevin Wu developed a scalable and efficient system for storing and analyzing biologically meaningful sets of genes. Accessible via a RESTful HTTP interface, the system uses MongoDB for storage and custom code for distributed computing that executes statistical comparisons across thousands of gene sets in parallel. For any particular gene set, Kevin’s code makes it possible to rapidly identify similar gene sets and to calculate the ‘enrichment’ (a statistical measure of overlap) of that gene set with respect to any other. This work will soon be integrated into BioGPS to allow users to save their own gene sets and to query for similar gene sets from others.
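For readers unfamiliar with 'enrichment', here is a minimal sketch of one common way to score the overlap between two gene sets, the hypergeometric test. This is not Kevin's code and his system may use a different statistic; the function name, the toy gene names, and the background size are all assumptions made for the example.

```python
from math import comb


def overlap_enrichment(set_a, set_b, background_size):
    """Return (overlap size, hypergeometric p-value) for two gene sets.

    The p-value is the chance of an overlap at least this large if set_b
    were drawn at random from `background_size` genes (an assumed universe).
    """
    a, b = set(set_a), set(set_b)
    if max(len(a), len(b)) > background_size:
        raise ValueError("background must be at least as large as each set")
    overlap = len(a & b)
    total_draws = comb(background_size, len(b))
    # Sum the probability of every overlap from the observed one upwards.
    tail = sum(
        comb(len(a), i) * comb(background_size - len(a), len(b) - i)
        for i in range(overlap, min(len(a), len(b)) + 1)
    )
    return overlap, tail / total_draws


# Toy example with made-up gene names and a 20-gene background.
query = {"A", "B", "C", "D", "E"}
other = {"C", "D", "E", "F"}
overlap, p_value = overlap_enrichment(query, other, background_size=20)
print(overlap, p_value)
```

Finding "similar gene sets" then amounts to running a comparison like this against every stored set and sorting by p-value, which is the part that benefits from running in parallel.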
Thanks to all of our excellent students for their great contributions and to Google for sponsoring this unique program. We are looking forward to participating in the GSoC for many years to come!
Tuesday, November 6, 2012
I'm sitting in the main hall at the enormous Moscone conference center in San Francisco awaiting the first plenary at ASHG2012 and remembering the last time I was here. Back in spring of 2007, I saw Jeff Bezos and several other Web luminaries speak here about the Web2.0 phenomenon - what it was and how they were planning to make money on it. The buzz throughout that conference was Twitter, though I admit I hadn't really noticed it before then, did not understand it, and was very skeptical that it would amount to anything. That was the meeting that really inspired me to start writing here. Five and a half years and 214 posts later, it's clear that, because of that, it was probably one of the most significant meetings in my professional career. Who knows? Perhaps this genetics business will prove even more inspirational.
Friday, November 2, 2012
Over the summer, an enterprising high school student named Nishant Mandapaty approached our research group about doing a project with us. He found us through the "Crowdsourcing Biology" group that we created for the Google Summer of Code program. To make a long story short, he has been doing good work for us ever since, quickly learning what he needs to on his own.
His primary contribution so far is a nascent game for collecting gene-disease connections called Mobianga!. He is planning to submit the results of an experiment using this game to the Intel Science Talent Search, a prestigious national science fair. But he needs help if he is going to succeed! If you know anything about genes and their relationship to disease, or are capable of using resources like OMIM, PubMed, and Google to find such information, he needs you to play a few games! Even better, invite several of your friends to play a few games.
Help a 17 year old computational biologist reach his dreams, play Mobianga! today!
Mobianga contest at the American Society for Human Genetics annual Meeting
Technical details
Mobianga! makes use of the human disease ontology to provide the opportunity to easily annotate genes at varying levels of granularity. For each gene challenge, you start at the top of the hierarchy (e.g. choose between 'disease of cellular proliferation' and 'disease of mental health') and you work your way down to specific diseases. At each step you earn points based on an algorithm that assesses the precision of the annotation and degree of consensus among prior players in a manner similar to the Herdit game for music tagging recently published in PNAS.
The game is implemented as a Python-powered Web Application that runs in the Google App Engine. The code is open source and he would welcome collaborators. The game is intended to eventually run smoothly on phone-sized browsers (the name Mobianga came from 'the mobile annotation game'), but this optimization has not yet been achieved. Anyone that wants to help, please get in touch.
Monday, October 29, 2012
|Building intelligent systems for biology|
As one step in testing this general hypothesis, on Sept. 7, 2012, we released a game called ‘The Cure’. The objective of this game is to build a better (more intelligent) predictor of breast cancer survival time based on gene expression and copy number variation information from tumor samples. We selected this particular objective to align with the SAGE Breast Cancer Prognosis challenge.
In this game, available at http://genegames.org/cure/, the player competes with a computer opponent to select the highest scoring set of five genes from a board containing 25 different genes. The boards are assembled in advance to include genes judged statistically ‘interesting’ using the METABRIC dataset provided for the SAGE Challenge.
Below is a game in progress. I’m on the bottom and my opponent, Barney, is on the top. We alternate turns selecting a card (a gene) from the board and adding it to our hand. When we each complete a 5 card hand, the round finishes and whoever has the most points wins. Scores are determined by using training data to automatically infer and test decision tree classifiers that predict survival time. The trees can use both RNA expression and CNV data for the selected genes to infer predictive rules. The better the gene set performs in generating predictive decision trees, the higher the score. When the player defeats their opponent, they move on to play another board. (Multiple players play each board.)
|A game of The Cure. Barney (the bad guy) is winning, I am looking at the CPB1 gene and, using the search feature, I have highlighted in pink all genes that have the word cancer in any of their metadata.|
Promotion, players and play
|Games played at The Cure since launch|
Predicting breast cancer prognosis
- Filter out games from players that indicated no knowledge of cancer biology.
- Rank each gene according to the ratio of the number of times that it was selected by different players to the number of times that it appeared in any played game.
- Select the top 20 genes according to this ranking.
- Insert this 20 gene ‘signature’ into the ‘Attractor Metagene’ algorithm that has dominated the SAGE challenge. To do this, we kept all of the code related to the use of clinical variables unchanged, but replaced the genes selected by the Attractor team with the genes selected by our game players.
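The filtering and ranking steps above can be sketched in a few lines. This is a reconstruction for illustration, not the code we ran: the record layout (`player`, `knows_cancer`, `board`, `picked`) is hypothetical, and I am reading the ratio as "distinct players who selected the gene" over "played games in which the gene appeared".

```python
from collections import Counter


def rank_genes(games, top_n=20):
    """Rank genes by (distinct players who picked it) / (games it appeared in).

    `games` is an iterable of dicts with hypothetical keys: 'player',
    'knows_cancer' (bool), 'board' (genes shown), 'picked' (genes selected).
    """
    appearances = Counter()
    pickers = {}
    for game in games:
        # Step 1: drop games from players reporting no cancer biology knowledge.
        if not game["knows_cancer"]:
            continue
        appearances.update(set(game["board"]))
        for gene in game["picked"]:
            pickers.setdefault(gene, set()).add(game["player"])
    # Step 2: selection ratio per gene; ties are broken alphabetically.
    ratios = {g: len(pickers.get(g, ())) / n for g, n in appearances.items()}
    ranked = sorted(ratios.items(), key=lambda item: (-item[1], item[0]))
    # Step 3: keep the top of the ranking as the 'signature'.
    return ranked[:top_n]


# Toy data with made-up gene and player names.
games = [
    {"player": "p1", "knows_cancer": True, "board": ["A", "B", "C"], "picked": ["A", "B"]},
    {"player": "p2", "knows_cancer": True, "board": ["A", "B", "C"], "picked": ["A", "C"]},
    {"player": "p3", "knows_cancer": False, "board": ["A", "B", "C"], "picked": ["C"]},
    {"player": "p1", "knows_cancer": True, "board": ["A", "D"], "picked": ["D"]},
]
print(rank_genes(games, top_n=3))
```

The last step of the protocol, swapping the resulting gene list into the Attractor Metagene code, is specific to that team's implementation and is not sketched here.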
The predictor generated with this protocol scored 69% on the survival concordance index on the SAGE challenge test dataset, just 3% behind the best submitted predictor and significantly above the median of hundreds of submitted models. (You can see the ranked results on the challenge leaderboard - search for team HIVE - and, with a free registration, you can inspect the model directly within the Synapse system operated by SAGE.)
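For anyone who has not met the concordance index before, here is a minimal sketch of one common definition (Harrell's C). I am assuming, not asserting, that the challenge scoring follows this form; the function and the toy numbers below are illustrative only.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the predictor orders correctly.

    A pair is comparable when the patient with the shorter follow-up time had
    an observed event (events[i] is truthy). The pair is concordant when that
    patient was also given the higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort: survival times in years, 1 = death observed, 0 = censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.5, 0.1]
print(concordance_index(times, events, risks))
```

A score of 0.5 is what random guessing would give and 1.0 is a perfect ordering, which is the scale on which the leaderboard numbers should be read.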
In experiments conducted within the training dataset, we were able to consistently generate decision tree predictors of 10-year survival with an accuracy of 65% in 10-fold cross-validation using only genomic data (no clinical information). This was substantially better than classifiers produced using randomly selected genes (55%). Using an exhaustive search through the top 10 genes, we found 10 different unique gene combinations that, when aggregated, produced statistically significant (FDR < 0.05) indicators of survival within: (1) the training dataset used in the game, (2) a validation cohort from the same study, and (3) an independent validation set from a completely different study.
|Final Results from METABRIC round of BCC challenge|
!! Update, the model submitted using The Cure data (Team HIVE) scored 0.70 on the official test dataset for the METABRIC round of this competition, putting it at #43 of 171 submitted models !!
Conclusions
These early results from The Cure show clearly that biologists with knowledge that is relevant to cancer biology will play scientific games, and that combined with even basic analytical techniques, meaningful knowledge for inferring predictors of disease progression can be captured from their play. We suggest that this might open the door to a new form of ‘crowdsourcing’ that operates with much smaller, more specific crowds than are typically considered.
The data collected from the game so far is available as an SQL dump in our repository. This is the entire database used to drive and track the game with the exception of personal information such as email and IP addresses.
Thanks to Max Nanis, Salvatore Loguercio, Chunlei Wu, Ian Macleod and Andrew Su for all of your help making The Cure. Thanks in particular to Max who authored 99% of everything you see when you play the game.
The opponent in The Cure came from a Wikipedia Commons image from the game "You have to Burn the Rope". Thanks for sharing!
Wednesday, October 3, 2012
Come see a fresh new interface and a more challenging set of boards in round 2 of The Cure!
Will Barney defeat us? Only you can stop him!
The SAGE challenge finishes in two weeks and we need you to play to show what The Cure can do!
Friday, September 7, 2012
I'll be writing more about the game as results start to come in. For now, I just want to heartily thank the team members who helped get this together. Because of their help, this is definitely the best product of the genegames.org initiative so far.
- Max Nanis: Restyled the entire site - bringing it up from my 1999ish blind hacking to something that looks good in 2012...
- Sal Loguercio: Hacked the R code needed to get the DREAM7 data processed and into the game.
- Chunlei Wu: Helped Sal with R and with the initial gene filtering algorithms.
- Ian Macleod: Made Barney dance!
- Andrew Su: Helped with design, concept, and algorithms throughout (and provides us all with a home with big windows..)
Tuesday, July 31, 2012
I wrote this poster abstract up for an upcoming conference and thought it might be useful to share it here. If it gets accepted, you can come see me (and my iPad stand) in person at USCD in September. If not, well, you can read it here, play the games over there, and see me virtually anywhere.
genegames.org: High-throughput access to biological knowledge and reasoning through online games
Games are emerging as a powerful organizational and motivational tactic throughout many areas of society. Wherever people have a goal that they are having trouble reaching, be it getting their chores done, learning all the functions of Microsoft Visual Studio, or finishing a 10K, many are finding success by posing the required tasks as elements of games. Games can turn small units of work that alone might seem boring into fun steps taken towards a meaningful success. In doing so, they can sometimes dramatically increase individuals’ chances of reaching their objectives. The process of translating elements of non-game contexts (e.g. most traditional work, learning, exercise, etc.) into aspects of games is now known as ‘gamification’.
Gamification is now being used to meet a variety of scientific goals by serving as an effective way to organize and incentivise large-scale volunteer labor. The protein-folding game Foldit was the first of a growing wave of applications of games in the context of biological research. From this well-publicized beginning in protein structure, we now have a variety of biological games about, for example, RNA structure design, multiple sequence alignment, and neural connectivity mapping. In these games, players help advance scientific objectives by performing tasks that cannot be completed successfully by computers alone.
At genegames.org we are exploring the use of games to access the knowledge and reasoning abilities of biologist players. Through the gene annotation game ‘GenESP’, players can contribute their knowledge of gene function and disease relevance to a new kind of public gene annotation database. In the ‘COMBO’ game, players help to identify biomarker gene sets that can be used to improve predictions of various complex human phenotypes. The poster will provide details about the design of the games as well as preliminary results from ongoing experiments. In addition, the game prototypes will run live during the conference allowing attendees to play and provide the developers with important feedback.
Friday, July 20, 2012
This year's 20th annual conference on Intelligent Systems for Molecular Biology (ISMB) was a busy one for me and the rest of the Su Lab. As a group of five, we were responsible for four oral presentations, three posters, and the administration of one special session. Keeping that all together while catching up with old and new friends was a fun, though exhausting, experience. And never mind trying to follow the ISMB twitter stream!
Very briefly, we presented as follows:
- Chunlei Wu on BioGPS - slides, poster
- Erik Clarke on a Task-based evaluation of the Gene Ontology and the human annotations (which won best paper for the Bio-Ontologies session!)
- Salvatore Loguercio on Games for gene annotation - poster, presentation, games
- Myself on a new game for building better class predictors - poster, game
- Andrew Su on Crowdsourcing human gene annotation with the Gene Wiki.
Of all the very impressive and interesting presentations, Alex Pico's stood out for me. To keep this short, I'll leave the specific recap on the other projects to your Googling and finish this with a couple thoughts that percolated from Dr. Pico's perceptive presentation.
A Pico Lesson
WikiPathways is doing great right now. They have a very rapidly growing user base and are on their way to becoming the de facto standard resource for pathway information. Of particular interest is that more than 20% of its registered users have made edits to pathways. That is an astoundingly high rate of user-to-editor conversion. We claim great success with the Gene Wiki and I am fairly certain that our ratio is less than 1% (though it's difficult to tell exactly as the definition of 'user' is fuzzier).
So, how are they making it work? One of the things that Alex emphasized in his presentation was that WikiPathways was created to solve their own problems in collaboratively editing and sharing pathways (as part of the GenMapp project). It would have been useful and used (by them) even if no one outside of their research group ever got involved. The fact that it has been taken up by a broader community is a very valuable, but secondary effect. This basic idea was echoed in the twitter echoes of Carole Goble's talk (I missed hearing it directly) and resonates yet again with the inescapable Del.icio.us lesson. Personal value precedes network value.
As we forge ahead into the realm of Community Intelligence, we need to keep that lesson foremost in our minds. When we are thinking about games, that means that the game actually has to be fun, really fun! Time will tell if we can cross that threshold...