Systematic Analysis of Challenge-Driven Improvements in Molecular Prognostic Models for Breast Cancer

Science Translational Medicine  17 Apr 2013:
Vol. 5, Issue 181, pp. 181re1
DOI: 10.1126/scitranslmed.3006112


DREAMing of Biomedicine’s Future

Although they no longer live in the lab, scientific editors still enjoy doing experiments. The simultaneous publication of two unusual papers offered Science Translational Medicine’s editors the chance to conduct an investigation into peer-review processes for competition-based crowdsourcing studies designed to address problems in biomedicine. In a Report by Margolin et al. (which was peer-reviewed in the traditional way), organizers of the Sage Bionetworks/DREAM Breast Cancer Prognosis Challenge (BCC) describe the contest’s conception, execution, and insights derived from its outcome. In the companion Research Article, Cheng et al. outline the development of the prognostic computational model that won the Challenge. In this experiment in scientific publishing, the rigor of the Challenge design and scoring process formed the basis for a new style of publication peer review.

DREAM (Dialogue for Reverse Engineering Assessments and Methods) conducts a variety of computational Challenges with the goal of catalyzing the “interaction between theory and experiment, specifically in the area of cellular network inference and quantitative model building in systems biology.” Previous Challenges involved, for example, modeling of protein-protein interactions for binding domains and peptides and the specificity of transcription factor binding. In the BCC, which was a step in the translational direction, participants competed to create an algorithm that could predict, more accurately than current benchmarks, the prognosis of breast cancer patients from clinical information (age, tumor size, histological grade), genome-scale tumor mRNA expression data, and DNA copy number data. Participants were given Web access to such data for 1981 women diagnosed with breast cancer and used it to train computational models that were then submitted to a common, open-access computational platform as re-runnable source code. The predictive value of each model was assessed in real time by calculating a concordance index (CI) that compared predicted death risks with overall survival in a held-out data set, and CIs were posted on a public leaderboard.
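To make the scoring step concrete, the sketch below computes a concordance index in the commonly used Harrell form for right-censored survival data. This summary does not specify which CI variant or tie-handling rules the Challenge used, so the function name, argument names, and the treatment of ties are illustrative assumptions rather than the Challenge's actual scoring code.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risks: Sequence[float],
) -> float:
    """Harrell-style concordance index (illustrative sketch).

    times  -- follow-up time for each patient
    events -- True if death was observed, False if the patient was censored
    risks  -- predicted death risk (higher means worse predicted prognosis)

    Assumption: a pair is comparable only when the patient with the
    shorter follow-up time had an observed death. Pairs with tied
    follow-up times are skipped; tied risks count as half-concordant.
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("times, events and risks must have equal length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier death was ranked as higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable
```

Under this definition, a model that ranks every comparable pair correctly scores 1.0, one that ranks every pair incorrectly scores 0.0, and a model that assigns the same risk to everyone scores 0.5.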

The winner of the Challenge was ultimately determined when a select group of top models was validated in a new breast cancer data set. The winning model, described by Cheng et al., was based on sets of genes (signatures), called attractor metagenes, that the same research group had previously shown to be associated, in various ways, with multiple cancer types. Starting with these gene sets and some other clinical and molecular features, the team modeled various feature combinations, selecting ones that improved performance of their prognostic model until they ultimately fashioned the winning algorithm.

Before the BCC was initiated, Challenge organizers approached Science Translational Medicine about the possibility of publishing a Research Article that described the winning model. The Challenge prize would be a scholarly publication, a form of “academic currency.” The editors pondered whether winning the Challenge, with its built-in transparency and check on model reproducibility, would be sufficient evidence in support of the model’s validity to substitute for traditional peer review. Because the specific conditions of a Challenge are critical in determining the meaningfulness of the outcome, the editors felt it was not. Thus, they arranged for peer reviewers of their choosing to be embedded within the Challenge process as members of the organizing team, a so-called Challenge-assisted review. The editors also helped to develop criteria for determining the winning model; had the criteria not been met, there would have been no winner and no publication. Last, the manuscript was subjected to advisory peer review after it was submitted to the journal.

So what new knowledge was gained about reviewing an article in which the result is an active piece of software? Reviewing such a model required that referees have access to the data and platform used for the Challenge and have the ability to re-run each participant’s code; in the context of the BCC, this requirement was easily achievable, because Challenge partner Sage Bionetworks had created a platform (Synapse) with this goal in mind. In fact, both the training and validation data sets for the BCC are available to readers via links into Synapse (for a six-month period). In general, this requirement should not be an obstacle, as there are code-hosting sites such as GitHub that can accommodate data sharing. Mechanisms for confidentiality would need to be built into any computational platform to be used for peer review. Finally, because different conventions are used in divergent scientific fields, communicating the science to an interdisciplinary audience is not a trivial endeavor.

The architecture of the Challenge itself is critical in determining the real-world importance of the result. The question to be investigated must be framed so as to capture a significant outcome. In the BCC, participants’ models had to score better than a set of 60 different prognostic models developed by a team of expert programmers during a Challenge precompetition as well as a previously described first-generation 70-gene risk predictor. Thus, the result may or may not be superior to existing gene expression profiling tests used in clinical practice. This remains to be tested.

It also remains to be seen whether prize-based crowdsourcing contests can make varied and practical contributions in the clinic. Indeed, DREAM and Sage Bionetworks have immediate plans to collaborate on new clinically relevant Challenges. But there is no doubt that the approach has value in solving big-data problems. For example, in a recent contest, non-immunologists generated a method for annotating the complex genome sequence of the antibody repertoire when the contest organizers translated the problem into generic language. In the BCC, the Challenge winners used a mathematical approach to identify biological modules that might, with continued investigation, teach us something about cancer biology. These examples support the notion that harnessing the expertise of contestants outside of traditional biological disciplines may be a powerful way to accelerate the translation of biomedical science to the clinic.