Editors' Choice: Cancer Genomics

Moving from Unknown Unknown to Known Unknown


Science Translational Medicine, 29 Jan 2014:
Vol. 6, Issue 221, pp. 221ec20
DOI: 10.1126/scitranslmed.3008434

The post-genome era was heralded as one that would provide answers to the mysteries underlying deadly diseases. However, the results thus far seem to be an ever-expanding set of questions: What is the role of the vast majority of functionally uncharacterized DNA? How are gene products organized into higher-level functional networks that govern cellular phenotypes? How do the microenvironment and microbiomes influence cellular regulation?

The mere ability to define such questions is a breakthrough. In the pre-genome era, such questions resided in the domain of the "unknown unknown": we didn't know what we didn't know. Now, to a large extent, we know what we don't know. The key questions are "known unknowns," so we can set out to answer them. In this spirit, Lawrence et al. attempt to define the number of genes that are recurrently mutated in cancer and to estimate the sizes of the data sets needed to identify them all.

The authors analyzed whole-exome sequencing data from tumor-normal pairs of 4742 tissue samples spanning 21 cancer types. A total of 254 different genes were substantially mutated in at least one tumor type or when considering all tumor types together, based on criteria that included mutation frequency above background, clustering of mutations within the gene, and evolutionary conservation of mutated sites. Out of 82 genes previously reported to be mutated in the analyzed tumor types, 60 (73%) were recovered with the de novo analysis. Many of the previously unreported genes were associated with cancer-related processes, such as proliferation, apoptosis, genome stability, chromatin regulation, immune response, and RNA processing.
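Of the criteria listed above, "mutation frequency above background" is the simplest to illustrate. The sketch below is a minimal one-sided exact binomial test. It assumes that every patient shares a single per-gene background mutation probability. The study's actual analysis combines this signal with mutation clustering and site conservation and does not rely on a single shared background rate, so none of that is reproduced here. The function names and the example counts are hypothetical.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term in log space
    so that large binomial coefficients never overflow."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def above_background_pvalue(mutated_patients: int, n_patients: int, background_p: float) -> float:
    """Hypothetical helper: one-sided p-value that a gene is mutated in more
    patients than chance alone predicts, assuming one shared background rate."""
    return binomial_upper_tail(mutated_patients, n_patients, background_p)


# Hypothetical example: 30 of 500 tumors carry a mutation in a gene whose
# per-patient background probability is 1.5%, so ~7.5 are expected by chance.
p_value = above_background_pvalue(30, 500, 0.015)
print(f"p = {p_value:.2e}")
```

In a genome-wide scan, a p-value like this would still have to clear a multiple-testing threshold across all genes. A real analysis would also need gene- and patient-specific background estimates, because a single shared rate tends to flag long or late-replicating genes spuriously.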

On the basis of a down-sampling analysis, the number of substantially mutated genes continued to increase linearly up to the maximum sample size profiled for each tumor type, indicating that current data sets are far from capturing the full repertoire of mutated genes. Specifically, most genes mutated in >20% of samples appeared to have been identified already, whereas many genes mutated at lower frequencies would require larger sample sizes to detect. The authors estimated that building a catalog of genes mutated at >2% frequency would require profiling between 650 and 5300 samples per tumor type, depending on the background mutation rate, or roughly 100,000 samples in total to characterize 50 tumor types.
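Why the required sample size depends on the background mutation rate can be seen with a textbook calculation. The sketch below applies the normal-approximation sample-size formula for a one-sided, one-sample proportion test. It is not the model Lawrence et al. used, and its output should not be read as reproducing their estimates. The following are all assumptions chosen for illustration:

* a 1.5-kb gene;
* Poisson-distributed passenger mutations;
* a Bonferroni threshold over 18,000 genes;
* 90% power;
* a driver signal modeled as a 2% excess over background.

```python
import math
from statistics import NormalDist

# Hypothetical parameters for illustration only (not taken from Lawrence et al.).
GENE_LENGTH_BP = 1500          # assumed coding length of a "typical" gene
N_GENES_TESTED = 18000         # assumed number of genes tested (Bonferroni)
ALPHA = 0.05 / N_GENES_TESTED  # one-sided genome-wide significance level
POWER = 0.9                    # desired probability of detecting a true driver


def background_probability(mutations_per_mb: float, gene_length_bp: int = GENE_LENGTH_BP) -> float:
    """Chance that a gene carries >= 1 passenger mutation in one patient,
    assuming mutations land uniformly at random (Poisson) in coding sequence."""
    expected_hits = mutations_per_mb * gene_length_bp / 1e6
    return 1.0 - math.exp(-expected_hits)


def samples_needed(p0: float, excess: float = 0.02, alpha: float = ALPHA, power: float = POWER) -> int:
    """Normal-approximation sample size for a one-sided one-sample proportion
    test of H0: p = p0 against H1: p = p0 + excess."""
    p1 = p0 + excess
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    spread = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))
    return math.ceil((spread / excess) ** 2)


if __name__ == "__main__":
    # Tumor types differ widely in background rate (mutations per megabase).
    for rate in (0.5, 1, 4, 10, 30):
        p0 = background_probability(rate)
        print(f"{rate:>4} mutations/Mb: p0 = {p0:.4f}, about {samples_needed(p0)} samples")
```

The point is the trend, not the exact numbers. A higher background rate inflates the variance of the mutation count under the null hypothesis, so many more patients are needed to distinguish the same 2% excess. This is consistent with the authors' estimate spanning nearly an order of magnitude across tumor types.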

In keeping with the theme of modern biomedical research, the study perhaps raises more questions than it answers. The oncogenic impact of many of the recurrently mutated genes remains to be characterized, and the cumulative importance of multiple low-frequency mutations remains unclear. Further research is also needed to determine the extent to which low-frequency mutations correspond to tumor molecular subtypes with different therapeutic and prognostic implications. For now, the ability to ask such questions gives us hope of finding their answers.

M. S. Lawrence et al., Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
