
Science Translational Medicine  21 Dec 2016:
Vol. 8, Issue 370, pp. 370ec203
DOI: 10.1126/scitranslmed.aal3701

Sequencing costs have fallen dramatically. We now assay numerous processes through diverse experimental strategies that, at some stage, sequence nucleic acids. To fully benefit from inexpensive sequencing, analysis costs need to drop in parallel, and results must be returned quickly. There is a project management rule stating that among fast, good, and cheap, we can choose any two. Tatlow and Piccolo show, however, that we can have all three when quantifying gene expression from RNA sequencing (RNA-seq) data. The authors designed a modern cloud-based RNA-seq workflow, measured computing requirements and costs, and optimized the process. After optimization, they scaled the analysis to 11,373 TCGA samples, each of which was completed in about 100 minutes and cost less than 10 cents. The authors took specific steps to control costs.

First, they analyzed data that were already stored in the cloud, eliminating transfer costs and time. Various NIH initiatives are beginning to make data available and computable in commercial cloud environments, and the authors’ work demonstrates the importance of such initiatives. For these methods to be usable in other settings, investigator-contributed datasets must be similarly available. One can imagine a service where samples uploaded by investigators to the Sequence Read Archive or Gene Expression Omnibus are made available in an encrypted form during an embargo period. Within this time, contributing investigators could perform their own analyses. After the embargo, decrypted data would be available to all researchers.

Second, the authors worked with otherwise underused Google Cloud Platform resources at a discount. Google offers these as “preemptible” virtual machines, and Amazon offers “spot” instances. These machines are occasionally reclaimed by Google or Amazon, interrupting the analysis. Because the authors designed a repeatable workflow, they easily restarted any interrupted jobs. This alone reduced overall costs by half.
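The restart logic described above can be illustrated with a minimal Python sketch. It is not the authors' workflow: the `Preempted` exception, the `run_with_restarts` helper, and the simulated job are all hypothetical names introduced here, and the sketch assumes only that each per-sample job is repeatable, so rerunning it from scratch after an interruption is safe.

```python
class Preempted(Exception):
    """Hypothetical signal that the cloud provider reclaimed the machine."""


def run_with_restarts(job, sample_id, max_attempts=5):
    """Run a repeatable job, restarting it from scratch after each preemption.

    Returns the job's result and the number of attempts it took.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job(sample_id), attempt
        except Preempted:
            # The job is repeatable, so nothing needs cleaning up: try again.
            continue
    raise RuntimeError(
        f"{sample_id}: still preempted after {max_attempts} attempts"
    )


def make_flaky_job(preemptions_before_success):
    """Build a simulated job that is preempted a fixed number of times."""
    remaining = {"n": preemptions_before_success}

    def job(sample_id):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise Preempted(sample_id)
        return f"{sample_id}: quantified"

    return job


if __name__ == "__main__":
    result, attempts = run_with_restarts(make_flaky_job(2), "sample-001")
    print(result, "after", attempts, "attempts")
```

The design choice that matters is repeatability: because a rerun yields the same result as an uninterrupted run, a reclaimed machine costs only the time already spent on that sample.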

This work focused on RNA expression quantification, and numerous other workflows remain to be similarly optimized. Still, it shows that optimizing academic workflows with techniques common in industry can dramatically lower analytical costs. This work lays the technological groundwork to follow cheap sequencing with cheap analysis, a necessary step toward reasonably priced precision medicine.

P. J. Tatlow, S. R. Piccolo, A cloud-based workflow to quantify transcript-expression levels in public cancer compendia. Sci. Rep. 6, 39259 (2016).
