Editors' Choice: Biostatistics

Making Research Reproducible


Science Translational Medicine  11 Aug 2010:
Vol. 2, Issue 44, pp. 44ec124
DOI: 10.1126/scitranslmed.3001525

Most researchers have faced the following challenge: A new analytic approach is published that seems applicable to a scientist’s own research. The scientist wants to apply the approach and, if it is a data analysis method, first tries to replicate the work in the publication. However, the paper neither explains every step of the analysis nor specifies exactly how the final data set was created. As a result, the scientist obtains a slightly different answer, which raises questions about how to implement the approach.

The solution to this challenge has been termed “reproducible research” and is being discussed in many areas of study. Recently, this topic came to the forefront of biostatistical research in a sequence of articles in the July 2010 issue of Biostatistics. The articles discuss the issues behind reproducible research for statistical work, focusing on the “how” and “why” of statistical analysis. The “how” describes making computer code and analysis data available so that the reader can reproduce a result, fully understand the implementation, or investigate the analysis approach under different conditions. The second issue, the “why,” is far more complex because it covers reproducing all of the discussions that informed the design and data analysis plan. In most collaborations, the (bio)statistician or data analyst works closely with the scientific team and translates that scientific knowledge into the study design and analysis plan. The content of all of those discussions is not easily written down but can often be reproduced in other ways, such as verbal conversation.
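As a rough illustration of the “how,” the sketch below shows one way an analysis script might be packaged so that a reader can rerun it and confirm the same answer. It is not taken from the Biostatistics articles. The function names, the choice of a bootstrap interval, the seed value, and the measurements are all hypothetical, chosen only to show three habits: fixing the random seed, recording a fingerprint of the exact data set analyzed, and reporting the software version alongside the result.

```python
import hashlib
import json
import platform
import random
import statistics


def data_fingerprint(rows):
    """Return a SHA-256 hex digest identifying the exact analysis data set."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def bootstrap_mean_ci(values, n_resamples=1000, seed=2010):
    """Percentile bootstrap interval (roughly 95%) for the mean.

    The seed is fixed so that rerunning the script gives identical resamples.
    """
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return lower, upper


def run_analysis(values, seed=2010):
    """Run the analysis and return the result together with its provenance."""
    lower, upper = bootstrap_mean_ci(values, seed=seed)
    return {
        "mean": statistics.fmean(values),
        "ci95": [lower, upper],
        "seed": seed,
        "data_sha256": data_fingerprint(values),
        "python_version": platform.python_version(),
    }


if __name__ == "__main__":
    # Hypothetical measurements, invented for illustration only.
    measurements = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4]
    print(json.dumps(run_analysis(measurements), indent=2))
```

A reader who obtains a different data fingerprint knows the discrepancy lies in how the final data set was created rather than in the analysis steps, which is the situation described in the opening paragraph. Changing the seed or the number of resamples is a simple way to investigate the approach under different conditions.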

The discussion in Biostatistics highlights the major issues that researchers face in trying to make their work more accessible. Although written for biostatisticians, the points are important for the nonbiostatistician because adopting reproducible research guidelines will require a change in how data analysis is documented and in which products (such as documented computer code) must be made widely available for all analyses upon publication.

Commentaries, Biostatistics 11, 376–390 (2010).
