Editors' Choice: Biostatistics

Next-Generation Sequencing: A Better Way


Science Translational Medicine  15 Sep 2010:
Vol. 2, Issue 49, pp. 49ec143
DOI: 10.1126/scitranslmed.3001681

New research technology can lead to new scientific insight, as long as it is not prohibitively expensive, and well-chosen design parameters can help control costs. One such technology is next-generation sequencing, which can help to decipher the role of rare genetic variants in complex diseases. In this approach, each base pair is identified, in contrast to single-nucleotide polymorphism analysis, in which base pairs are determined only at a set of locations where common variations occur in the population. Despite the possible gains in information from sequencing each base pair, next-generation sequencing is still too expensive to use in full genome-wide association studies (GWASs). To keep costs reasonable, investigators pool samples from multiple individuals, limit the average number of “reads” at each site on the genome (depth), and sequence only the regions that code for protein formation (exons). How these choices affect a study’s ability to find genetic effects has not been well understood, however. Kim et al. (2010) offer some clarity on this issue. Using experimental and computer-generated data, they show that, with their modified likelihood ratio test, the chance of identifying a genetic effect when one exists is higher when a larger number of subjects is sequenced at a shallower depth in larger pools (~five to eight subjects) than when a smaller number of subjects is sequenced at a higher depth with no pooling. When individuals are sequenced without pooling and the sequencing error rate is reasonable (1% or less), there is little to gain by increasing the depth of sequencing past two. This approach can be applied to any disease or condition in which currently unknown genetic variants are of interest.
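The trade-off between the number of subjects and depth can be made concrete with a back-of-the-envelope calculation. The Python sketch below is an illustration only, not the modified likelihood ratio test of Kim et al. It assumes a fixed budget of reads per site, no pooling, no sequencing errors, exactly `depth` reads per subject, and Hardy-Weinberg genotype frequencies. All function names and the example numbers (a budget of 800 reads, a variant frequency of 0.5%) are hypothetical. It asks only how likely it is that a rare variant is read at least once in the sample, which is a precondition for detecting an association rather than a measure of statistical power.

```python
# Illustrative sketch only: a simple carrier-sampling model, NOT the method
# of Kim et al. Assumptions (ours): fixed read budget per site, no pooling,
# no sequencing errors, exactly `depth` reads per subject, and
# Hardy-Weinberg genotype frequencies.


def subjects_for_budget(total_reads_per_site: int, depth: int) -> int:
    """Number of subjects affordable when each is sequenced at `depth`."""
    return total_reads_per_site // depth


def p_variant_read_per_subject(depth: int, maf: float) -> float:
    """Probability that one subject yields at least one variant-carrying read."""
    p_het = 2.0 * maf * (1.0 - maf)   # one variant copy
    p_hom = maf * maf                 # two variant copies: every read shows it
    # For a heterozygote, each read carries the variant with probability 1/2.
    p_seen_if_het = 1.0 - 0.5 ** depth
    return p_het * p_seen_if_het + p_hom


def p_variant_observed(n_subjects: int, depth: int, maf: float) -> float:
    """Probability that the variant is read at least once in the whole sample."""
    p_one = p_variant_read_per_subject(depth, maf)
    return 1.0 - (1.0 - p_one) ** n_subjects


budget = 800   # hypothetical total reads per site
maf = 0.005    # hypothetical rare-variant allele frequency

for depth in (2, 4, 8):
    n = subjects_for_budget(budget, depth)
    prob = p_variant_observed(n, depth, maf)
    print(f"depth={depth}  subjects={n}  P(variant observed)={prob:.3f}")
```

Under these assumptions, 400 subjects at depth 2 observe the variant with probability of about 0.95, compared with about 0.63 for 100 subjects at depth 8, even though both designs use the same number of reads. The sketch ignores sequencing error, which is the factor that makes very shallow depths risky in practice and which the study's 1% threshold addresses.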

S. Y. Kim et al., Design of association studies with pooled or un-pooled next-generation sequencing data. Genet. Epidemiol. 34, 479–491 (2010).
