Evaluation of paired-end sequencing strategies for detection of genome rearrangements in cancer.

Title: Evaluation of paired-end sequencing strategies for detection of genome rearrangements in cancer.
Publication Type: Journal Article
Year of Publication: 2008
Authors: Bashir A, Volik S, Collins C, Bafna V, Raphael BJ
Journal: PLoS Comput Biol
Volume: 4
Issue: 4
Pagination: e1000051
Date Published: 2008 Apr
ISSN: 1553-7358
Keywords: Algorithms, Base Sequence, Breast Neoplasms, Chromosome Mapping, Female, Gene Rearrangement, Humans, Molecular Sequence Data, Sequence Analysis, DNA
Abstract

Paired-end sequencing is emerging as a key technique for assessing genome rearrangements and structural variation on a genome-wide scale. This technique is particularly useful for detecting copy-neutral rearrangements, such as inversions and translocations, which are common in cancer and can produce novel fusion genes. We address the question of how much sequencing is required to detect rearrangement breakpoints and to localize them precisely using both theoretical models and simulation. We derive a formula for the probability that a fusion gene exists in a cancer genome given a collection of paired-end sequences from this genome. We use this formula to compute fusion gene probabilities in several breast cancer samples, and we find that we are able to accurately predict fusion genes in these samples with a relatively small number of fragments of large size. We further demonstrate how the ability to detect fusion genes depends on the distribution of gene lengths, and we evaluate how different parameters of a sequencing strategy impact breakpoint detection, breakpoint localization, and fusion gene detection, even in the presence of errors that suggest false rearrangements. These results will be useful in calibrating future cancer sequencing efforts, particularly large-scale studies of many cancer genomes that are enabled by next-generation sequencing technologies.

DOI: 10.1371/journal.pcbi.1000051
PubMed URL: http://www.ncbi.nlm.nih.gov/pubmed/18404202?dopt=Abstract
PMC: PMC2278375
Alternate Journal: PLoS Comput. Biol.
PubMed ID: 18404202