Clustering millions of tandem mass spectra.

Title: Clustering millions of tandem mass spectra.
Publication Type: Journal Article
Year of Publication: 2008
Authors: Frank AM, Bandeira N, Shen Z, Tanner S, Briggs SP, Smith RD, Pevzner PA
Journal: J Proteome Res
Volume: 7
Issue: 1
Pagination: 113-22
Date Published: 2008 Jan
ISSN: 1535-3893
Keywords: Amino Acid Sequence, Cluster Analysis, Computational Biology, Molecular Sequence Data, Peptides, Proteomics, Tandem Mass Spectrometry
Abstract

Tandem mass spectrometry (MS/MS) experiments often generate redundant data sets containing multiple spectra of the same peptides. Clustering of MS/MS spectra takes advantage of this redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only the representative spectra yields a significant speed-up of MS/MS database searches. We present an efficient clustering approach for analyzing large MS/MS data sets (over 10 million spectra) that can reduce the number of spectra submitted for further analysis by an order of magnitude. The MS/MS database search of clustered spectra results in fewer spurious hits to the database and increases the number of peptide identifications compared with regular nonclustered searches. Our open source software MS-Clustering is available for download at http://peptide.ucsd.edu or can be run online at http://proteomics.bioprojects.org/MassSpec.
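
The idea of grouping spectra of the same peptide and keeping one representative per group can be illustrated with a minimal Python sketch. This is not the MS-Clustering algorithm: it is a generic greedy single-pass scheme that assumes binned peak vectors, cosine similarity, and a summed consensus as the representative. The function names and the parameters `mz_tol`, `min_sim`, and `bin_width` are hypothetical choices made for illustration only.

```python
from collections import defaultdict
from math import sqrt


def normalize(vec):
    """Scale a sparse vector (dict of bin -> value) to unit length."""
    norm = sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm > 0 else {}


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins, then normalize."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return normalize(binned)


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def cluster_spectra(spectra, mz_tol=2.0, min_sim=0.7, bin_width=1.0):
    """Greedy single-pass clustering (illustrative assumption, see lead-in).

    spectra: list of (precursor_mz, [(mz, intensity), ...]).
    Each spectrum joins the most similar existing cluster whose precursor
    m/z is within mz_tol and whose similarity is at least min_sim;
    otherwise it starts a new cluster. A cluster keeps the precursor m/z
    of its first member and the sum of its members' binned vectors.
    """
    clusters = []
    for idx, (precursor_mz, peaks) in enumerate(spectra):
        vec = bin_spectrum(peaks, bin_width)
        best, best_sim = None, min_sim
        for c in clusters:
            if abs(c["precursor_mz"] - precursor_mz) > mz_tol:
                continue
            sim = cosine(vec, normalize(c["sum"]))
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append(
                {"precursor_mz": precursor_mz, "members": [idx], "sum": dict(vec)}
            )
        else:
            best["members"].append(idx)
            for k, v in vec.items():
                best["sum"][k] = best["sum"].get(k, 0.0) + v
    return clusters


# Toy input: spectra 0 and 1 share precursor mass and peak pattern.
spectra = [
    (500.2, [(100.1, 10), (200.2, 50), (300.3, 20)]),
    (500.3, [(100.1, 12), (200.2, 48), (300.3, 19)]),
    (800.5, [(150.0, 30), (450.4, 60)]),
    (500.2, [(120.7, 40), (310.9, 40)]),
]
print([c["members"] for c in cluster_spectra(spectra)])  # [[0, 1], [2], [3]]
```

In this toy input, four spectra collapse to three representatives. Only the normalized `sum` vector of each cluster would then be passed to a database search, which is where the speed-up described above would come from.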

DOI: 10.1021/pr070361e
PubMed URL: http://www.ncbi.nlm.nih.gov/pubmed/18067247?dopt=Abstract
PMC: PMC2533155
Alternate Title: J. Proteome Res.
PubMed ID: 18067247