Clustering millions of tandem mass spectra.
|Title||Clustering millions of tandem mass spectra.|
|Publication Type||Journal Article|
|Year of Publication||2008|
|Authors||Frank AM, Bandeira N, Shen Z, Tanner S, Briggs SP, Smith RD, Pevzner PA|
|Journal||J Proteome Res|
|Date Published||2008 Jan|
|Keywords||Amino Acid Sequence, Cluster Analysis, Computational Biology, Molecular Sequence Data, Peptides, Proteomics, Tandem Mass Spectrometry|
Tandem mass spectrometry (MS/MS) experiments often generate redundant data sets containing multiple spectra of the same peptides. Clustering of MS/MS spectra takes advantage of this redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only representative spectra results in significant speed-up of MS/MS database searches. We present an efficient clustering approach for analyzing large MS/MS data sets (over 10 million spectra) with a capability to reduce the number of spectra submitted to further analysis by an order of magnitude. The MS/MS database search of clustered spectra results in fewer spurious hits to the database and increases the number of peptide identifications as compared to regular nonclustered searches. Our open source software MS-Clustering is available for download at http://peptide.ucsd.edu or can be run online at http://proteomics.bioprojects.org/MassSpec.
|Alternate Journal||J. Proteome Res.|
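
The idea described in the abstract, grouping spectra of the same peptide and keeping one representative per group, can be illustrated with a small sketch. This is not the MS-Clustering algorithm: the greedy single-pass strategy, the cosine similarity on binned peaks, the function names (`unit_vector`, `binned`, `cosine`, `cluster_spectra`), and the parameter values (`bin_width`, `precursor_tol`, `min_similarity`) are all assumptions chosen for illustration, and the example spectra are made up.

```python
import math
from collections import defaultdict


def unit_vector(vec):
    """Scale a sparse {bin: intensity} vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()} if norm > 0 else {}


def binned(peaks, bin_width=1.0):
    """Sum (m/z, intensity) peaks into fixed-width m/z bins, then normalize."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz / bin_width)] += intensity
    return unit_vector(vec)


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def cluster_spectra(spectra, precursor_tol=2.0, min_similarity=0.7):
    """Greedy one-pass clustering (illustrative assumption, not MS-Clustering).

    spectra: list of (precursor_mz, [(mz, intensity), ...]).
    Each cluster keeps the precursor m/z of its first member and the
    running sum of its members' normalized vectors as a representative.
    """
    clusters = []
    for idx, (precursor_mz, peaks) in enumerate(spectra):
        vec = binned(peaks)
        best, best_sim = None, min_similarity
        for cluster in clusters:
            # Only compare spectra whose precursor m/z values are close.
            if abs(cluster["precursor_mz"] - precursor_mz) > precursor_tol:
                continue
            sim = cosine(vec, unit_vector(cluster["total"]))
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append(
                {"precursor_mz": precursor_mz, "total": dict(vec), "members": [idx]}
            )
        else:
            # Fold the new spectrum into the cluster's representative.
            for b, v in vec.items():
                best["total"][b] = best["total"].get(b, 0.0) + v
            best["members"].append(idx)
    return clusters


# Made-up toy spectra: 0 and 1 are near-duplicates, 2 has different peaks,
# 3 has the same peaks as 0 but a very different precursor m/z.
spectra = [
    (500.0, [(100.0, 10), (200.0, 20), (300.0, 5)]),
    (500.5, [(100.2, 11), (200.1, 19), (300.3, 6)]),
    (500.2, [(150.0, 10), (250.0, 20)]),
    (800.0, [(100.0, 10), (200.0, 20), (300.0, 5)]),
]
clusters = cluster_spectra(spectra)
print([c["members"] for c in clusters])  # [[0, 1], [2], [3]]
```

In this toy run four spectra collapse to three representatives. A downstream database search would then score only the representatives, which is the source of the speed-up the abstract describes.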