US HUPO: Proteomics Informatics (2010)
Sunday, March 7, 2010, 9:00 am - 4:00 pm (with one hour break for lunch-on-your-own)
Nathan Edwards (Georgetown) and Martin McIntosh (Fred Hutchinson Cancer Research Center)
This class will cover introductory and selected advanced informatics and data analysis topics related to tandem mass spectrometry proteomics. It is intended for applied laboratory and computational researchers and aims to provide basic, practical insight into a variety of topics, including search engines, protein sequence databases, statistical significance for peptide and protein identification, and quantitative proteomics.
Advanced topics include combining and refining the results of multiple search engines, and evaluating the statistical significance of quantitative experiments using pathway or gene-set style analyses adapted from genomics. Case studies drawn from the instructors' experience will be used to demonstrate the basic principles. The class will not include a survey of tools or emphasize any specific workflow; rather, the instructors will focus on general ideas useful for multiple strategies and provide specific examples using a variety of tools familiar to them.
Basics I: Search engines and protein/peptide inference
MS/MS Search Engines: Framework, understanding when they do and do not work.
Protein Sequence Databases: Origins, protein families, redundancy & isoforms.
Peptide and protein inference: P-values, E-values and decoy searching.
Case Study: Searching genomic sequence evidence
Basics II: Quantitative analysis
Label-free and labeled analysis: Strengths and weaknesses of each.
Design: pools versus individual level, selecting among various strategies.
Case studies: two SILAC experiments with very different analysis strategies.
Advanced topic I: Combining & Refining Database Searches. Increasing the sensitivity of scores, calibration, and other topics in combining and refining search engine results.
Advanced topic II: Analyzing proteomics results using basic tools borrowed from genomics, including gene-set enrichment and other pathway analyses, in order to increase the statistical power and biological relevance.
- Tandem Mass Spectrometry Search Engines.
Additional reading: Protein Identification from Tandem Mass Spectra by Sequence Database Search. Bioinformatics for Comparative Proteomics, Methods in Molecular Biology. Preprint.
- Protein Sequence Databases.
- Peptide Identification Result Combining.
- Protein Identification by Peptide Mass Fingerprint.
- Protein Identification by Sequence Database Search with Tandem Mass Spectra. Download spectra.
- Protein Identification using a Sequence Tag.