Proteogenomic Analysis to Identify Missing Proteins from Haploid Cell Lines
The Chromosome-centric Human Proteome Project (C-HPP) aims to identify and characterize the protein products encoded by all human protein-coding genes. As of early 2017, 19,837 protein-coding genes had been annotated in the neXtProt database, including 2,691 missing proteins that had never been identified by mass spectrometry. Missing proteins may be present at low abundance in many cell types, or expressed in only a few cell types in the human body, such as sperm cells in the testis. In this study, we performed expression proteomics on two near-haploid cell lines, HAP1 and KBM-7, to hunt for missing proteins. Proteomes from the two haploid cell lines were analyzed on an LTQ Orbitrap Velos, producing a total of 200 raw mass spectrometry files. After applying a 1% false discovery rate (FDR) at both the peptide-spectrum match (PSM) and protein levels, more than ten thousand proteins were identified from HAP1 and KBM-7, including nine missing proteins. Next, unmatched spectra were searched against protein databases generated by three-frame translation of non-coding RNAs derived from RNA-Seq data, yielding six novel protein-coding regions after careful manual inspection. This study demonstrates that expression proteomics coupled with proteogenomic analysis can be employed to identify both annotated and unannotated missing proteins.
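The three-frame translation step used to build the proteogenomic search database can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes forward-strand transcripts only, the standard genetic code (NCBI translation table 1), and a common convention of splitting each frame at stop codons and keeping segments of at least `min_len` residues. The function names and the `min_len` default are hypothetical choices for illustration.

```python
"""Minimal sketch: three-frame translation of non-coding RNA sequences."""

from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated with
# bases ordered T, C, A, G and the third position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate one reading frame; incomplete trailing codons are dropped.

    Codons containing ambiguous bases (e.g. N) become 'X'; stops become '*'.
    """
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def three_frame_translate(seq: str) -> list[str]:
    """Translate the forward strand in frames 0, 1 and 2 (assumption: no reverse strand)."""
    seq = seq.upper().replace("U", "T")
    return [translate(seq[frame:]) for frame in range(3)]


def searchable_segments(seq: str, min_len: int = 7) -> list[tuple[int, str]]:
    """Split each frame at stop codons and keep segments of at least min_len residues.

    min_len is a hypothetical cutoff, not a parameter reported in the study.
    """
    segments = []
    for frame, protein in enumerate(three_frame_translate(seq)):
        for segment in protein.split("*"):
            if len(segment) >= min_len:
                segments.append((frame, segment))
    return segments


if __name__ == "__main__":
    print(three_frame_translate("AUGGCCUAA"))  # ['MA*', 'WP', 'GL']
```

In practice, each retained segment would be written to a FASTA file alongside the reference proteome, and only spectra left unmatched by the reference search would be searched against it, as the abstract describes.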