CPTAC offers a nice Python package that makes it easy to download protein quantification data and work with it directly as a pandas DataFrame. The data is structured much like bulk RNA-seq data, except that it holds protein abundances rather than transcript abundances. The main issue is that the data provided by the package is already normalized.
Unfortunately, I couldn’t find any documentation describing what kind of normalization was applied. In addition, as far as I can tell, the CPTAC data on the Proteomic Data Commons (PDC), which hosts all CPTAC datasets, is limited to raw mass spectra files. I don’t know how to process or analyze raw spectra, so I was hoping to find unnormalized quantification data in a format similar to what the Python package provides.
1 - Does anyone know where I can access unnormalized protein quantification data? Or, failing that, how exactly was the data in the package processed? Any guidance would be greatly appreciated!
2 - How is the quantification done? Are the numbers in the DataFrame mass-to-charge ratios, or something else?
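To make the second question more concrete, one way to narrow it down might be to look at per-sample summaries of the table. The sketch below is only an illustration: `summarize_quant_table` is a hypothetical helper (not part of the CPTAC package), the table is synthetic stand-in data rather than real CPTAC values, and it assumes samples are rows and proteins are columns.

```python
import numpy as np
import pandas as pd


def summarize_quant_table(df: pd.DataFrame) -> pd.DataFrame:
    """Per-sample summary of a samples x proteins quantification table.

    Hypothetical helper for illustration; assumes rows are samples and
    columns are proteins.
    """
    return pd.DataFrame({
        "median": df.median(axis=1),              # NaNs are skipped by default
        "min": df.min(axis=1),
        "max": df.max(axis=1),
        "frac_negative": df.lt(0).mean(axis=1),   # share of all entries below zero
        "frac_missing": df.isna().mean(axis=1),   # share of entries that are NaN
    })


# Synthetic stand-in for a protein table (NOT real CPTAC data):
# 4 samples x 200 proteins, drawn around zero to mimic log-ratio-like values.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.normal(loc=0.0, scale=1.0, size=(4, 200)),
    index=["S1", "S2", "S3", "S4"],
    columns=[f"PROT{i}" for i in range(200)],
)
toy.iloc[0, :20] = np.nan  # simulate proteins not quantified in sample S1

summary = summarize_quant_table(toy)
print(summary)
```

If the real table showed only positive values, that would be compatible with m/z values or raw intensities; per-sample medians near zero with many negative entries would instead point towards log-transformed ratios that have been centered in some way. This would not say which normalization was used, only which kinds of values can be ruled out.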