For label-free shotgun proteomics, relative quantification of proteins/peptides can be done either through spectral counting or through intensity-based methods. I was given a list of raw spectral counts (SpCs) by a technician and have been tasked with the analysis (I am new to the proteomics field, coming from transcriptomics), so I now need to do some pre-processing prior to downstream analysis.
Common pre-processing tasks include log2-transformation, to render the intensities more symmetric, and normalization, to reduce systematic technical variation while retaining the underlying biological signal (Goeminne et al., 2017).
I tried MSqRob, the tool from Goeminne et al. (2017), to do the quantile normalization. However, I ran into problems: the authors' vignette example loads their sample data (peptides.txt) using the system.file command, and it wasn't clear to me how one should load one's own data. For example, I tried the read.table and read.csv commands on a copy of peptides.txt that I saved in my working directory, but when I call
peptidesFranc <- read_MaxQuant(file_peptides_txt, pattern="Intensity ", remove_pattern=TRUE)
I receive errors. I contacted the authors last week but have not received any help, so I thought I'd ask here and circulate on Twitter.
Another tool I found, Crux, supports four types of quantification (log2-transformation is applied before any one of these methods, I believe?): Normalized Spectral Abundance Factor (NSAF), Distributed Normalized Spectral Abundance (dNSAF), Normalized Spectral Index (SIN) and Exponentially Modified Protein Abundance Index (emPAI). Let's say I wanted to do NSAF, for example. NSAF for protein k is defined as follows:
$$(\mathrm{NSAF})_k = \frac{(\mathrm{SpC}/\mathrm{Length})_k}{\sum_{i=1}^{N} (\mathrm{SpC}/\mathrm{Length})_i}$$
where "SpC" is the spectral count of a protein, "Length" is the length of that protein, and "N" is the total number of proteins.
You would think one could just load SpCs along with protein lengths into Crux and get NSAF; however, the input for Crux is a collection of scored peptide-spectrum matches (PSMs).
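To make concrete what I'm after, here is a minimal Python sketch of the NSAF calculation as defined above, working directly from spectral counts and protein lengths for a single sample. The function name nsaf and the example protein IDs, counts, and lengths are made up for illustration, and I make no claim that this matches what Crux does internally.

```python
def nsaf(spectral_counts, lengths):
    """Return {protein ID: NSAF} for one sample.

    spectral_counts: dict mapping protein ID -> raw spectral count (SpC)
    lengths: dict mapping protein ID -> protein length in residues
    """
    # Spectral abundance factor: SpC / Length for each protein
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("all spectral counts are zero")
    # Divide by the sum over all N proteins so the values sum to 1
    return {p: value / total for p, value in saf.items()}


# Hypothetical example data (not real measurements)
counts = {"P1": 10, "P2": 20, "P3": 0}
lengths = {"P1": 100, "P2": 400, "P3": 250}
result = nsaf(counts, lengths)
```

With several samples, the same function would be applied to each sample's counts separately, since the denominator is a per-sample sum.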
What I need from my Biostars colleagues is a suggestion for a package I can use to normalize raw spectral counts (here is a sample, not my data, of what spectral counts look like).
Perhaps I'm approaching this the wrong way. Maybe I DO need to input PSMs (which I believe are from MS2 files?) to get normalized SpCs. This is a file I was never given by the LC-MS technician but maybe they didn't know I needed it?
I guess the third option is to write my own in-house script to do the log2-transformation and normalization, but I don't see the point in re-inventing the wheel; there HAS to be a package out there somewhere?
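For reference, the kind of in-house script I mean would look roughly like the Python sketch below: a log2-transformation followed by a naive quantile normalization across samples. The function names, the pseudocount of 1 (added so that zero counts can be logged), and the toy matrix are my own assumptions, and ties are broken by row order rather than averaged. Whether quantile normalization is even appropriate for spectral counts is part of what I'm unsure about.

```python
import math


def log2_transform(matrix, pseudocount=1.0):
    """Apply log2(x + pseudocount); rows are proteins, columns are samples."""
    return [[math.log2(x + pseudocount) for x in row] for row in matrix]


def quantile_normalize(matrix):
    """Naive quantile normalization across samples (columns).

    Ties within a column are broken by row order rather than averaged.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    # For each column, the row indices sorted by increasing value
    orders = [sorted(range(n_rows), key=lambda i, c=col: c[i]) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples
    reference = [
        sum(columns[j][orders[j][k]] for j in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]
    # Replace each value by the reference value at its within-column rank
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for rank, i in enumerate(orders[j]):
            normalized[i][j] = reference[rank]
    return normalized


# Hypothetical spectral counts: 3 proteins x 2 samples (not real data)
raw = [[3, 7], [0, 1], [15, 31]]
logged = log2_transform(raw)
normalized = quantile_normalize(logged)
```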
Your input is greatly appreciated! Thank you