Question: Quantitative Proteomics Statistics
14 days ago, mgrcprof (20) wrote:


I have received a dataset from an old proteomics experiment which contains the SILAC ratios for 3834 proteins under a given condition. I usually rely on a statistical metric, such as the p-value from a t-test, to establish a cut-off for the subsequent over-representation analysis; however, these ratios are not accompanied by any statistic. I have 3 technical replicates.

I'm wondering whether there is a method to attach a statistic to this collection of ratios, and which library/package/software could be used for it. References would be appreciated.

Thank you!

statistics proteomics silac • 133 views
14 days ago, Jean-Karim Heriche (12k), EMBL Heidelberg, Germany, wrote:

If the ratios are not already expressed as logs, log-transform them, then check how the values are distributed. If they are roughly Gaussian-shaped, you could use a t-test. If you can't assume normality, you can do a permutation test. In any case, don't forget to correct for multiple testing. You may also want to have a look at the RforProteomics package.
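
As a rough illustration of the first two steps (log-transforming, then checking the shape of the distribution), here is a minimal base R sketch. The ratios are simulated, and the object names (`ratios`, `log_ratios`, `qq_cor`) are made up for the example; they do not come from the dataset discussed here.

```r
# Simulated SILAC-like ratios (assumption: real data would be read from a file)
set.seed(1)
ratios <- 2^rnorm(1000, mean = 0, sd = 0.5)

# Log-transform so that a two-fold increase (2) and decrease (0.5) become +1 and -1
log_ratios <- log2(ratios)

# Numerical look at the distribution: quantiles and a normal Q-Q correlation
print(quantile(log_ratios, c(0.025, 0.25, 0.5, 0.75, 0.975)))
qq <- qqnorm(log_ratios, plot.it = FALSE)  # theoretical vs. observed quantiles
qq_cor <- cor(qq$x, qq$y)                  # close to 1 for roughly Gaussian data
print(qq_cor)

# In an interactive session, hist(log_ratios) and qqnorm(log_ratios)
# give the corresponding visual check.
```

The Q-Q correlation is only an informal indicator; the histogram and Q-Q plot are usually more informative than any single number.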

14 days ago, mgrcprof (20) wrote:

Thank you for your answer Jean-Karim!

I have been looking at the RforProteomics vignette, but I have a problem: the package is designed for the analysis and identification of the raw pep files produced by the analysis, whereas I have a final list of proteins with their ratios. I have transformed them to the log2 scale and they are roughly Gaussian-shaped.

How can I perform the t-test on these data? I'm really lost at this point. I have used the base R function t.test to compare means in other datasets where I have one measurement per sample, but never with a single list of ratios.


Please use the "add comment" button when replying to an answer; this keeps the discussion organized.

I assume that the ratio is between two conditions, something like treatment over control. In this case, what you want to test is whether there is a difference in expression between the two conditions which translates into the ratio being significantly different from 1. In the log-transformed space, you then test the null hypothesis that the value is 0. If the mean of your log-transformed data is not 0, you would need to center the data before doing the test.
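
A minimal sketch of the centering step and of the null hypothesis for a single protein, on simulated data. The matrix `log_mat` and the choice of median-centering each replicate are assumptions made for the example; the answer above does not say how the centering should be done.

```r
# Simulated log2 ratios: 500 proteins x 3 replicates, with an artificial offset of 0.3
set.seed(2)
log_mat <- matrix(rnorm(500 * 3, mean = 0.3, sd = 0.5), ncol = 3,
                  dimnames = list(paste0("prot", 1:500), paste0("rep", 1:3)))

# Center each replicate at 0 (assumption: subtract the per-replicate median)
centered <- sweep(log_mat, 2, apply(log_mat, 2, median), FUN = "-")
print(apply(centered, 2, median))  # numerically 0 after centering

# H0 for one protein: the mean log2 ratio is 0 (ratio = 1, i.e. no change)
res <- t.test(centered["prot1", ], mu = 0)
print(res$p.value)
```

Subtracting the median rather than the mean makes the centering less sensitive to a handful of strongly changing proteins, but either could be defended.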

written 14 days ago by Jean-Karim Heriche (12k)

Yes, that's exactly what they are: ratios between two conditions. If the data were successfully centered at 0, I would get the following output from the base R t.test function, right?

t.test(data$mean)

One Sample t-test

data: data$mean
t = 0.25743, df = 3833, p-value = 0.7969
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.01591990 0.02073247
sample estimates:
mean of x


So once I have checked that the values are successfully centered at 0 and follow a Gaussian distribution, how should I continue to obtain a statistic that identifies the values representing a significant change within this distribution? Simply by taking the extreme 2.5% of the ratio values?

modified 11 days ago • written 11 days ago by mgrcprof (20)

If you want to formally test whether your data are normally distributed, use a Shapiro-Wilk normality test (shapiro.test()), not a t-test. To select proteins with a significant change, test each protein using its replicates, i.e. test whether the mean of the replicates is equal to 0. However, with only three replicates and correction for multiple testing, this approach may not have enough power.

That said, I believe statistics are not the answer to your problem here. I would select proteins whose median over replicates is above a given threshold. Using the median enforces reproducibility, i.e. at least half of the replicates will be at or above the threshold. Use prior knowledge to find a biologically relevant threshold. For example, if key players in the process you're interested in are known to change, you could use their changes to set the threshold. Or, if the key players are known but their change is not, you could rank the proteins by fold change and look at how many of these known players you recover at different thresholds.
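
The median-threshold idea could be sketched in base R as follows. Everything here is illustrative: the data are simulated, the "known players" are hypothetical proteins with a change spiked in, and thresholding on the absolute median (to catch both increases and decreases) is an assumption rather than something stated above.

```r
# Simulated centered log2 ratios: 200 proteins x 3 replicates
set.seed(3)
log_mat <- matrix(rnorm(200 * 3, sd = 0.3), ncol = 3,
                  dimnames = list(paste0("prot", 1:200), NULL))

# Hypothetical "known players": spike a clear change into the first 10 proteins
known <- paste0("prot", 1:10)
log_mat[known, ] <- log_mat[known, ] + 1.5

# Median over replicates: with 3 replicates, at least 2 are at or above the median
med <- apply(log_mat, 1, median)

# Count selected proteins and recovered known players at several thresholds
thresholds <- c(0.5, 1, 1.5)
recovery <- t(sapply(thresholds, function(th) {
  selected <- names(med)[abs(med) >= th]
  c(threshold = th,
    n_selected = length(selected),
    n_known = sum(known %in% selected))
}))
print(recovery)
```

Reading the table from the loosest to the strictest threshold shows the trade-off: fewer proteins are selected overall, and at some point known players start to drop out.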

written 11 days ago by Jean-Karim Heriche (12k)

OK, I will use a Shapiro test!

"However, with only three replicates and correction for multiple testing, this approach may not have enough power."

Indeed, after running t.test as you described, 187 proteins have a p-value below 0.05, and none remain after applying the FDR correction. The prior-knowledge approach sounds well suited to this situation, because I have previous experimental evidence of proteins that change under the condition studied. I will try it.
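
For reference, the per-protein test followed by FDR correction might look like the sketch below. The data are simulated with no true changes, the object names are invented, and Benjamini-Hochberg is assumed as the FDR method since the thread does not say which one was used.

```r
# Simulated centered log2 ratios: 1000 proteins x 3 replicates, no true changes
set.seed(4)
log_mat <- matrix(rnorm(1000 * 3, sd = 0.3), ncol = 3)

# One-sample t-test per protein (H0: mean log2 ratio = 0); df = 2 with 3 replicates
pvals <- apply(log_mat, 1, function(x) t.test(x, mu = 0)$p.value)

# Benjamini-Hochberg FDR correction across all proteins (assumed method)
padj <- p.adjust(pvals, method = "BH")

print(sum(pvals < 0.05))  # raw hits, including those expected by chance alone
print(sum(padj < 0.05))   # hits surviving FDR control
```

With only two degrees of freedom per test, very small raw p-values are hard to reach, which is one reason few or no proteins may survive the correction.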

Thank you again, your wisdom is appreciated!

written 11 days ago by mgrcprof (20)