Log-transformation, coefficient of variation, standard error and mass-spectrometry based proteomics
4.3 years ago

Hello potential heroes,

I am currently assessing data from a mass-spectrometry experiment, and I am trying to figure out whether my measurements are good enough for the purpose and how many runs of each sample I would need to get a reasonable estimate. To do so, I wanted to calculate the coefficient of variation, the standard error, or something of the sort, but this is where the following conceptual problem arose.

My measurements are originally ion intensities, and I suspect the error around these measurements to be normally distributed. This would lead me to think that the standard error and the coefficient of variation are good measures for my purpose. However, my interest is in the log-fold change between my samples, so I would want to log-transform the obtained sample mean intensity. Thus, I am interested in how the standard error on the raw scale carries over to the estimate of the log-transformed mean ion intensity.

Is it valid to take the standard error of the untransformed data, use it to calculate a confidence interval on that data, and then transform these limits to the log scale to get an estimate of the precision of my log-transformed variable? While it might seem reasonable, I am afraid to do so, since small deviations in the lower limit of the interval would have far more drastic consequences for the log-transformed estimate than deviations in the upper limit (the logarithm goes to negative infinity as the intensity approaches zero). Therefore, it seems to me that the "certainty" on the lower limit is rather low and that my log-transform of the mean and its confidence interval is rather fishy. Does anyone have a suggestion on how to tackle this issue? Infinite gratitude in advance!
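The procedure described above can be sketched in R as follows. This is only an illustration: the three intensities are made-up values, and the delta-method line is a standard first-order approximation added for comparison, not something taken from the question.

```r
# Hypothetical technical replicates of one ion intensity (made-up values)
x  <- c(9.8e5, 1.05e6, 1.12e6)
n  <- length(x)
m  <- mean(x)
se <- sd(x) / sqrt(n)

# 95% t-interval on the raw scale, then log2-transform the limits
ci_raw <- m + c(-1, 1) * qt(0.975, df = n - 1) * se
ci_log <- log2(ci_raw)            # only defined while the lower limit is > 0

# Delta-method approximation: SE of log2(mean) is roughly se / (m * ln 2)
se_log2 <- se / (m * log(2))

print(ci_log - log2(m))           # distances of the two limits from log2(mean)
print(se_log2)
```

The transformed interval is asymmetric around log2(mean): the lower limit sits further away than the upper one, and the effect grows as the lower raw-scale limit gets closer to zero.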

Log-transformation mass-spectrometry • 2.6k views

The problem you are facing is that the log of a mean is not the same as the mean of the logs. The same goes for the SD. But you know, it's possible that the log scale is the natural scale for your experiment, especially since you mention intensities. In that case, you could work straight from the log-transformed values and construct your intervals from these.
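To make this concrete, here is a small R sketch on simulated intensities (the seed, sample size and distribution parameters are arbitrary choices for illustration). It compares the log of the mean with the mean of the logs and builds a t-based interval directly on the log2 scale.

```r
set.seed(1)                                       # arbitrary seed, for reproducibility only
x <- rlnorm(6, meanlog = log(1e6), sdlog = 0.3)   # hypothetical ion intensities

log_of_mean  <- log2(mean(x))   # log of the arithmetic mean
mean_of_logs <- mean(log2(x))   # mean of the logs (= log of the geometric mean)

# 95% t-interval computed directly on the log2 scale
y  <- log2(x)
n  <- length(y)
se <- sd(y) / sqrt(n)
ci <- mean(y) + c(-1, 1) * qt(0.975, df = n - 1) * se

print(c(log_of_mean = log_of_mean, mean_of_logs = mean_of_logs))
print(ci)
```

The mean of the logs can never exceed the log of the mean, because the geometric mean is at most the arithmetic mean. The interval built this way is symmetric on the log scale, so neither limit can run into log(0).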

Compare these two qq-plots, each plotted against normal quantiles. See which one has tails that best line up with the diagonal.

In R:

# x contains your data, before log transformation
y <- log(x)          # log-transformed data (natural log here; any base works for the qq-plot)

qqnorm(x); qqline(x) # qq-plot of original data
qqnorm(y); qqline(y) # qq-plot of log-transformed data

If the second graph lines up with the diagonal better, then taking the log is a natural choice. Even if the first one is slightly better, I would still work on the log scale.


Thanks for your answer. With regard to the biological replicates, I can easily do it this way, and there log-transformation indeed seems appropriate. However, my technical variation, i.e. the variation in measured intensities between sequential runs, does not seem to behave this way. (This is hard to judge since I only made three technical replicates. Still, it does not seem logical for measurement error to be skewed, so I do not think this is a reasonable assumption, and it leads to strongly deflated estimates of the coefficient of variation of my measurement.)

6 months ago
marta.avamo ▴ 10

It's been a while since this post, but I hope my answer might still help others with the same problem. Check this paper:

Canchola, J. A. Correct Use of Percent Coefficient of Variation (%CV) Formula for Log-Transformed Data. MOJ Proteom. Bioinform. 6, (2017).
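For readers who do not have the paper at hand: as far as I understand it, the point is that for data assumed to be log-normal, the %CV on the original scale should be derived from the SD of the log-transformed values, rather than computed as SD/mean of the logs. Below is a minimal R sketch of the standard log-normal relationship for natural-log data. The function name `cv_from_log_sd` and the example values are made up here and are not taken from the paper.

```r
# %CV on the original scale from the SD of natural-log-transformed values,
# assuming the data are log-normally distributed
cv_from_log_sd <- function(sd_ln) {
  100 * sqrt(exp(sd_ln^2) - 1)
}

x <- c(9.8e5, 1.05e6, 1.12e6)          # hypothetical technical replicates
cv_log <- cv_from_log_sd(sd(log(x)))   # %CV via the log scale
cv_raw <- 100 * sd(x) / mean(x)        # naive %CV on the raw scale, for comparison

print(c(cv_log = cv_log, cv_raw = cv_raw))
```

If the values were transformed with log2 or log10 instead, the SD has to be converted to the natural-log scale first by multiplying it by log(2) or log(10), respectively. For small variation the two %CV values are nearly identical; they diverge as the spread grows.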

