Question

Log-transformation, coefficient of variation, standard error and mass-spectrometry based proteomics

4.5 years ago

jobbe.goossens ▴ 20

Hello potential heroes,

I am currently assessing data from a mass-spectrometry experiment, and I am trying to figure out whether my measurements are good enough for the purpose and how many runs of each sample I would need to get a reasonable estimate. To do so I wanted to calculate the coefficient of variation, the standard error, or something of the sort, but this is where the following conceptual problem arose.

My measurements are originally ion intensities, and I suspect the error around these measurements to be normally distributed. This would lead me to think that the standard error and coefficient of variation are good measures for my purpose. However, my interest is in the log-fold change between my samples, so I would want to log-transform the obtained sample mean intensity. Thus, I am interested in how my standard error carries over to the estimate of the log-transformed mean ion intensity.

Is it valid to take the standard error of the untransformed data, use it to calculate a confidence interval on that data, and then transform these limits to the log scale to get an estimate of the precision of my log-transformed variable? While it might seem reasonable, I am afraid to do so, since small deviations in the lower limit of the interval would have far more drastic consequences for the log-transformed estimate than deviations in the upper limit (the logarithm diverges to negative infinity as the intensity approaches zero). Therefore, it seems to me that the "certainty" on the lower limit is rather low and that my log-transform of the mean and its confidence interval is rather fishy. Does anyone have any suggestions on how to tackle this issue? Infinite gratitude in advance!

Log-transformation mass-spectrometry • 2.7k views

updated 8 months ago by Ram 44k • written 4.5 years ago by jobbe.goossens ▴ 20

The problem you are facing is that the log of a mean is not the same as the mean of the logs. The same goes for the SD. But you know, it's possible that the log scale is the natural scale for your experiment, especially since you mention intensities. In that case, you could work directly with the log-transformed values and construct your intervals from those.

Compare these two qq-plots, each plotted against normal quantiles. See which one has tails that best line up with the diagonal.

In R:

x contains your data, before log transformation

y <- log(x)
par(mfrow = c(1, 2))  # show the two plots side by side
qqnorm(x); qqline(x)  # qq-plot of original data
qqnorm(y); qqline(y)  # qq-plot of log-transformed data

If the second graph lines up with the diagonal better, then taking the log is a natural choice. Even if the first is slightly better, I would still work on the log scale.
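To make the first point concrete, here is a minimal sketch of the difference between the log of the mean and the mean of the logs, and of a confidence interval built directly on the log scale. The data are simulated (the `rlnorm` parameters and the six replicates are assumptions for illustration, not values from the question), and the interval is an ordinary t-interval, which presumes the logged values are roughly normal.

```r
# Hypothetical intensities, simulated for illustration only
set.seed(1)
x <- rlnorm(6, meanlog = 10, sdlog = 0.3)

log_of_mean  <- log(mean(x))   # log of the arithmetic mean
mean_of_logs <- mean(log(x))   # log of the geometric mean

# By Jensen's inequality, mean_of_logs can never exceed log_of_mean
print(c(log_of_mean = log_of_mean, mean_of_logs = mean_of_logs))

# 95% t-interval constructed directly on the log scale
y      <- log(x)
n      <- length(y)
se     <- sd(y) / sqrt(n)
ci_log <- mean(y) + c(-1, 1) * qt(0.975, df = n - 1) * se
print(ci_log)

# Back-transforming gives an interval for the geometric mean intensity,
# which is asymmetric around exp(mean(y)) on the original scale
print(exp(ci_log))
```

Because the interval is symmetric on the log scale, the lower limit cannot run off towards zero the way a back-transformed raw-scale interval can.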

updated 8 months ago by Ram 44k • written 4.5 years ago by Lemire ▴ 940

Thanks for your answer. For the biological replicates I can easily do it this way, and there log-transformation indeed seems appropriate. However, my technical variation, i.e. the variation in measured intensities between sequential runs, does not seem to behave this way. This is hard to judge since I only made three technical replicates, but it does not seem logical for measurement error to be skewed, so log-normality does not seem a reasonable assumption there, and it leads to strongly deflated estimates of the coefficient of variation of my measurements.

4.3 years ago by jobbe.goossens ▴ 20

Ram · Answer 1 · 2023-12-12

8 months ago

marta.avamo ▴ 10

It's been a while since this post, but I hope my answer might still help others with the same problem. Check this paper:

Canchola, J. A. Correct Use of Percent Coefficient of Variation (%CV) Formula for Log-Transformed Data. MOJ Proteom. Bioinform. 6 (2017).
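For readers who want something to try right away, here is a small sketch of the standard log-normal identity that relates the SD of log-transformed values to the CV on the original scale, CV = sqrt(exp(s^2) - 1), with s in natural-log units. Whether this matches the paper's notation exactly is an assumption on my part; the helper name `cv_from_log_sd` and the simulated data are made up for illustration.

```r
# CV on the original scale implied by the SD of log-transformed values,
# assuming log-normal data. `base` is the base of the logarithm that was used.
# (Hypothetical helper, not taken from any package.)
cv_from_log_sd <- function(sd_log, base = exp(1)) {
  sd_ln <- sd_log * log(base)  # convert the SD to natural-log units
  sqrt(exp(sd_ln^2) - 1)
}

# Hypothetical intensities, simulated for illustration only
set.seed(1)
x <- rlnorm(1e5, meanlog = 10, sdlog = 0.3)

cv_direct <- sd(x) / mean(x)                        # CV computed on the raw scale
cv_ln     <- cv_from_log_sd(sd(log(x)))             # from natural-log data
cv_log2   <- cv_from_log_sd(sd(log2(x)), base = 2)  # from log2 data

# Report all three as %CV
print(round(100 * c(direct = cv_direct, from_ln = cv_ln, from_log2 = cv_log2), 1))
```

The point of the `base` argument is that the SD of log2 or log10 values must be rescaled to natural-log units first; plugging it into the formula unscaled gives the wrong %CV.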

updated 8 months ago by Ram 44k • written 8 months ago by marta.avamo ▴ 10