Question

Log-transformation of small values with good fold change and P values

24 months ago

RB_Immuno • 0

Hey all,

I have a set of data of serum protein expression from a 45-plex assay.

I started by log-transforming the data before calculating the p-value with a t-test and the log2 fold change. My problem comes from a few proteins that are low (<1) in the control group but >1 in most of the patient group. This makes the log-transformed values negative in the Ctrl group but positive in the Pt group, meaning my fold change is negative when it should actually be positive. To fix that, I changed test = t.test(log(x1),log(x2)) to test = t.test(log(x1+1),log(x2+1)). The absolute values of the fold change remained the same, which solved my issue. However, I noticed that most of my p-values were no longer the same, which meant I did not get exactly the same set of significantly different proteins for each group.

My question is: which p-values should I treat as the correct ones? The original ones from the log-transformed data, manually switching the few affected fold changes to their absolute values (and risking some human error there), or the second set from the log(x+1)-transformed data?

Thanks! R

out = data.frame()
# Loop over the protein columns (column 28 onward);
# gr is the grouping column and vals holds its two levels
for(i in 28:ncol(df1)){
    x1 = as.numeric(df1[which(df1[,gr]==vals[1]),i])
    x2 = as.numeric(df1[which(df1[,gr]==vals[2]),i])
    test = t.test(log(x1),log(x2))
    pval = test$p.value
    # log2 of the ratio of the two groups' mean log values
    est = log2(
        (test$estimate[1])/(test$estimate[2])
                )
    if(first) {
        out = rbind(out,c(colnames(df1)[i],pval,est))
    } else {
        out = rbind(out,c(pval,est))
    }
}
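
Editorial note: in the loop above, `test$estimate` holds the two group means of the log values, so dividing them gives a ratio of mean logs, and that ratio turns negative whenever one group averages below 1 and the other above 1. On the log scale, a fold change is usually taken as the difference of the group means instead. The following is a minimal sketch of that idea using made-up concentrations (the values and the names `ctrl`, `pt`, and `log2fc` are illustrative assumptions, not the poster's data):

```r
# Hypothetical concentrations, chosen so that controls sit below 1
# and patients above 1 (not taken from the thread's data)
ctrl <- c(0.4, 0.6, 0.5, 0.7, 0.3)
pt   <- c(2.1, 3.5, 1.8, 4.0, 2.6)

# Ratio of mean logs, analogous to estimate[1] / estimate[2] in the loop:
# negative here because the two group means have opposite signs
ratio_of_logs <- mean(log(pt)) / mean(log(ctrl))

# log2 of a negative number is NaN in R (with a warning)
bad_fc <- suppressWarnings(log2(ratio_of_logs))

# Log2 fold change as a difference of mean log2 values (patients vs controls)
log2fc <- mean(log2(pt)) - mean(log2(ctrl))

# Welch t-test on the log2 scale; estimate[1] - estimate[2] equals log2fc
test <- t.test(log2(pt), log2(ctrl))
pval <- test$p.value
```

With this definition the sign of the fold change follows the direction of the difference between groups, so no pseudocount is needed just to repair the sign.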

Log-transformation Fold-Change • 1.3k views

ADD COMMENT • link updated 24 months ago by ATpoint 88k • written 24 months ago by RB_Immuno • 0

What is the range of the data? I wonder if you can't just feed them into limma and be done with it.

ADD REPLY • link 24 months ago by ATpoint 88k

In my 45-plex assay, proteins are usually considered undetectable when under 0.1 (and are taken out of this analysis when undetectable in both groups). For the other 40-ish proteins, some range between 10-30, some 75-100, some 500-2000. It depends heavily on the protein abundance.

ADD REPLY • link 24 months ago by RB_Immuno • 0

I would put that on log2 scale and try with limma.
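
Editorial note: a rough sketch of what that suggestion could look like, assuming the limma package from Bioconductor is installed. The data here are simulated, and the names `conc`, `group`, and `res` are placeholders rather than anything from the original post; the real input would be a proteins-by-samples matrix of the assay values.

```r
# Runs only if limma is available; install it via Bioconductor otherwise
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(42)

  # Simulated proteins-by-samples matrix of positive concentrations
  # (hypothetical stand-in for the 45-plex data)
  n_prot <- 40
  n_per_group <- 6
  conc <- matrix(
    rlnorm(n_prot * 2 * n_per_group, meanlog = 3, sdlog = 1),
    nrow = n_prot,
    dimnames = list(paste0("protein", seq_len(n_prot)),
                    paste0("s", seq_len(2 * n_per_group)))
  )
  group <- factor(rep(c("Ctrl", "Pt"), each = n_per_group),
                  levels = c("Ctrl", "Pt"))

  # Put the data on the log2 scale, as suggested above
  log_expr <- log2(conc)

  # Linear model per protein with moderated t-statistics
  design <- model.matrix(~ group)
  fit <- limma::eBayes(limma::lmFit(log_expr, design))

  # logFC is Pt minus Ctrl on the log2 scale; adj.P.Val is BH-adjusted by default
  res <- limma::topTable(fit, coef = "groupPt", number = Inf)
  head(res)
}
```

Because the model is fitted on log2 values, the `logFC` column is already a difference of group means, and the moderated statistics borrow information across proteins, which can help with small group sizes.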

ADD REPLY • link 24 months ago by ATpoint 88k

score 0 · Answer 1 · 2023-07-05

24 months ago

Zhenyu Zhang ★ 1.3k

In practice, people commonly use log(x+1) for statistics. Both sets of p-values are valid, but with log(x) the differences among very low values carry little practical importance.

ADD COMMENT • link 24 months ago by Zhenyu Zhang ★ 1.3k

It really depends on the magnitude of the values of x, since the choice of prior count strongly influences the shrinkage.
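
Editorial note: a small illustration of that point, using a hypothetical helper `pseudo_lfc` and made-up numbers. The same two-fold difference is shrunk heavily toward zero when the values are small relative to the added constant, and barely at all when they are large.

```r
# Log2 fold change of b over a after adding a constant (prior count) to both
# (hypothetical helper for illustration only)
pseudo_lfc <- function(a, b, prior = 1) {
  log2(b + prior) - log2(a + prior)
}

# A true two-fold change (log2 fold change of 1) at two magnitudes
no_prior <- pseudo_lfc(0.5, 1, prior = 0)  # exactly 1, no shrinkage
low      <- pseudo_lfc(0.5, 1)             # roughly 0.42, strongly shrunk
high     <- pseudo_lfc(500, 1000)          # just under 1, almost unchanged

# A smaller prior count shrinks the low-abundance case less
low_small_prior <- pseudo_lfc(0.5, 1, prior = 0.1)
```

This is why adding 1 changes the results for the low-abundance proteins in particular while leaving the high-abundance ones nearly untouched.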

ADD REPLY • link 24 months ago by ATpoint 88k