Log-transformation of small values with good fold change and P values
24 months ago
RB_Immuno • 0

Hey all,

I have a set of data of serum protein expression from a 45-plex assay.

I started by log-transforming the data before calculating the p-value with a t-test and the log2 fold change. My problem comes from a few proteins that are low (<1) in the control group but >1 in most of the Patient group. The log-transformed values are therefore negative in the Ctrl group but positive in the Pt group, so my fold change comes out negative when it should actually be positive. To fix that, I changed test = t.test(log(x1),log(x2)) to test = t.test(log(x1+1),log(x2+1)). The absolute values of the fold change remained the same, which solved my issue. However, I noticed that most of my p-values were no longer the same, which meant I did not get exactly the same set of significantly different proteins for each group.

My question is: which p-values should I consider the good ones? The original ones from the log-transformed data, with the sign of the few affected fold changes corrected manually (which risks introducing human error), or the second set from the log(x+1) transformation?

Thanks! R

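# df1, gr (the grouping column) and vals (the two group labels) are defined earlier.
out = data.frame()
for (i in 28:ncol(df1)) {
    # Values of protein column i in each of the two groups
    x1 = as.numeric(df1[which(df1[, gr] == vals[1]), i])
    x2 = as.numeric(df1[which(df1[, gr] == vals[2]), i])
    test = t.test(log(x1), log(x2))
    pval = test$p.value
    # Note: test$estimate holds the group means of the log values,
    # so this is the log2 of a ratio of log-scale means.
    est = unname(log2(test$estimate[1] / test$estimate[2]))
    # One row per protein: name, p-value, fold-change estimate
    out = rbind(out, data.frame(protein = colnames(df1)[i], pval = pval, est = est))
}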
Log-transformation Fold-Change • 1.3k views

What is the range of the data? I wonder if you can't just feed them into limma and be done with it.


In my 45-plex assay, proteins are usually considered undetectable when under 0.1 (and are taken out of this analysis when undetectable in both groups). For the other 40 or so proteins, some range between 10-30, some 75-100, some 500-2000. It depends heavily on the protein's abundance.


I would put that on log2 scale and try with limma.
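For readers who want to try this suggestion, here is a minimal sketch of what a log2-scale limma analysis could look like. It runs on simulated data, not the poster's; the names expr, group and design, and the samples-in-columns layout, are assumptions made for illustration. The limma step only runs if the package is installed.

```r
# Sketch only: simulated data and hypothetical names (expr, group, design).
set.seed(1)
n_prot <- 40
n_per_group <- 10
group <- factor(rep(c("Ctrl", "Pt"), each = n_per_group), levels = c("Ctrl", "Pt"))

# Rows = proteins, columns = samples, values on the original concentration scale
expr <- matrix(rlnorm(n_prot * 2 * n_per_group, meanlog = 3, sdlog = 1),
               nrow = n_prot,
               dimnames = list(paste0("protein", seq_len(n_prot)),
                               paste0("s", seq_len(2 * n_per_group))))

log_expr <- log2(expr)            # limma expects log-scale values
design <- model.matrix(~ group)   # coefficient "groupPt" is Pt minus Ctrl

if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::eBayes(limma::lmFit(log_expr, design))
  res <- limma::topTable(fit, coef = "groupPt", number = Inf)
  print(head(res[, c("logFC", "P.Value", "adj.P.Val")]))
}
```

With this design, logFC is the difference of mean log2 values (Pt minus Ctrl), so its sign follows the direction of change whether or not the raw values are below 1.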

24 months ago
Zhenyu Zhang ★ 1.3k

In practice, people use log(x+1) for statistics. Both sets of p-values are valid, but log(x) has little practical meaning at low values.
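To make the comparison concrete, here is a small sketch on simulated data (not the poster's). It assumes the log2 fold change is taken as the difference of the group means on the log2 scale, which is one common convention and not necessarily what the original script computes. The function compare_groups and its pseudocount argument are hypothetical names.

```r
# Sketch only: simulated values, hypothetical helper compare_groups().
set.seed(42)
ctrl <- rlnorm(12, meanlog = log(0.5), sdlog = 0.4)  # mostly below 1
pt   <- rlnorm(12, meanlog = log(3),   sdlog = 0.4)  # mostly above 1

compare_groups <- function(x_ctrl, x_pt, pseudocount = 0) {
  l_ctrl <- log2(x_ctrl + pseudocount)
  l_pt   <- log2(x_pt + pseudocount)
  test <- t.test(l_pt, l_ctrl)
  # Fold change as a difference of log2 means: positive when Pt > Ctrl,
  # even if the Ctrl log values are negative.
  c(log2FC = mean(l_pt) - mean(l_ctrl), p.value = test$p.value)
}

res_raw  <- compare_groups(ctrl, pt)                   # log2(x)
res_plus <- compare_groups(ctrl, pt, pseudocount = 1)  # log2(x + 1)
print(rbind(raw = res_raw, plus1 = res_plus))
```

Under this convention the sign is correct with either transformation, so no manual correction is needed. The +1 changes both the size of the fold change and the p-value, because it compresses low values much more than high ones.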


It really depends on the magnitude of x, since the choice of prior count strongly influences the shrinkage.
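A quick numeric illustration of this point, using made-up values chosen so that every pair is exactly 4-fold apart. The helper shrunk_log2fc and the prior argument are hypothetical names, and the abundance levels only loosely mirror the ranges mentioned in the thread.

```r
# Sketch only: illustrative values, hypothetical helper shrunk_log2fc().
shrunk_log2fc <- function(ctrl, pt, prior = 1) {
  log2((pt + prior) / (ctrl + prior))
}

# Three abundance levels, each with a true 4-fold difference (log2FC = 2)
tab <- data.frame(ctrl = c(0.2, 20, 1000), pt = c(0.8, 80, 4000))
tab$log2FC_raw    <- log2(tab$pt / tab$ctrl)
tab$log2FC_prior1 <- shrunk_log2fc(tab$ctrl, tab$pt, prior = 1)
print(tab)
```

A prior of 1 pulls the low-abundance fold change strongly toward zero and leaves the high-abundance one almost unchanged. That is why the same pseudocount can matter a lot for one protein and hardly at all for another.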
