Hey all,
I have a set of data of serum protein expression from a 45-plex assay.
I started by log-transforming the data before calculating the p-value with a t-test and the log2 fold change.
My problem comes from a few proteins that are low (<1) in the control group, but >1 in most of the patient group. This leads to the log-transformed values being negative in the Ctrl group but positive in the Pt group, so my fold change comes out negative when it should actually be positive.
To fix that, I changed test = t.test(log(x1),log(x2)) to test = t.test(log(x1+1),log(x2+1)). The absolute values of the fold change remained the same, which solved my issue. However, I noticed most of my p-values were no longer the same, which meant I did not have exactly the same set of significantly different proteins for each group.
My question is: which p-values should I treat as the correct ones? The original ones from the log-transformed data, manually replacing the few affected fold changes with their absolute values (which adds some potential for human error), or the second set from the log(x+1) transformation?
Thanks! R
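# Assumes df1, gr (grouping column), vals (the two group labels) and first are defined earlier.
out = data.frame()
for (i in 28:ncol(df1)) {
  x1 = as.numeric(df1[which(df1[, gr] == vals[1]), i])
  x2 = as.numeric(df1[which(df1[, gr] == vals[2]), i])
  test = t.test(log(x1), log(x2))
  pval = test$p.value
  # test$estimate holds the two group means of the log values;
  # est is log2 of the ratio of those two means
  est = log2(test$estimate[1] / test$estimate[2])
  if (first) {
    out = rbind(out, c(colnames(df1)[i], pval, est))
  } else {
    out = rbind(out, c(pval, est))
  }
}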
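For comparison, here is a minimal sketch of the usual way to get a log2 fold change: take the difference of the group means on the log2 scale instead of the log of a ratio of log means. The data are simulated and every name (ctrl, pt, log2fc, and so on) is made up for illustration. It also shows why adding 1 before the log shifts the p-values: the pseudocount changes the values non-linearly, while changing the log base does not affect the t-test at all.

```r
# Hypothetical data: one protein, mostly < 1 in controls and > 1 in patients.
set.seed(1)
ctrl <- rlnorm(20, meanlog = log(0.5), sdlog = 0.3)
pt   <- rlnorm(20, meanlog = log(3),   sdlog = 0.3)

# Work on the log2 scale throughout.
l_ctrl <- log2(ctrl)
l_pt   <- log2(pt)

# Log2 fold change = difference of group means on the log2 scale (Pt - Ctrl).
# Its sign is meaningful even when the control values are below 1.
log2fc <- mean(l_pt) - mean(l_ctrl)

# Welch t-test on the same log2 values.
pval <- t.test(l_pt, l_ctrl)$p.value

# log2 of a ratio of mean log values is not defined when the two means
# have opposite signs (the ratio is negative), so this gives NaN.
ratio_of_logs <- suppressWarnings(log2(mean(log(ctrl)) / mean(log(pt))))

# A pseudocount of 1 compresses small values much more than large ones,
# so the test statistic and p-value change.
pval_plus1 <- t.test(log2(pt + 1), log2(ctrl + 1))$p.value

print(c(log2fc = log2fc, pval = pval, pval_plus1 = pval_plus1))
```

With the difference of means, no manual sign correction and no +1 is needed for proteins whose values are above 0.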
What is the range of the data? I wonder if you can't just feed them into limma and be done with it.
In my 45-plex assay, proteins are usually considered undetectable when below 0.1 (and are taken out of this analysis when undetectable in both groups). For the other 40 or so proteins, some range from 10 to 30, some from 75 to 100, and some from 500 to 2000. It depends heavily on the protein's abundance.
I would put that on the log2 scale and try limma.
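A rough sketch of what that could look like, assuming the limma package from Bioconductor is installed. The matrix conc, the group labels and the protein names are simulated placeholders, not the real assay data. limma expects proteins in rows and samples in columns, and the logFC column it reports is the Pt minus Ctrl difference on the log2 scale.

```r
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  n_prot <- 40
  n_per_group <- 10

  # Hypothetical concentration matrix: proteins in rows, samples in columns.
  conc <- matrix(
    rlnorm(n_prot * 2 * n_per_group, meanlog = 3, sdlog = 0.5),
    nrow = n_prot,
    dimnames = list(paste0("prot", seq_len(n_prot)),
                    paste0("s", seq_len(2 * n_per_group)))
  )
  group <- factor(rep(c("Ctrl", "Pt"), each = n_per_group),
                  levels = c("Ctrl", "Pt"))

  # Make the first protein about 4-fold higher in patients.
  conc[1, group == "Pt"] <- conc[1, group == "Pt"] * 4

  # Log2-transform, then fit one linear model per protein.
  expr <- log2(conc)
  design <- model.matrix(~ group)  # column "groupPt" is Pt minus Ctrl
  fit <- limma::eBayes(limma::lmFit(expr, design))

  # One row per protein: logFC, moderated t, P.Value, adj.P.Val, ...
  res <- limma::topTable(fit, coef = "groupPt", number = Inf)
  print(head(res))
}
```

The adj.P.Val column gives Benjamini-Hochberg adjusted p-values by default, which helps when testing 40 or so proteins at once.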