A check over my code using Limma DE, starting from normalized data
0
0
Entering edit mode
11 weeks ago
JACKY ▴ 10

I'm conducting a DE analysis over normalized TPM, RPKM, or FPKM data. Raw counts are not available. The data is about the clinical outcome of a certain drug. I know this is not the ideal way to do this kind of analysis, but I have no choice. This complicates things for me, I'm not getting results and I just want to make sure my code is right. Here is my code, from one of the datasets, that has TPM data:

TPM = TPM[rowSums(TPM[])>0,]
thresh = TPM > 10
keep = rowSums(thresh) >= 12
table(keep)
TPM = TPM[keep,]

# Design and Contrast
design = model.matrix(~ 0 + outcome, metadata)
colnames(design)[c(1,2)] = c('NoResponse','Response')
contrast = makeContrasts(NoResp_vs_Resp = NoResponse - Response, levels = colnames(design))

log_TPM = scale(log2(TPM + 0.1))
vfit = lmFit(log_TPM, design)
vfit = contrasts.fit(vfit, contrasts = contrast)
efit = eBayes(vfit, robust = TRUE, trend = TRUE)
plotSA(efit, main = 'final model with TPM ~ Mean-Variance trend')

summary(decideTests(efit))
DEG = topTable(efit, coef = 'NoResp_vs_Resp', p.value = 0.05, adjust.method = 'fdr', number = Inf)


I'm trying to get figures that are as close to the Bioconductors guide for limma. Can anybody please just confirm wheather my code is alright, or is there any other way I could do this? regarding filtering and stuff like that, cause I cant use edgeR::filterByExpr. Thanks!

limma r • 407 views
0
Entering edit mode

can you run normality test on log_TPM object?

0
Entering edit mode

I don't think any data of this type (RNA-seq) is normally distributed.. or is it ?

0
Entering edit mode

RNAseq data is mostly modeled after NB distribution, for biological replicates. Since data is normalized (TPM) and subsequently log transformed, it is no more original data. As TPM data is log transformed, my understanding is that values in log_TPM object are going to be normally distributed. However, I was not sure of your data. Hence, I asked you to test for normality of the data. However, as several times mentioned on biostars, raw counts are better in analysing the RNA seq data.

0
Entering edit mode

The data is not normally distributed. I tried ggqqplot(log_TPM\$sample), on many samples that I chose randomley, and got something like this:

You think you could help me or is it a gone cause ?

0
Entering edit mode

Isn't this duplicated operation?

TPM = TPM[rowSums(TPM[])>0,]
thresh = TPM > 10
keep = rowSums(thresh) >= 12
table(keep)
TPM = TPM[keep,]

0
Entering edit mode

Basically you're right. The first line is supposed to delete all zero rows. And the next lines are supposed to delete genes based on the threshold. I do both no reason at all behind it.