Why different DEGs between my result and the author's result?
3
1
Entering edit mode
3.1 years ago
fernardo ▴ 150

Hi all,

I did differential expression analysis on different datasets and noticed some unexpected outcomes. E.g., in "GSE32280", the authors report tens to hundreds of DEGs between MDD and healthy controls (they used an SVM, not an R package, to classify DEGs), while my analysis with the Limma package returned 0 DEGs.

Can someone share their experience with such outcomes? And which one should we rely on more, SVM or Limma? FYI, the smallest adjusted p-value from Limma was around 0.4.

Reference (which was not added in GEO database by them yet): PMC3278427

Secondly, in a case where the authors of a study used commercial software and found 100 DEGs, while I find 1000 DEGs with Limma, should I trust my instinct and my own result more? :)

Thank you all very much.

library(affy)   # provides ReadAffy() and rma()
library(limma)

# Read CEL files listed in the targets data frame, then normalize with RMA
data <- ReadAffy(filenames=targets$FileName)
eset <- rma(data)

# Design matrix without intercept: one column per group
design <- model.matrix(~ -1 + factor(targets$Targets))
colnames(design) <- c("Disease","Control")

# Contrast of interest: Disease vs Control
contrast.matrix <- makeContrasts(Disease-Control, levels=design)
fit <- lmFit(eset, design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

# Top 1000 genes with BH-adjusted p < 0.05
results <- topTable(fit2, coef=1, adjust="BH", sort.by="B", lfc=0, p.value=0.05, number=1000)

microarray limma differential gene expression
1
Entering edit mode

Could you give the reference of the paper you are referring to? Also, SVM is a classification method, not a DEG test. On a general note, for DE studies the desired outcome is usually a set of DEGs, so authors may tend to report the method that gives them DEGs; such studies are hard to write up if there are few or no DEGs.

0
Entering edit mode

Yi, Z., Li, Z., Yu, S., Yuan, C., Hong, W., Wang, Z., … Fang, Y. (2012). Blood-Based Gene Expression Profiles Models for Classification of Subsyndromal Symptomatic Depression and Major Depressive Disorder. PLoS ONE, 7(2), e31283. http://doi.org/10.1371/journal.pone.0031283

In the paper, they refer to it as a "differentially expressed signature".

1
Entering edit mode
3.1 years ago
h.mon 32k

Different methods, different outcomes, no surprise there. Have you never read any method comparison paper?

and which one should we more depend on? SVM or Limma?

The one you validate with further experiments or independent data.

0
Entering edit mode

Different methods give different outcomes, but how different? Hundreds of DEGs vs. none? 1000 vs. 3000? Something is not right here.

1
Entering edit mode

We don't know how exactly you performed the analysis (no code shown), so we can't be sure if it is correct or not.

Likewise, we don't know exactly how the authors you mentioned performed their analyses, so we can't be sure whether theirs are correct either. Supposedly, published manuscripts should have a higher degree of quality because they have been peer-reviewed. (Though we know that is not necessarily true. In fact, statistically significant results tend to be published more often than non-significant ones.)

By the description of the experiment you mentioned (microarrays of leukocytes comparing healthy controls, subsyndromal symptomatic depression, and major depressive disorder individuals), I would expect a lot of noise and a weak signal, so the results of different methods may well differ markedly from one another.

In fact, when the authors tried to validate some of their findings, they couldn't for most of the genes.

0
Entering edit mode

Thanks. By the way, I just added the code I used.

1
Entering edit mode
3.1 years ago

As mentioned above, you usually get different results with different software, especially if they use different models/paradigms. However, I believe that a really strong biological effect backed up by good experimental data should be detected by most software. For example, in the case of DE analysis, if the fold change is large and the expression levels are high, I think any software will flag the gene as differentially expressed. The differences between software will instead pop up for genes with low expression values and low fold changes.

0
Entering edit mode

Exactly, that is what I also mean.

1
Entering edit mode
3.1 years ago

If you have access to the table of DEGs from the authors, have a look at their most differentially expressed genes (ranked by, say, adjusted p-value). Then see what your limma output reports for those genes. For example, do they rank near the top, and what are their p-values?

Also, check whether the expression values for the individual samples in the normalized expression matrix "make sense", i.e., do these genes look credible?

This is by no means a conclusive way to answer your question, but you should get an idea of whether something is clearly wrong in either your analysis or theirs.
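As a sketch of that comparison (assuming `fit2` is the fitted limma object from the question's code; `author_degs` is a hypothetical vector of the authors' top gene IDs, named here only for illustration):

# Hypothetical gene IDs taken from the authors' DEG table (illustrative only)
author_degs <- c("GENE1", "GENE2", "GENE3")

# Full limma ranking: number=Inf keeps all probes, no p-value cutoff
all_results <- topTable(fit2, coef=1, adjust="BH", sort.by="B", number=Inf)

# Where do the authors' genes rank in your analysis, and what are their p-values?
idx <- match(author_degs, rownames(all_results))
data.frame(gene      = author_degs,
           rank      = idx,
           P.Value   = all_results$P.Value[idx],
           adj.P.Val = all_results$adj.P.Val[idx])

If the authors' top genes land near the top of your ranking but miss your adjusted-p cutoff, the disagreement is about thresholds; if they are scattered throughout the list, the analyses genuinely disagree.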

0
Entering edit mode

Well, they didn't use Limma, but they did provide a table in their paper labeled "FC (adjusted p <0.0005)", which is clearly the fold change of the genes at that adjusted p-value. In my results, the genes had p-values <0.05 but adjusted p-values >0.4.