Question: Why do my DEGs differ from the author's results?
fernardo (Italy) wrote, 13 months ago:

Hi all,

I did differential expression analysis on several datasets and noticed some unexpected outcomes. E.g., for "GSE32280" the authors report tens to hundreds of DEGs between MDD patients and healthy controls (they used an SVM rather than an R package to identify them), whereas my limma analysis yields 0 DEGs.

Can someone share their experience with this kind of outcome? And which should we rely on more, the SVM or limma? FYI, the smallest adjusted p-value from limma was around 0.4.

Reference (which they have not yet linked in the GEO database): PMC3278427

Secondly, suppose a study's authors used commercial software and reported 100 DEGs, while I get 1000 DEGs with limma. Should I trust my instinct and my result more? :)

Thank you very much for all

Edit: Code added

library(limma)
library(affy)   # ReadAffy() and rma() come from affy, not limma

# Sample sheet with one row per CEL file and a Targets column
targets <- readTargets("phenotype.txt")
data <- ReadAffy(filenames = targets$FileName)
eset <- rma(data)   # background correction, normalization, summarization

design <- model.matrix(~ 0 + factor(targets$Targets))
# model.matrix orders columns by the (alphabetical) factor levels, so
# verify the order before renaming, or the contrast sign will flip
colnames(design) <- c("Disease", "Control")
contrast.matrix <- makeContrasts(Disease - Control, levels = design)

fit <- lmFit(eset, design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

# p.value = 0.05 filters on the BH-adjusted p-value, so only genes that
# survive multiple-testing correction are returned
results <- topTable(fit2, coef = 1, adjust = "BH", sort.by = "B",
                    lfc = 0, p.value = 0.05, number = 1000)

Could you give the reference of the paper you are referring to? Also, SVM is a classification method, not a DEG test. On a general note, the desired outcome of a DE study is normally DEGs, so authors may tend to report the method that gives them DEGs; such studies are hard to write up if there are few or no DEGs.

— Michael Dondrup

Sure, I've added it above. Its PubMed Central ID is PMC3278427:

Yi, Z., Li, Z., Yu, S., Yuan, C., Hong, W., Wang, Z., … Fang, Y. (2012). Blood-Based Gene Expression Profiles Models for Classification of Subsyndromal Symptomatic Depression and Major Depressive Disorder. PLoS ONE, 7(2), e31283. http://doi.org/10.1371/journal.pone.0031283

In the paper they call it a "differentially expressed signature".

— fernardo
h.mon (Brazil) wrote, 13 months ago:

Different methods, different outcomes; no surprise there. Have you never read a method-comparison paper?

and which one should we more depend on? SVM or Limma?

The one you validate with further experiments or independent data.


Different methods give different outcomes, but how different? Hundreds of DEGs vs. none? 1000 vs. 3000? Something is not right here.

— fernardo

We don't know exactly how you performed the analysis (no code shown), so we can't tell whether it is correct.

Likewise, we don't know exactly how the authors you mentioned performed their analyses, so we can't be sure those are correct either. Supposedly, published manuscripts should have a higher degree of quality because they have been peer-reviewed. (Though we know that is not necessarily true; in fact, statistically significant results tend to be published more often than non-significant ones.)

From the description of the experiment you mentioned (microarrays of leucocytes comparing controls, subsyndromal symptomatic depression, and major depressive disorder individuals), I would expect a lot of noise and a weak signal, so different methods may well give markedly different results.

In fact, when the authors tried to validate some of their findings, they couldn't for most of the genes.

— h.mon

Thanks. By the way, I just added the code I used.

— fernardo
grant.hovhannisyan wrote, 13 months ago:

As mentioned above, you usually get different results with different software, especially when the tools use different models or paradigms. However, I believe a really strong biological effect backed up by good experimental data should be detected by most software. For example, in a DE analysis, if the fold change is big and the expression levels are high, I think any tool will call the gene differentially expressed. The differences between tools will pop up for genes with low expression values and small fold changes.
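The point above can be illustrated with a toy base-R simulation (all numbers are made up and have nothing to do with GSE32280): a large shift at high expression is unambiguous under essentially any test, while a small shift in noisy data sits on the borderline where methods start to disagree.

```r
# Toy illustration with made-up numbers: 10 samples per group
set.seed(1)
strong_ctrl <- rnorm(10, mean = 8,   sd = 0.3)  # high expression, big shift
strong_case <- rnorm(10, mean = 10,  sd = 0.3)
weak_ctrl   <- rnorm(10, mean = 2,   sd = 1)    # low expression, small shift
weak_case   <- rnorm(10, mean = 2.4, sd = 1)

# Any reasonable test flags the strong effect; the weak one is borderline
t.test(strong_case, strong_ctrl)$p.value
t.test(weak_case,   weak_ctrl)$p.value
```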


Exactly, that is what I also mean.

— fernardo
dariober (WCIP | Glasgow | UK) wrote, 13 months ago:

If you have access to the authors' table of DEGs, look at their most differentially expressed genes (ranked by, say, adjusted p-value), then see what your limma output reports for those genes. For example, do they rank near the top, and what are their p-values?

Also, check whether the expression values for the individual samples in the normalized expression matrix "make sense", i.e. do these genes look credible?

This is by no means a conclusive way to answer your question, but you should get an idea of whether something is clearly wrong in your analysis or in theirs.
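That cross-check can be sketched in base R. Here `your_tt` stands in for the full limma table you would get from `topTable(fit2, number = Inf)`; the gene names, p-values, and the `author_degs` list below are invented purely for illustration.

```r
# Mock limma output; in practice: your_tt <- topTable(fit2, number = Inf)
your_tt <- data.frame(
  gene      = c("GZMB", "IL1B", "TNF", "ACTB", "GAPDH"),
  adj.P.Val = c(0.41, 0.55, 0.62, 0.90, 0.95),
  stringsAsFactors = FALSE
)

# Rank your genes by adjusted p-value (smallest first)
your_tt <- your_tt[order(your_tt$adj.P.Val), ]
your_tt$rank <- seq_len(nrow(your_tt))

# Hypothetical top DEGs reported by the authors
author_degs <- c("IL1B", "TNF")

# Where do the authors' genes fall in your ranking?
your_tt[your_tt$gene %in% author_degs, c("gene", "adj.P.Val", "rank")]
```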


Well, they didn't use limma, but they do provide a table in their paper headed "FC (adjusted p <0.0005)", which is clearly the fold change of the genes at that adjusted p-value. Mine had raw p-values <0.05 but adjusted p-values >0.4.
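That pattern, raw p < 0.05 but adjusted p > 0.4, is exactly what Benjamini-Hochberg produces when small p-values occur no more often than chance predicts across thousands of probes. A base-R illustration with constructed (not real) p-values:

```r
# 20,000 constructed p-values spread evenly from 0.0001 to 1, i.e. a
# uniform distribution with no enrichment of small values
raw_p <- seq(0.0001, 1, length.out = 20000)
adj_p <- p.adjust(raw_p, method = "BH")

min(raw_p) < 0.05   # TRUE: nominally significant raw p-values exist
sum(adj_p < 0.05)   # 0: nothing survives the adjustment
```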

— fernardo
Powered by Biostar version 2.3.0