Question: Why is DESeq a better method for finding highly upregulated and downregulated genes?
2
simonlab1 wrote:

My question is simple. Why is DESeq analysis for RNA-Seq reads considered to be a more reliable method for identifying upregulated/downregulated genes?

rna-seq deseq chip-seq rpkm • 1.8k views
modified 3.2 years ago by Michael Love • written 3.2 years ago by simonlab1
16

because it has been shown that when you let an octopus decide which genes are significantly deregulated, it can not be reproduced as well as with DESeq [citation needed]

modified 3.2 years ago • written 3.2 years ago by Ido Tamir

Kudos for Octopus Joke :)

Glad that others gave answers to his question, I was also asking these kinda questions when I was new in bioinformatics

modified 3.2 years ago • written 3.2 years ago by Manvendra Singh
2

I think what Ido is trying to say is, your question is lacking a second item to compare to, 'better' with respect to what?

btw.: I like the octopus predictor: there was once an octopus with very good results in prediction https://en.wikipedia.org/wiki/Paul_the_Octopus

modified 3.2 years ago • written 3.2 years ago by Michael Dondrup

Fair point. Say you're comparing two different samples and trying to screen for highly upregulated and highly downregulated genes. Which would be the more reliable method, and why: comparing the RPKM of gene A in sample A with its RPKM in sample B (an RPKM ratio), or a DESeq analysis that picks the genes with the lowest padj values?

Thanks!

modified 3.2 years ago • written 3.2 years ago by simonlab1
3

If your question ends up becoming, "why should I use a complicated method like DESeq2 or edgeR rather than just doing a T-test on RPKMs?", then have a read through those papers and also the paper on limma. The rationale is described in them.

modified 3.2 years ago • written 3.2 years ago by Devon Ryan
2

Without replicates, no method is better. The padj values do not make sense without replicates in your data.

written 3.2 years ago by geek_y

I think that GFold works quite fine when replicates are not there

written 3.2 years ago by Manvendra Singh

# Sorry for Spam  #

Yes, I remember that Thomas Muller said somewhere that he wants to eat that Paul the Octopus :)

written 3.2 years ago by Manvendra Singh

I'm so glad I don't have to hear about "Orakel Krake Paul" every night on the news any more :)

written 3.2 years ago by Devon Ryan
6
alolex wrote:

I don't know if I would call it more reliable, but it does do additional calculations that other methods don't. For one, DESeq2 "shrinks" the fold changes of genes that have low read counts. I don't pretend to understand the math behind it, but in general it reduces the fold change of any gene that has low read counts in one or both conditions, because such genes can have exaggerated fold changes.

For example, imagine you have two conditions, each with 3 replicates. For gene A the control read counts are 1, 2 and 2 (average 1.67) and the experimental read counts are 4, 3 and 4 (average 3.67). For gene B the control values are 100, 200 and 200 and the experimental values are 400, 300 and 400. The calculated fold change for both genes is 2.2, and both may also be significant according to the adjusted p-value (I've checked, it happens). However, an average difference of 2 reads is not a lot, and I would not call gene A differentially expressed unless the change is reproduced in a lot of replicates. DESeq2 therefore shrinks its fold change accordingly.
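The arithmetic in this example can be checked with a minimal Python sketch. It uses the read counts quoted above. The pseudocount step is only a crude stand-in that shows why a low-count ratio is fragile; it is not DESeq2's shrinkage procedure, and the `PSEUDOCOUNT` value and the `fold_change` helper are hypothetical names chosen for illustration.

```python
from statistics import mean

# Read counts from the example above (3 replicates per condition).
counts = {
    "gene A": {"control": [1, 2, 2], "experiment": [4, 3, 4]},
    "gene B": {"control": [100, 200, 200], "experiment": [400, 300, 400]},
}


def fold_change(control, experiment, pseudocount=0.0):
    """Ratio of mean counts, optionally adding a pseudocount to both means."""
    return (mean(experiment) + pseudocount) / (mean(control) + pseudocount)


# Hypothetical value, chosen only for illustration (not a DESeq2 parameter).
PSEUDOCOUNT = 5.0

for gene, c in counts.items():
    raw = fold_change(c["control"], c["experiment"])
    damped = fold_change(c["control"], c["experiment"], PSEUDOCOUNT)
    diff = mean(c["experiment"]) - mean(c["control"])
    print(
        f"{gene}: raw fold change = {raw:.2f}, "
        f"mean difference = {diff:.1f} reads, "
        f"fold change with pseudocount = {damped:.2f}"
    )
```

Both genes have the same raw ratio, but adding the same small constant to both means pulls the gene A ratio much closer to 1 than the gene B ratio. That is the intuition behind treating low-count fold changes with more caution.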

I've compared DESeq2 to edgeR, and while I like both methods, edgeR does return many significant genes that have exaggerated fold changes due to low read counts (or zero read counts), whereas DESeq2 shrinks the fold change to where it is generally below my cutoff for differential expression. Generally, when I filter for differential expression I use both the padj value and the fold change value. Unless you have a lot of replicates, fold changes estimated from low read counts may not be completely accurate. Thus, I use DESeq2 specifically because it adjusts the fold changes of genes with low read counts.

 

modified 3.2 years ago • written 3.2 years ago by alolex

N.B., the fold-change shrinkage happens in DESeq2, rather than DESeq. I too often type DESeq when I mean DESeq2...force of habit :)

written 3.2 years ago by Devon Ryan

Thanks!  It is an important distinction--I corrected the typos above :)

written 3.2 years ago by alolex

Thanks! +1. Excellent insight.

written 3.0 years ago by SmallChess
5
Michael Love wrote:

It's not likely that any method is better for all experiments, and methods can be evaluated across many metrics (accuracy in estimating effect size, control of FDR, sensitivity, robustness, etc.). Just a few important ways in which even your standard, bulk RNA-seq experiment can differ:

  • number of biological replicates per group
  • number of groups
  • experimental design
  • batch effects
  • amount of within-group biological variability (big difference between a controlled experiment and a study)
  • scale of the effect sizes (big or small differences between groups)
  • proportion of genes/features which show differences between groups
  • presence of outliers
  • ...

We like to remind users that, with very many replicates and exchangeable samples, rank tests or permutation tests are great because you don't have to make distributional assumptions. It's just that investigators often don't want to spend money on extra experiments when e.g. 3 or 5 replicates per group will suffice in finding the large effects, and allow them to examine more conditions.
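As a rough illustration of the replicate point, here is a minimal sketch of an exact two-sided permutation test on the difference in group means, written in plain Python. The function name `permutation_p_value` is hypothetical, and the counts are the gene B values quoted in alolex's answer in this thread, used only as example input. This is a sketch of the general technique, not a recommended RNA-seq workflow.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))

    extreme = 0
    total = 0
    # Enumerate every way of relabelling n_a of the pooled samples as group A.
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


control = [100, 200, 200]
experiment = [400, 300, 400]
print(permutation_p_value(control, experiment))
```

With 3 replicates per group there are only 20 possible relabellings, so even this perfectly separated gene cannot get a two-sided p-value below 2/20 = 0.1. The test needs no distributional assumptions, but it only gains resolution as the number of replicates grows.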

With these differences in mind, I'd recommend looking for evaluations by 3rd parties.

modified 3.2 years ago • written 3.2 years ago by Michael Love