Question

DESeq2 - "all gene-wise dispersion estimates are within 2 orders of magnitude"

1

Entering edit mode

8.8 years ago

scchess ▴ 640

I've simulated RNA abundance with wgsim. The simulation itself was error free. There is a single factor in my experiment that looks like:

           A1     A2     A3     B1     B2     B3
R1_101    113    113    113     13     11      9
R1_102    247    246    246     12     12     14
R1_103  20835  20915  20788   9973   9955   9973

A1, A2, A3 are the simulated replicates for the first level. B1, B2 and B3 are the simulated replicated for the second level.

As expected, the reads counts for each level are very close because it was an error-free simulation. The purpose of the experiment is to compare it with cuffdiffs (another differential package) in detecting log-fold changes.

Unfortunately, I run into an error in DESq2:

Error in estimateDispersionsFit(object, fitType = fitType, quiet = quiet) : 
    all gene-wise dispersion estimates are within 2 orders of magnitude

It looks like the package is unable to estimate a dispersion factor (most likely it's too small). However, I had no problem with cuffdiffs. Is there anything that I can do to make it work?

deseq2 RNA-Seq R • 12k views

ADD COMMENT • link updated 16 months ago by Ram 43k • written 8.8 years ago by scchess ▴ 640

2

Entering edit mode

is there a specific reason why you simulated the replicates with so little variance? to me it doesn't seem realistic at all and that's why deseq2 is complaining. if you want to compare cuffdiff with deseq2, you should use data that is closer to real rna-seq. maybe you can use dispersion values from a sequencing experiment. this is for example done in compcodeR.

ADD REPLY • link 8.8 years ago by Martombo ★ 3.1k

0

Entering edit mode

No particular reason. The purpose of my experiment is to compare log-fold changes. For example, if a gene has a log-fold ratio of 10. How would cuffdiff and DESeq2 react to it?

ADD REPLY • link 8.8 years ago by scchess ▴ 640

1

Entering edit mode

in my experience (I also performed a few such simulations), nothing beats the fold change estimates of deseq2. this is because of the moderation applied to lowly expressed genes. if you want to try it yourself, I'd suggest you to take a look at compcodeR. it's a very handy package for DEA comparisons.

ADD REPLY • link 8.8 years ago by Martombo ★ 3.1k

0

Entering edit mode

I got this error when I forgot to normalize the size factors such that they are around 1.

ADD REPLY • link 3.4 years ago by j.gleixner ▴ 20

0

Entering edit mode

Thank you for your reply.

I run the lines below but the result got no DEG, why? My data is real with three replicates.

>rlog <- rlogTransformation(dds)
>mycount_rlog=assay(rlog)
>mycount_rlog[mycount_rlog < 0] <- 0
>dds_rlog <- DESeqDataSetFromMatrix(countData=round(mycount_rlog),
                              colData=colData, 
                              design= ~ condition_set)
>dds_r <- estimateSizeFactors(dds_rlog)
> dds_r$sizeFactor
S12 S13 S11 S23 S22 S21
  1   1   1   1   1   1
>dds_r <- estimateDispersionsGeneEst(dds_r)
>dispersions(dds_r) <- mcols(dds_r)$dispGeneEst
>dds_r <- nbinomWaldTest(dds_r)
>res_r = results(dds_r, contrast=c("condition_set", "S1", "S2"), alpha=0.05, pAdjustMethod="BH")
>summary(res_r)
out of 15731 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 0, 0%
LFC < 0 (down)     : 0, 0%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

ADD REPLY • link updated 16 months ago by Ram 43k • written 17 months ago by yinchangfa • 0

0

Entering edit mode

Please open a new question, and read the DESeq2 manual. At no point it instructs to use the rlog as input for DESeq2, use raw counts, rlog is wrong and violates assumptions, in part because it is already on log scale. The entire code you use is custom, be sure to follow the manual precisely.

ADD REPLY • link 17 months ago by ATpoint 82k

Ram · Accepted Answer · 2015-07-02

The DESeq2 method has empirical Bayes parts to it, which involve sharing information across genes to improve estimates (see the paper). In this case, during dispersion estimation, we look across the genes at the distribution of gene-wise dispersion estimates to improve the final estimates (posterior modes). If you simulate data which has no (over)dispersion, then these methods don't make sense.

Did you clip the warning message which was printed in the console for some reason? Because the rest of it tells you what to do:

# all gene-wise dispersion estimates are within 2 orders of magnitude
# from the minimum value, and so the standard curve fitting techniques will not work.
# One can instead use the gene-wise estimates as final estimates:
dds <- estimateDispersionsGeneEst(dds)
dispersions(dds) <- mcols(dds)$dispGeneEst