Question: DESeq2 - "all gene-wise dispersion estimates are within 2 orders of magnitude"
student-t (Australia) wrote, 2.8 years ago:

I've simulated RNA abundance with wgsim. The simulation itself was error free. There is a single factor in my experiment that looks like:

           A1     A2     A3     B1     B2     B3
R1_101    113    113    113     13     11      9
R1_102    247    246    246     12     12     14
R1_103  20835  20915  20788   9973   9955   9973

A1, A2 and A3 are the simulated replicates for the first level. B1, B2 and B3 are the simulated replicates for the second level.

As expected, the read counts within each level are very close because it was an error-free simulation. The purpose of the experiment is to compare DESeq2 with Cuffdiff (another differential expression package) in detecting log-fold changes.

Unfortunately, I ran into an error in DESeq2:

    Error in estimateDispersionsFit(object, fitType = fitType, quiet = quiet) : 
        all gene-wise dispersion estimates are within 2 orders of magnitude

It looks like the package is unable to estimate the dispersion (most likely because it's too small). However, I had no problem with Cuffdiff. Is there anything I can do to make it work?

 


Is there a specific reason why you simulated the replicates with so little variance? To me it doesn't seem realistic at all, and that's why DESeq2 is complaining. If you want to compare Cuffdiff with DESeq2, you should use data that is closer to real RNA-seq. Maybe you can use dispersion values estimated from a real sequencing experiment; this is done, for example, in compcodeR.
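A minimal sketch of what "closer to real RNA-seq" could look like, using only base R rather than compcodeR. The means are rough values read off the table in the question, and the dispersion of 0.1 is an arbitrary assumption for illustration; a real comparison would plug in gene-specific dispersions estimated from an actual experiment.

```r
# Sketch: simulate overdispersed (negative binomial) counts for 3 genes x 6 samples.
set.seed(1)                      # arbitrary seed, only for reproducibility
mu_A <- c(113, 247, 20835)       # rough condition-A means from the question's table
mu_B <- c(11, 13, 9970)          # rough condition-B means from the question's table
dispersion <- 0.1                # assumed value, not taken from any real data set

simulate_condition <- function(mu, n_rep, disp) {
  # rnbinom's 'size' is the inverse dispersion: var = mu + disp * mu^2
  sapply(seq_len(n_rep), function(i) rnbinom(length(mu), mu = mu, size = 1 / disp))
}

counts <- cbind(simulate_condition(mu_A, 3, dispersion),
                simulate_condition(mu_B, 3, dispersion))
dimnames(counts) <- list(c("R1_101", "R1_102", "R1_103"),
                         c("A1", "A2", "A3", "B1", "B2", "B3"))
counts
```

With only three genes this is just a toy; the empirical Bayes steps in DESeq2 are designed for thousands of genes.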

(comment by Martombo, 2.8 years ago)

No particular reason. The purpose of my experiment is to compare log-fold changes. For example, if a gene has a log-fold change of 10, how would Cuffdiff and DESeq2 react to it?

(reply by student-t, 2.8 years ago)

In my experience (I have also performed a few such simulations), nothing beats the fold change estimates of DESeq2. This is because of the moderation applied to lowly expressed genes. If you want to try it yourself, I'd suggest taking a look at compcodeR. It's a very handy package for comparing differential expression analysis methods.

(comment by Martombo, 2.8 years ago)

Answer by Michael Love (United States), 2.8 years ago:

The DESeq2 method has empirical Bayes parts to it, which involve sharing information across genes to improve estimates (see the paper). In this case, during dispersion estimation, we look across the genes at the distribution of gene-wise dispersion estimates to improve the final estimates (posterior modes). If you simulate data which has no (over)dispersion, then these methods don't make sense. 
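As a rough illustration of the "no (over)dispersion" point (not part of the original answer), the counts from the question can be checked in base R. The sketch below computes the per-condition mean and variance for each gene, plus a crude method-of-moments dispersion `(variance - mean) / mean^2`, which follows from the negative binomial relation `var = mu + alpha * mu^2`. It ignores size factors and is not how DESeq2 itself estimates dispersion.

```r
# Counts copied from the table in the question.
counts <- matrix(
  c(113,   113,   113,   13,   11,   9,
    247,   246,   246,   12,   12,   14,
    20835, 20915, 20788, 9973, 9955, 9973),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("R1_101", "R1_102", "R1_103"),
                  c("A1", "A2", "A3", "B1", "B2", "B3")))
condition <- factor(c("A", "A", "A", "B", "B", "B"))

# Per-gene mean, variance and moment-based dispersion within each condition.
stats <- do.call(rbind, lapply(levels(condition), function(lv) {
  sub <- counts[, condition == lv, drop = FALSE]
  m <- rowMeans(sub)
  v <- apply(sub, 1, var)
  data.frame(gene = rownames(sub), condition = lv,
             mean = m, variance = v,
             mom_dispersion = (v - m) / m^2,   # negative means less variable than Poisson
             row.names = NULL)
}))
stats
```

In every row the variance is below the mean, so the moment-based dispersion comes out negative: the simulated replicates vary even less than Poisson counts would, and there is no spread of gene-wise dispersions for a trend to be fitted through.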

Did you clip the error message that was printed in the console for some reason? The rest of it tells you what to do:

    all gene-wise dispersion estimates are within 2 orders of magnitude
    from the minimum value, and so the standard curve fitting techniques will not work.
    One can instead use the gene-wise estimates as final estimates:
    dds <- estimateDispersionsGeneEst(dds)
    dispersions(dds) <- mcols(dds)$dispGeneEst
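
For context, here is a hedged sketch of how those two lines might slot into a full run on the counts from the question. It assumes the usual `DESeq()` wrapper is replaced by its individual steps (size factors, dispersions, Wald test); the `coldata` layout and the `condition` column name are illustrative choices, not something given in the thread.

```r
library(DESeq2)

# Counts copied from the table in the question, stored as integers.
counts <- matrix(
  c(113L,   113L,   113L,   13L,   11L,   9L,
    247L,   246L,   246L,   12L,   12L,   14L,
    20835L, 20915L, 20788L, 9973L, 9955L, 9973L),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("R1_101", "R1_102", "R1_103"),
                  c("A1", "A2", "A3", "B1", "B2", "B3")))

# Hypothetical sample table: one factor with levels A and B.
coldata <- data.frame(condition = factor(c("A", "A", "A", "B", "B", "B")),
                      row.names = colnames(counts))

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ condition)
dds <- estimateSizeFactors(dds)

# Skip the trend fit and shrinkage: use the gene-wise estimates as final dispersions.
dds <- estimateDispersionsGeneEst(dds)
dispersions(dds) <- mcols(dds)$dispGeneEst

# Continue with testing as usual.
dds <- nbinomWaldTest(dds)
res <- results(dds)
res
```

Because no information is shared across genes in this variant, the dispersions are not moderated, which is acceptable for a sanity check on simulated data but loses one of the main benefits of the method on real data.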