DESeq data diagnosis problem (MAplot, dispersion)

8.4 years ago

Hydrangea ▴ 10

Hi, I used DESeq2 to analyze read count data with the likelihood ratio test (LRT). The GLM includes multiple covariates. However, my MA plot and dispersion plot do not look like the typical plots in the manual, and the histogram of raw p-values has a high peak at 1.

Does this indicate a problem with the model fit? Do I need to discard low-count transcripts first? Also, how "good" do the MA plot and dispersion plot need to look before I can claim the model is properly fitted?

Thanks,

https://s15.postimg.org/lojmvv863/maplot.png

https://s11.postimg.org/texjk1pkj/dispersionplot.png

https://s23.postimg.org/o7sr6nf6j/pvaluehist_Ink_LI.jpg

RNA-Seq • 2.7k views

ADD COMMENT • link updated 5.8 years ago by Kevin Blighe 89k • written 8.4 years ago by Hydrangea ▴ 10

Did you perform any filtering to remove genes with low counts, or any other kind of quality filtering? What do your MDS/PCA plots look like? Please describe your experimental design and analysis in more detail.

Anyway, Bioconductor support may be a better place for your question.

ADD REPLY • link 8.4 years ago by h.mon 35k

I only filtered with rowsum > 0, to speed things up, because DESeq2 applies stricter independent filtering anyway. I'm fitting a multivariate GLM (comparing the full vs. reduced model with the LRT), with transcript-level read counts as the outcome. The goal is to find transcripts that are significant in the LRT. I haven't done any clustering yet.
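
To make the filtering step concrete, here is a minimal sketch in plain Python (not DESeq2 code) of a row-sum pre-filter. The function name `filter_low_counts`, the toy counts, and the stricter threshold of 10 are all hypothetical and only illustrate how much a rowsum > 0 filter lets through compared with a stricter cutoff.

```python
def filter_low_counts(counts, min_total=1):
    """Keep transcripts whose summed count across samples is >= min_total.

    counts: dict mapping transcript name -> list of per-sample read counts.
    min_total=1 is equivalent to the rowsum > 0 filter for integer counts.
    """
    return {name: row for name, row in counts.items() if sum(row) >= min_total}


# Toy data (made up for illustration): three transcripts, four samples.
counts = {
    "tx1": [0, 0, 0, 0],      # never observed
    "tx2": [1, 0, 0, 0],      # a single read in one sample
    "tx3": [25, 30, 18, 22],  # well covered
}

kept_nonzero = filter_low_counts(counts, min_total=1)   # rowsum > 0
kept_strict = filter_low_counts(counts, min_total=10)   # hypothetical stricter cutoff
```

With rowsum > 0, a transcript with a single read in one sample is still tested, and very low-count features like that tend to give p-values close to 1.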

ADD REPLY • link 8.4 years ago by Hydrangea ▴ 10

Both of those plots look very odd. What are the scale factors?
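
Assuming "scale factors" here means the per-sample size factors, the following is a rough sketch in plain Python of the median-of-ratios idea that DESeq2 uses to estimate them. It is an illustration of the technique, not DESeq2's implementation; the function name `size_factors` and the toy counts are made up.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per gene/transcript), each a list of
    per-sample counts. Rows containing a zero are skipped, because
    their geometric mean across samples is zero.
    """
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no row has non-zero counts in every sample")

    # Log of the geometric mean of each usable row across samples.
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]

    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the row's geometric mean.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data (made up): the second sample is sequenced twice as deeply.
counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
sf = size_factors(counts)
```

Size factors far from 1, or differing between samples by orders of magnitude, would be one reason for the plots to look unusual.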

ADD REPLY • link 8.4 years ago by Devon Ryan 105k

The p-value distribution is the sort of thing one normally sees if there's an uncorrected batch effect. I suspect that a PCA plot will be informative.

ADD REPLY • link 8.4 years ago by Devon Ryan 105k
