7.1 years ago
Hydrangea
Hi, I used DESeq2 to analyze read count data with the likelihood ratio test (LRT). I have multiple covariates in the GLM. However, my MA plot and dispersion plot do not look like the typical plots in the manual. Also, the histogram of raw p-values has a high peak at 1.
Does this indicate a problem with the model fit? Do I need to discard low-count transcripts first? Also, how "good" do the MA plot and dispersion plot need to look before I can claim the model is properly fitted?
Thanks,
https://s15.postimg.org/lojmvv863/maplot.png
Did you perform any filtering to remove genes with low counts, or any other kind of quality filtering? What do your MDS/PCA plots look like? Please describe your experimental design and analysis in more detail.
Anyway, Bioconductor support may be a better place for your question.
I only filtered with rowsum > 0, just to speed things up, because DESeq2 applies its own, stricter independent filtering. I'm fitting a GLM with multiple covariates (comparing the full vs. reduced model with the LRT), using transcript-level read counts as the outcome. The goal is to find transcripts that are significant in the LRT. I haven't done any clustering yet.
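For comparison, here is a rough sketch of a stricter pre-filter than rowsum > 0: keep a transcript only if it reaches a minimum count in a minimum number of samples. It is written in plain Python for illustration and is not DESeq2 code. The function name `filter_low_counts`, the thresholds (10 reads, 3 samples), and the toy counts are all assumptions made up for this example, not values taken from the thread.

```python
def filter_low_counts(counts, min_count=10, min_samples=3):
    """Keep transcripts with >= min_count reads in >= min_samples samples.

    counts: dict mapping transcript name -> list of per-sample raw counts.
    The default thresholds are illustrative assumptions, not recommendations.
    """
    kept = {}
    for name, row in counts.items():
        n_ok = sum(1 for c in row if c >= min_count)
        if n_ok >= min_samples:
            kept[name] = row
    return kept


# Toy count matrix (hypothetical data): 4 transcripts x 6 samples.
counts = {
    "tx1": [0, 0, 1, 0, 0, 0],            # passes rowsum > 0, but is mostly zeros
    "tx2": [15, 22, 9, 30, 18, 25],
    "tx3": [0, 0, 0, 0, 0, 0],
    "tx4": [120, 95, 143, 110, 101, 99],
}

rowsum_kept = sorted(name for name, row in counts.items() if sum(row) > 0)
filtered = filter_low_counts(counts)

print(rowsum_kept)        # transcripts surviving rowsum > 0
print(sorted(filtered))   # transcripts surviving the stricter filter
```

In this toy case rowsum > 0 still keeps `tx1`, which has a single read across all samples, while the stricter filter drops it. Whether that kind of transcript explains the odd plots here is something only the real data can show.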
Both of those plots look very odd. What are the scale factors?
The p-value distribution is the sort of thing one normally sees if there's an uncorrected batch effect. I suspect that a PCA plot will be informative.
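To put a number on "a high peak at 1", one option is to bin the raw p-values and compare the last bin with what a flat (uniform) histogram would give. The sketch below is plain Python, not part of DESeq2. The helper names `pvalue_histogram` and `last_bin_excess` and the example p-values are hypothetical. It only measures the peak; it does not say what causes it.

```python
def pvalue_histogram(pvalues, n_bins=20):
    """Count p-values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        # p == 1.0 would index past the end, so clamp it into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


def last_bin_excess(pvalues, n_bins=20):
    """Ratio of the observed count in the last bin to the uniform expectation."""
    counts = pvalue_histogram(pvalues, n_bins)
    expected = len(pvalues) / n_bins
    return counts[-1] / expected


# Hypothetical raw p-values with several values at or near 1.
pvals = [0.001, 0.02, 0.35, 0.55, 0.97, 0.99, 1.0, 1.0, 1.0, 1.0]

hist = pvalue_histogram(pvals, n_bins=10)
excess = last_bin_excess(pvals, n_bins=10)
print(hist)
print(excess)
```

A ratio well above 1 means the last bin is over-represented compared with a flat histogram. It would be worth reading that alongside the PCA plot rather than on its own.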