Question: RNASeq: Differential gene expression and qqplot
3
gravatar for rmf
19 months ago by
rmf730
rmf730 wrote:

I ran DGE on my dataset using DESeq2 and Limma. I used the same dataset, similar settings and same GLM model (~family+condition). The number of DEGs I get are as below (based on unadjusted p-values <0.05 and BH adjusted q-values <0.05 shown separately):

Tool   num_degs_pval  num_degs_qval
DESeq2 3155           623
Limma  2853           32

I plotted qqplots using unadjusted pvalues from both DESeq2 and Limma results.

library(snpStats)
qq.chisq(-2*log(deseq2_results$pvalue),df=2,pvals=T,overdisp=T)
qq.chisq(-2*log(limma_results$pvalue),df=2,pvals=T,overdisp=T)

This is what I get:

enter image description here

Now questions;

  • Is the qqplot useful in RNA-Seq to estimate DGE quality?
  • In this example, does the better fit of Limma mean that the fewer set of DEGs returned by Limma is more reliable?
  • Similarly, does the inflated curve for DESeq2 mean that the huge number of DEGs returned by DESeq2 contains more false positives?

Couple of caveats: The function I used qq.chisq is for chisq tests and I am not sure if that is an issue since my Deseq2 script uses Wald test and Limma uses bayesian eBayes test. The df argument in qq.chisq is set to 2 arbitrarily. I am not sure what value I should plug in here.

ADD COMMENTlink modified 19 months ago • written 19 months ago by rmf730
0
gravatar for Kevin Blighe
19 months ago by
Kevin Blighe49k
Kevin Blighe49k wrote:

Interesting findings but one would need to know your sample size before making any further comments. Wait, is this that family study where we were previously debating over the inclusion/exclusion of PCA eigenvectors as covariates? In that case, I have a rough idea of the sample size.

While the limma QQ plot looks a lot more convincing, I am not confident that the same QQ Chi-squared measure of goodness of fit can be applied equally to both sets of P values. DESeq2 derives P values by applying the Wald test to its negative binomial model of raw/normalised counts (as you implied), while Limma goes by quantile normalisation, which follows a binomial.

You may wish to retry this by generating the QQ plot differently for each based on the underlying distribution. Apparently you can do this with this R function: https://www.rdocumentation.org/packages/qualityTools/versions/1.55/topics/qqPlot

y, character string specifying the distribution of x. The function qqPlot will support the following character strings for y:

 - “beta”
 - “cauchy”
 - “chi-squared”
 - “exponential”
 - “f”
 - “gamma”
 - “geometric”
 - “log-normal”
 - “lognormal”
 - “logistic”
 - “negative binomial”
 - “normal”
 - “Poisson”
 - “t”
 - “weibull”

You may also be interested in the work of these guys (see Figure 3): A flexible count data model to fit the wide diversity of expression profiles arising from extensively replicated RNA-seq experiments

ADD COMMENTlink modified 8 months ago • written 19 months ago by Kevin Blighe49k

Thanks for the input. The sample size is same as before: 24.

ADD REPLYlink written 19 months ago by rmf730
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1265 users visited in the last hour