Question

Choosing between edgeR's exactTest and QLFTest for 2x2 design

0

Entering edit mode

4.5 years ago

adam.faranda ▴ 110

I'm trying to decide whether to use edgeR's QLFTest or ExactTest in a 2 x 2 experimental design. I'm finding that while the two methods largely agree, there are distinct subsets of genes in my data that are only detected by one method or the other.

My experiment includes two genotypes (WT and B8) and two time intervals (0, 24 Hours). Each condition has three replicates. I have several questions I'd like to answer with this data:

Which genes have a significant difference between 0 Hours and 24 Hours? in WT samples; which genes have a significant difference between 0 and 24 hours in B8 samples?
Which genes have a significant difference at 0 Hours between WT and B8; which at 24 Hours?
Which genes with a significant change between 0 and 24 Hours in WT remain unchanged in B8?

For the Quasi Likelihood I'm using the following design matrix:

 design
                  WT_0H WT_24H B8_0H B8_24H
    B8_0_hr_1_S9      0      0     1      0
    B8_0_hr_3_S10     0      0     1      0
    B8_0_hr_4_S11     0      0     1      0
    B8_24hr_1_S12     0      0     0      1
    B8_24hr_2_S13     0      0     0      1
    B8_24hr_3_S14     0      0     0      1
    WT_0_hr_1_S1      1      0     0      0
    WT_0_hr_2_S2      1      0     0      0
    WT_0_hr_3_S3      1      0     0      0
    WT_24_hr_1_S6     0      1     0      0
    WT_24_hr_2_S7     0      1     0      0
    WT_24_hr_3_S8     0      1     0      0
    attr(,"assign")
    [1] 1 1 1 1
    attr(,"contrasts")
    attr(,"contrasts")$group
    [1] "contr.treatment"

And contrast matrix:

        Contrasts
Levels   Hour_0 Hour_24 WT B8
  WT_0H      -1       0 -1  0
  WT_24H      0      -1  1  0
  B8_0H       1       0  0 -1
  B8_24H      0       1  0  1

And estimating differential expression with the following call:

dge<-calcNormFactors(dge)
dge<-estimateDisp(dge, desigDEn)
qlf<-glmQLFit(dge, design)
qlt<-glmQLFTest(qlf, contrast = cmat)
deg<-as.data.frame(topTags(qlt, n=Inf))
degQ<-data.table(deg)

For the Exact Test I'm using the following calling sequence:

 dge$samples$group<-factor(
  paste(
    dge $samples$genotype, 
    dge$samples$interval, sep="_" 
  ), levels=c("WT_0H","WT_24H","B8_0H", "B8_24H")
)
et<-exactTest(dge, pair=c('WT_0H', 'WT_24H'))
deg<-as.data.frame(topTags(et, n=Inf))
degE<-data.table(deg)

For the contrast between WT 0 Hours and WT 24 Hours the exact test detects 4174 significant genes and the QLTest detects 3924 genes (FDR < 0.05, | log2 fold change| > 1).

> et_genes<-degE[FDR < 0.05 & abs(logFC) > 1, ]$gene_id
> length(et_genes)
[1] 4174

> qt_genes<-degQ[FDR < 0.05 & abs(logFC.WT) > 1, ]$gene_id
> length(qt_genes)
[1] 3924

This in and of itself is not surprising, however there are 571 genes that have an FDR < 0.05 by the exact test only, and 321 genes that have and FDR < 0.05 by the QLFTest only.

> length(setdiff(et_genes, qt_genes))
[1] 571
> length(setdiff(qt_genes, et_genes))
[1] 321

My understanding was that the QLFTest was less prone to type 1 error, and I would have expected all (or nearly all) of the genes detected using the QLFTest to be a subset of the genes detected by the exact test. I was hoping someone could identify some resources that might help me decide which testing approach is more appropriate for my data, and my questions.

edgeR RNA-Seq • 2.7k views

ADD COMMENT • link 4.5 years ago by adam.faranda ▴ 110

score 0 · Answer 1 · 2020-03-28

0

Entering edit mode

4.5 years ago

ATpoint 84k

Please use google first. This has been discussed in the past multiple times. The developers recommend QLF or exact: https://support.bioconductor.org/p/84291/. You might also want to use FilterByExpr as recommended by the authors.

ADD COMMENT • link 4.5 years ago by ATpoint 84k