My aim is to do paired testing of each condition against the control.
I have a dataset with the following experimental design:
conditions: 0, 8, 24, 72 hours of treatment
VirusUsed: wildtype (wt) and mutant (mt)
donors: donor1, donor2, donor3
NB: there is only one uninfected control for each donor.
Accordingly, I designed a paired comparison:
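Time <- factor(metadata$Time)
Virus <- factor(metadata$VirusUsed)
Donor <- factor(metadata$SampleCode)
# One group per virus/time combination, with the uninfected control as the reference level
Group <- factor(paste0(Virus, ".", Time))
Group <- relevel(Group, ref = "uninf.0")
# Donor is the blocking factor, so each Group coefficient is a within-donor comparison against control
design <- model.matrix(~ Donor + Group)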
This results in the following design matrix:
(Intercept) Donor2 Donor3 mt.24 mt.72 mt.8 wt.24 wt.72 wt.8
1 1 0 0 0 0 0 0 0 1
2 1 1 0 0 0 0 0 0 1
3 1 0 1 0 0 0 0 0 1
4 1 0 0 0 0 1 0 0 0
5 1 1 0 0 0 1 0 0 0
6 1 0 1 0 0 1 0 0 0
7 1 0 0 0 0 0 1 0 0
8 1 1 0 0 0 0 1 0 0
9 1 0 1 0 0 0 1 0 0
10 1 0 0 1 0 0 0 0 0
11 1 1 0 1 0 0 0 0 0
12 1 0 1 1 0 0 0 0 0
13 1 0 0 0 0 0 0 1 0
14 1 1 0 0 0 0 0 1 0
15 1 0 1 0 0 0 0 1 0
16 1 0 0 0 1 0 0 0 0
17 1 1 0 0 1 0 0 0 0
18 1 0 1 0 1 0 0 0 0
19 1 0 0 0 0 0 0 0 0
20 1 1 0 0 0 0 0 0 0
21 1 0 1 0 0 0 0 0 0
Following this, I ran the following code:
library(edgeR)
x <- calcNormFactors(x)
x <- estimateDisp(x, design = design)
fit <- glmQLFit(x, design = design)
results <- list()
# Columns 4 to 9 of the design are the Group coefficients,
# so each iteration tests one condition against the uninfected control
for (i in 4:9) {
  f <- glmQLFTest(fit, coef = i)
  results[[colnames(design)[i]]] <- f
}
However, the following genes keep coming up as significant (FDR 1E-7). On inspecting the raw counts (below), I see a donor-specific response to the virus; the effect is still there when looking at cpm values.
wt8_1 wt8_2 wt8_3 mt8_1 mt8_2 mt8_3 wt24_1 wt24_2 wt24_3 mt24_1 mt24_2
gene1 2090 2 0 2059 0 2 2422 0 6 3273 6
gene2 193 0 0 156 0 0 121 0 0 153 0
gene3 150 0 0 207 0 0 208 0 0 243 0
gene4 1803 2 0 1862 2 2 2586 0 2 3551 0
gene5 1574 0 0 1669 2 2 1248 2 0 1570 0
gene6 13018 6 4 12188 2 2 8693 2 2 11832 4
gene7 3260 4 8 3665 0 0 4212 4 2 5516 0
mt24_3 wt72_1 wt72_2 wt72_3 mt72_1 mt72_2 mt72_3 uninf0_1 uninf0_2
gene1 4 2218 0 6 2371 2 2 6 2
gene2 0 113 0 0 114 0 2 0 0
gene3 0 179 0 0 156 0 0 0 0
gene4 0 1970 2 4 2118 2 0 0 4
gene5 2 976 2 0 1010 0 0 0 0
gene6 8 7615 8 8 8517 8 6 4 2
gene7 2 4565 2 0 4778 2 0 2 2
uninf0_3
gene1 2
gene2 0
gene3 0
gene4 0
gene5 0
gene6 2
gene7 0
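To make the donor-specific pattern explicit, here is a minimal base R sketch that summarises the gene1 counts from the table above per donor. It assumes the `_1`, `_2`, `_3` suffixes in the column names correspond to donor1, donor2 and donor3; the names `gene1`, `donor`, `infected` and `by_donor` are only for illustration.

```r
# gene1 raw counts copied from the table above
gene1 <- c(wt8_1 = 2090, wt8_2 = 2, wt8_3 = 0,
           mt8_1 = 2059, mt8_2 = 0, mt8_3 = 2,
           wt24_1 = 2422, wt24_2 = 0, wt24_3 = 6,
           mt24_1 = 3273, mt24_2 = 6, mt24_3 = 4,
           wt72_1 = 2218, wt72_2 = 0, wt72_3 = 6,
           mt72_1 = 2371, mt72_2 = 2, mt72_3 = 2,
           uninf0_1 = 6, uninf0_2 = 2, uninf0_3 = 2)

# Assumption: the numeric suffix identifies the donor
donor <- sub(".*_", "donor", names(gene1))
infected <- !grepl("^uninf", names(gene1))

# Largest count among the infected samples of each donor
by_donor <- tapply(gene1[infected], donor[infected], max)
print(by_donor)
```

Only donor1 has counts in the thousands in the infected samples; the other two donors stay in single digits, like the uninfected controls.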
Question: I understand that these genes would be significant for Donor1, but given the paired design and the lack of significance for the other two donors:
(a) why does edgeR still compute a significant FDR? Does edgeR average the significance across the three donors after computing it per pair? A paired t-test for these genes does not turn out significant.
(b) can I filter these genes out, or is there a way for edgeR to call a gene significant only if all three paired donors reach significance?
PS: I lump all samples into one big model matrix because I also compare wt vs mt, but that is outside the scope of this question.
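For reference, the paired t-test mentioned in (a) can be sketched roughly as follows for gene1, wt at 8 hours against the uninfected control. This is only an illustration: the log2(count + 1) transformation and the variable names are assumptions, and the samples are assumed to be ordered donor1, donor2, donor3 in both vectors.

```r
# gene1 counts from the table above, ordered donor1, donor2, donor3
wt8   <- c(2090, 2, 0)
uninf <- c(6, 2, 2)

# Paired t-test on log2(count + 1); the transformation is an assumption
tt <- t.test(log2(wt8 + 1), log2(uninf + 1), paired = TRUE)
print(tt$p.value)
```

With one donor responding strongly and the other two not at all, the within-pair differences are highly variable, so the paired t-test does not reach significance at the 0.05 level.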