I'm using edgeR for DE analysis of RNASeq data. I've been using manual filtering suggested in the EdgeR manual till now.
I followed genefilter package and tried to chose a theta that has the optimal number of rejections. Basically this is what I did : run edgeR once for the data set, then use the p-value results for genefilter calculations and plots.
dge <- DGEList(counts=counts[,!is.na(group)], group=group[!is.na(group)]) d <- dge d <- calcNormFactors(d) d <- estimateDisp(d, trend="none", robust=TRUE) et <- exactTest(d, pair = levels(d$samples$group)) dgeTable = topTags(et, n=nrow(d)) res = data.frame( filterstat = rowSums(d$counts[rownames(dgeTable),]), pvalue = as.data.frame(dgeTable)$PValue, row.names = rownames(dgeTable)) rejBH = filtered_R(alpha=alpha, filter=res$filterstat, test=res$pvalue, theta=theta, method="BH") theta_chosen = theta[max(which(rejBH==max(rejBH)))] use = res$filterstat > quantile(res$filterstat, probs=theta_chosen) dgeFilt=dge[use,] d <- dgeFilt
I used rowSums as filtering criteria, and the after I filter out the genes - in this case I got theta = 0.19, I ran the edgeR exact test again:
### rerun the model d <- dgeFilt d <- calcNormFactors(d) d <- estimateDisp(d, trend="none", robust=TRUE) et <- exactTest(d, pair = levels(d$samples$group))
My question is - even though the previous calculated max(rejBH) - the max number of rejections is 102 , using the same alpha ( 0.1) I'm only getting 60 DE genes after applying the filter. Actually, without applying any filter, the number of DE genes is 75 (this matches rejBH at 0%.)
- Why is the new DE genes number not 102?
- Why am I getting less DE genes even though from the rejection plots it looks like I'll have the max number of rejections at theta = 0.19?
Am I doing anything wrong here or did I misunderstand something?
Thanks in advance for any suggestion!