What is considered the best way to handle genes that are not detected at all in a two-group comparison when doing an over-representation analysis?
For example, define all genes with FPKM < 1 as not detected. I have three different two-group comparisons to make.

If all undetected genes are excluded from the analysis, then a different set of ontology categories will be tested in each comparison, because a different set of categories will contain at least the minimum number of detected genes.

The other option is to keep all genes in the analysis. Then the ontology categories with enough genes are the same for all three comparisons, but there are two undesired effects: the multiple-testing adjustment becomes more severe for every gene, and the genes with small counts will inevitably be called not differentially expressed. This seems to artificially inflate the number of genes counted as not differentially expressed, because some of them might truly be differentially expressed and would be found to be so if they were covered at greater sequencing depth, for example by targeted RNA-seq. There must be some abundance threshold below which the answer to differential expression should be "don't know" rather than "no".
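
To make the two options concrete, here is a minimal Python sketch of how the choice of background changes the result for a single category. Everything in it is an assumption for illustration: the gene names and FPKM values are toy data, the helper names (`hypergeom_sf`, `ora_pvalue`) are hypothetical, and a one-sided hypergeometric test stands in for whatever over-representation test is actually used. The only value taken from my example is the FPKM cutoff of 1.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K are in the category."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def ora_pvalue(de_genes, category, universe):
    """One-sided over-representation p-value, restricted to the given universe."""
    universe = set(universe)
    de = set(de_genes) & universe      # DE genes that are in the background
    cat = set(category) & universe     # category members in the background
    return hypergeom_sf(len(de & cat), len(universe), len(cat), len(de))

# Toy data (hypothetical): 100 genes, 60 detected (FPKM 5.0), 40 not (FPKM 0.2).
fpkm = {f"g{i}": (5.0 if i < 60 else 0.2) for i in range(100)}
de_genes = {f"g{i}" for i in range(10)}    # all DE calls are among detected genes
category = {f"g{i}" for i in range(5)} | {f"g{i}" for i in range(60, 75)}

# Option 1: background = detected genes only (FPKM >= 1).
detected = {g for g, v in fpkm.items() if v >= 1.0}
p_detected = ora_pvalue(de_genes, category, detected)

# Option 2: background = all genes, so undetected genes count as "not DE".
p_all = ora_pvalue(de_genes, category, fpkm.keys())

print(f"detected-only background: p = {p_detected:.3g}")
print(f"all-genes background:     p = {p_all:.3g}")
```

In this toy case most of the category is undetected, so keeping all genes in the background makes the category look less enriched than it does among the detected genes. With other data the effect could go the other way, which is exactly why I am asking which background is appropriate.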