That always makes me wonder... I find it very unsafe to eliminate genes as non-expressed in a particular tissue or cell type. What if we actually introduce further bias because of detection issues? That is, can we really say that a certain set of genes is not expressed at all, under any conditions, in a particular tissue or cell type? NB: tissues and cells are dynamic and responsive; there is no static state or static signature that holds under all conditions. That's why we do the experiments, after all. So the argument that, because some genes might not be detected, we should remove even more genes from the background set doesn't really convince me.
Now, I can understand the point some people make: if we take the transcriptomic profile of a tissue and compare it to a "universe" background, all we will learn is that we are studying that tissue. Yet if I design an experiment to discover an enriched/enhanced process, I would normally compare the same tissue or cell type, e.g. treated vs. untreated. That means the tissue- or cell type-specific signature is "filtered out" at the DE level, since those genes should be at more or less the same expression level in both groups, and the enriched sets will contain genes regulated by the treatment. The exception is when the treatment also affects, e.g., the differentiation rate of the tissue or its identity; then I would get terms relevant to that tissue phenotype, but in that case I would obviously want to know that they are regulated.
So, with all the possible biases, I still feel that comparing against all the genes that could be expressed (hence all the genes) is more biologically relevant than comparing against an artificially/arbitrarily selected background.
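To make concrete how much the choice of background can move the result, here is a minimal sketch of a one-sided hypergeometric (over-representation) test run against two backgrounds. All the numbers are made up for illustration, and the function name `hypergeom_sf` is just a hypothetical helper, not any particular tool's API.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k annotated genes when
    drawing n genes from a background of N, of which K are annotated."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical numbers, for illustration only
de_genes = 200      # genes called differentially expressed
de_in_term = 20     # DE genes annotated to the term of interest

# Background 1: every annotated gene in the genome
p_all = hypergeom_sf(de_in_term, N=20000, K=500, n=de_genes)

# Background 2: only genes detected as expressed in this tissue
p_expressed = hypergeom_sf(de_in_term, N=10000, K=400, n=de_genes)

print(f"whole-genome background: p = {p_all:.3g}")
print(f"expressed-only background: p = {p_expressed:.3g}")
```

With these made-up counts the expected overlap is 5 genes against the whole genome and 8 against the expressed-only set, so the same 20 observed genes look more significant with the larger background. The sketch only shows that the p-value depends on the background; it does not settle which background is the right one.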
But I would be very happy if somebody could suggest some thorough reading on the topic, especially as it relates to NGS data (RNA-seq and ChIP-seq). I found the brief article cited above a bit disappointing.