I have a question about the Bioconductor goseq package for GO enrichment analysis. The top-ranked categories are obtained by ranking on the "over_represented_pvalue" column of the goseq output. The same output also includes an "under_represented_pvalue" column. How are over- and under-representation determined?
My question can probably be generalized this way: if a category contains more DE genes than expected by chance, can I say that the category is "enriched" and that the associated p-value is the "over-represented" one, while if it contains fewer DE genes than expected, the category is "depleted" and the associated p-value is the "under-represented" one? Can the direction be reflected in the sign (+/-) of some statistic?
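To make the question concrete, here is a minimal Python sketch of how I currently understand the two p-values. It assumes a plain hypergeometric test with no correction for gene length bias, so it is not goseq's actual computation (as far as I understand, goseq adjusts for length bias by default). The function and argument names are my own, and the numbers are made up for illustration.

```python
from math import comb


def hypergeom_pmf(k, num_genes, num_in_cat, num_de):
    """P(exactly k of the num_de DE genes fall in a category of size num_in_cat)."""
    return comb(num_in_cat, k) * comb(num_genes - num_in_cat, num_de - k) / comb(num_genes, num_de)


def over_under_pvalues(num_de_in_cat, num_in_cat, num_de, num_genes):
    """Return (over, under) tail probabilities under a plain hypergeometric null.

    over  = P(X >= observed): small when the category has more DE genes than expected
    under = P(X <= observed): small when the category has fewer DE genes than expected
    """
    lo = max(0, num_de - (num_genes - num_in_cat))  # smallest possible count
    hi = min(num_in_cat, num_de)                    # largest possible count
    pmf = {k: hypergeom_pmf(k, num_genes, num_in_cat, num_de) for k in range(lo, hi + 1)}
    over = sum(p for k, p in pmf.items() if k >= num_de_in_cat)
    under = sum(p for k, p in pmf.items() if k <= num_de_in_cat)
    return over, under


# Made-up example: 1000 genes, 100 of them DE, category of 50 genes.
# By chance alone, about 50 * 100 / 1000 = 5 DE genes are expected in the category.
over_hi, under_hi = over_under_pvalues(15, 50, 100, 1000)  # 15 observed: more than expected
over_lo, under_lo = over_under_pvalues(0, 50, 100, 1000)   # 0 observed: fewer than expected
print(over_hi, under_hi)
print(over_lo, under_lo)
```

If this picture is right, the direction is not carried by the sign of a statistic but by which of the two tail p-values is small. A signed quantity could still be derived by hand, for example the observed count minus the expected count.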
I am new to this area, so thank you very much for your help! The vignette of the goseq package can be found here.