I am running an enrichment analysis on 3000 differentially expressed (DE) mouse genes. I have successfully taken a DE gene set from DESeq2. I get two opposing pwf graphs if I plot them separately for up- and downregulated genes. It does not matter which background I use, but for argument's sake I used the following code to get the gene set for upregulated genes.
genes <- rownames(subset(deseq2_result, padj < 0.05 & log2FoldChange > 0))
background <- rownames(subset(deseq2_result, padj > 0.05))
I then proceeded to generate the data frame expected by goseq, i.e. DE genes coded as 1 and background genes as 0. Interestingly, the resulting pwf plots look very unusual. The pwf plot for upregulated genes (log2FoldChange > 0) is similar to the one in the vignette, with long genes more likely to be differentially expressed.
However, the plot for significantly downregulated genes is inverted: a high proportion of short genes are DE and a low proportion of long genes are DE.
If I plot all DE genes together, no sensible line can be drawn because the two trends cancel each other out across the bins (high scatter).
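For reference, the 0/1 coding described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not my exact code: the small deseq2_result data frame and its gene names are made-up toy values, and universe and gene_vector are names chosen just for this example.

```r
# Toy stand-in for deseq2_result (made-up values, NOT real data)
deseq2_result <- data.frame(
  log2FoldChange = c(2.1, -1.8, 0.3, -0.2, 1.5, -2.4),
  padj           = c(0.001, 0.01, 0.60, 0.80, 0.03, 0.20),
  row.names      = paste0("gene", 1:6)
)

# Upregulated DE genes and non-significant background, as in the code above
genes      <- rownames(subset(deseq2_result, padj < 0.05 & log2FoldChange > 0))
background <- rownames(subset(deseq2_result, padj > 0.05))

# Named 0/1 vector: 1 = DE gene, 0 = background gene
universe    <- c(genes, background)
gene_vector <- as.integer(universe %in% genes)
names(gene_vector) <- universe

# Note: gene2 (padj < 0.05, log2FoldChange < 0) falls into neither set,
# so significantly downregulated genes are absent from this universe
print(gene_vector)
```

A vector like this is what I would then hand to goseq's nullp() together with a genome and gene ID type matching the annotation (for mouse that might be something like "mm10" and "ensGene", but that part is an assumption here).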
Any ideas why the up- and downregulated gene sets might show opposite length trends?
PS: apologies if I forgot some important background data in my first post