I am using GOseq on "mm9" as the genome and "geneSymbol" as the ID in my DE.genes and All.genes list. The input files are loaded fine, the named vector is fine and so is the calculated pwf and the nullp plot. The wallenius approximation also fetches the over-represented GO categories amongst the DE genes:
head(GO.wall) category over_represented_pvalue under_represented_pvalue 1878 GO:0016491 2.278057e-09 1.0000000 220 GO:0003674 2.524681e-07 1.0000000 264 GO:0003824 3.039433e-07 1.0000000 159 GO:0001671 1.629699e-04 0.9999960 1911 GO:0016616 2.039002e-04 0.9999346 1909 GO:0016614 2.229153e-04 0.9999252
However, when I find the report GO categoris over enriched using the .05 FDR cutoff using the enriched.GO command, only the top over-represented GO ID (GO: 0016491) is reported repeatedly in the output:
head(enriched.GO)  "GO:0016491" "GO:0016491" "GO:0016491" "GO:0016491" "GO:0016491" "GO:0016491"
I cant seem to debug what I am doing here here as I have followed the instructions from the vignette.
Here is the complete code for reference:
library(goseq) assayed.genes<-scan("Allgenes.txt", what=character()) de.genes<-scan("DE.txt", what=character()) gene.vector=as.integer(assayed.genes %in% de.genes) names(gene.vector)=assayed.genes print(table(gene.vector)) pwf=nullp(gene.vector,"mm9", "geneSymbol") print(head(pwf)) GO.MF=goseq(pwf, "mm9","geneSymbol", test.cats=c("GO:MF")) print(head(GO.MF)) enriched.GO=GO.MF$category[p.adjust(GO.MF$over_represented_pvalue,method="BH")] print(head(enriched.GO))
I will really appreciate any tips for those who have had more experience with GOseq.