I'm doing GO enrichment with topGO and get a set of significant GO categories, each with some number of annotated genes. When I then try to look up the genes belonging to those categories using annFUN.org, many of the categories are missing from its output. Could this be related to pruning of the ontology? Any thoughts?
I've included simple code that reproduces this problem:
    library(data.table)
    library(org.Hs.eg.db)
    library(topGO)

    ## create a fake set of 10 genes, of which one is significant
    tmp.sig.genes = data.table(
      ens_id = c('ENSG00000198752', 'ENSG00000145242', 'ENSG00000127526',
                 'ENSG00000111110', 'ENSG00000197043', 'ENSG00000186642',
                 'ENSG00000151952', 'ENSG00000055163', 'ENSG00000154917',
                 'ENSG00000251664'),
      sig = c(TRUE, rep(FALSE, 9)))
    allGenesCat <- factor(as.integer(tmp.sig.genes$sig))
    names(allGenesCat) <- tmp.sig.genes$ens_id

    # run topGO, get significant GO categories
    suppressMessages(tgd <- new(
      "topGOdata", ontology = 'BP', allGenes = allGenesCat, nodeSize = 5,
      annot = annFUN.org, mapping = "org.Hs.eg.db", ID = "ensembl"))
    resultTopGO.elim <- runTest(tgd, algorithm = "elim", statistic = "Fisher")
    tgd.table = data.table(GenTable(tgd, Fisher.elim = resultTopGO.elim))

    ## look at our "significant" results
    head(tgd.table, 1)
    #         GO.ID                      Term Annotated Significant Expected Fisher.elim
    # 1: GO:0019538 protein metabolic process         5           1      0.5         0.5

    # but the first GO term doesn't come up when I query with annFUN.org
    annFUN.org('BP', mapping = "org.Hs.eg.db", ID = "ensembl",
               feasibleGenes = tmp.sig.genes$ens_id)[['GO:0019538']]
    # NULL

    # similarly, that term isn't present for the only significant gene
    inverseList(annFUN.org('BP', mapping = "org.Hs.eg.db", ID = "ensembl",
                           feasibleGenes = tmp.sig.genes$ens_id))[['ENSG00000198752']]
    # "GO:0006468" "GO:0007010" "GO:0007163" "GO:0007165" "GO:0016477" "GO:0031032" "GO:0031532" "GO:0035556"