Question: Can't find GO annotation with topGO
4.1 years ago
bvernot wrote:

Hello all,

I'm doing GO enrichment with topGO, and I get a set of significant GO categories (each with some number of genes associated with it). When I then try to find the genes corresponding to those categories using annFUN.org, many of the categories do not exist in its output. Maybe this is related to pruning the ontology? Any thoughts?

I've included simple code that reproduces this problem:


library(topGO)
library(data.table)

## create a fake set of 10 genes, of which one is significant
tmp.sig.genes = data.table(ens_id = c('ENSG00000198752', 'ENSG00000145242', 'ENSG00000127526', 'ENSG00000111110', 'ENSG00000197043', 'ENSG00000186642', 'ENSG00000151952', 'ENSG00000055163', 'ENSG00000154917', 'ENSG00000251664'),
                           sig = c(TRUE, rep(FALSE, 9)))

allGenesCat <- factor(as.integer(tmp.sig.genes$sig))
names(allGenesCat) <- tmp.sig.genes$ens_id

# run topGO, get significant GO categories
suppressMessages(tgd <- new( "topGOdata", ontology='BP', allGenes = allGenesCat, nodeSize=5,
                   annot = annFUN.org, mapping="", ID = "ensembl" ))
resultTopGO.elim <- runTest(tgd, algorithm = "elim", statistic = "Fisher" )
tgd.table = data.table(GenTable( tgd, Fisher.elim = resultTopGO.elim))

## look at our "significant" results
#         GO.ID                                        Term Annotated Significant Expected Fisher.elim
# 1: GO:0019538                   protein metabolic process         5           1      0.5         0.5

# but the first GO term doesn't come up when I query with annFUN.org
annFUN.org('BP', mapping="", ID = "ensembl", feasibleGenes = tmp.sig.genes$ens_id)[['GO:0019538']]

# similarly, that term isn't present for the only significant gene
inverseList(annFUN.org('BP', mapping="", ID = "ensembl", feasibleGenes = tmp.sig.genes$ens_id))[['ENSG00000198752']]
# [1] "GO:0006468" "GO:0007010" "GO:0007163" "GO:0007165" "GO:0016477" "GO:0031032" "GO:0031532" "GO:0035556"
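
To make the "pruning" guess concrete, here is a minimal base-R sketch of one possible explanation: the enrichment table may count a gene toward a term whenever the gene is annotated to that term or to any of its descendants, while a direct lookup returns only the terms a gene is annotated to directly. Everything in the sketch is a toy: the parent/child link between the two GO IDs is assumed for illustration rather than looked up, and `with_ancestors` is a hypothetical helper, not a topGO function.

```r
# Toy hierarchy: each term maps to its parent terms.
# The link below is an assumption made for illustration only.
parents <- list(
  "GO:0006468" = "GO:0019538",   # assumed: child term -> parent term
  "GO:0019538" = character(0)    # treated as the top of this toy hierarchy
)

# Direct annotations only, as a direct lookup would report them
direct <- list(ENSG00000198752 = "GO:0006468")

# Hypothetical helper: return a term together with all of its ancestors
with_ancestors <- function(term) {
  found <- character(0)
  todo <- term
  while (length(todo) > 0) {
    current <- todo[1]
    todo <- todo[-1]
    if (!(current %in% found)) {
      found <- c(found, current)
      todo <- c(todo, parents[[current]])
    }
  }
  found
}

# Propagate each gene's direct terms up the hierarchy
propagated <- lapply(direct, function(terms) {
  unique(unlist(lapply(terms, with_ancestors)))
})

"GO:0019538" %in% direct$ENSG00000198752      # FALSE: not a direct annotation
"GO:0019538" %in% propagated$ENSG00000198752  # TRUE: inherited from the child term
```

If this is what is happening, asking the topGOdata object itself, for example with genesInTerm(tgd, 'GO:0019538'), should list the genes that were counted for the term, since that lookup goes through the same object that produced the table.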
