Question: topGO is performing enrichment test on GO terms that are not present in my dataset
gravatar for Fabio Marroni
3.1 years ago by
Fabio Marroni2.7k
Fabio Marroni2.7k wrote:

I am afraid I am doing something wrong with topGO. If I perform an enrichment analysis specifying the genes of interest (and the corresponding GO slim terms that I got by biomaRt), in the results I also get GO terms that are not associated to the gene of interest. What is going on?

Here an example. I use a very small file including only one gene (you can copy on a text editor, save it as mygenes.txt, maybe convert spaces to tab, and then read it in R using read.table).

         ensembl_gene_id chromosome_name start_position end_position
VIT_200s1458g00010              Un       39225125     39226050

I paste the code I used for analysis

#Read the only gene and select it as the interesting gene    
#Create geneList object to be specified as the allGenes parameter for creating a topGO object

finto <- as.integer(mygenes$ensembl_gene_id %in% myInterestingGenes)
names(finto) <- mygenes$ensembl_gene_id
#Create the gene2GO object

#For selecting genes to be used in analysis I defined a function called topgosel (it is an overkilling for the example, but useful for the real analysis)

#Finally create the topGO object
mytopgo<-new("topGOdata",ontology="BP",allGenes=finto,annot = annFUN.gene2GO, geneSel= topgosel , gene2GO = mygene2GO)
#Run fisher test
resultFisher <- runTest(sampleGOdata, algorithm = "classic", statistic = "fisher")
#Extract Fisher results
allRes <- GenTable(sampleGOdata, classicFisher = resultFisher,topNodes = min(resultFisher@geneData[4],100))

Now, the gene used for testing and its GO is the following:

> mygene2GO
 [1] "GO:0005575" "GO:0016740" "GO:0016301" "GO:0000166" "GO:0005488"
 [6] "GO:0003674" "GO:0005623" "GO:0005886" "GO:0016020" "GO:0003824"
[11] "GO:0006464" "GO:0008150" "GO:0008152" "GO:0009987" "GO:0019538"

While in the results I get these GO terms

> allRes$GO.ID
 [1] "GO:0006464" "GO:0008150" "GO:0008152" "GO:0009987" "GO:0019538"
 [6] "GO:0036211" "GO:0043170" "GO:0043412" "GO:0044237" "GO:0044238"
[11] "GO:0044260" "GO:0044267" "GO:0071704"

As you can see, there are some GO terms that are present only in the results, and some that are present only in the input file. This sounds strange to me: it's like topGO decides its own ontology for the gene under investigation.

I am sure I am missing (or messing) something, but what?

Can anyone help me? Do you think it's appropriate if I post the same question to the bioconductor support forum?

ADD COMMENTlink modified 3.1 years ago by Pablo Markaide20 • written 3.1 years ago by Fabio Marroni2.7k

I could reproduce your error, but I don't know what causes it. Are these unexpected GO terms connected to the ones actually associated to your gene? Maybe their significance could come from the GO network structure, but that would be strange. It looks like a bug to me, you should definitely ask about it in bioconductor.

ADD REPLYlink written 3.1 years ago by Martombo2.7k
gravatar for Pablo Markaide
3.1 years ago by
Pablo Markaide20 wrote:

Hi Fabio,

I'm not an expert in topGO, but digging in my old notes (when I was preparing the code for topGO batch analysis), I have comments on how topGO ADDs some parent nodes if they are missing on your sets. In my previous test with small datasets, the addition was bigger compared to more comprehensive gene sets.

If you play with NodeSize during new GOdata object generation, you can see that the number of GO actually is bigger than my universe depending on that parameter:

    #Number below represent unique GOs

    length(unique(unlist(geneID2GO))) #my gene universe
    length(usedGO(GOdata)) #with Nodesize=5
    length(usedGO(GOdata)) #with Nodesize=1

Maybe I'm completely wrong but I'm with @Martombo that you should check the network structure to test if this added GOs you mention are in fact part of the GO hierarchy

ADD COMMENTlink modified 3.1 years ago • written 3.1 years ago by Pablo Markaide20
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1467 users visited in the last hour