Question: Problem with topGO: GO term enrichment analysis
0
gravatar for cristian
2.6 years ago by
cristian240
cristian240 wrote:

Hi,

I am trying to use topGO to do some GO term enrichment analysis.

I believe that I have everything I need to do the analysis: the full list of genes with p-values from a differential expression test, a sublist of that containing the significant genes, a map between gene IDs and GO terms.

library(GO.db)
library(topGO)

head(geneList2)
WBGene00000001 WBGene00000002 WBGene00000003 WBGene00000004 WBGene00000005 WBGene00000006 
     0.7425803             NA             NA             NA             NA             NA 

head(topDiffGenes2)
WBGene00000608 WBGene00000609 WBGene00000657 WBGene00000668 WBGene00000677 WBGene00000680 
   0.013863506    0.037929727    0.004536303    0.001591211    0.019982714    0.044952727

head(idGoMap)
$WBGene00000001
 [1] "GO:0019901" "GO:0040024" "GO:0008340" "GO:0046854" "GO:0005942" "GO:0016301" "GO:0008286"
 [8] "GO:0043551" "GO:0046935" "GO:0035014"

$WBGene00000002
[1] "GO:0005886" "GO:0016020" "GO:0015171" "GO:0005887" "GO:0015297" "GO:1902475" "GO:0003333"
[8] "GO:1990184"

$WBGene00000003
[1] "GO:0016020" "GO:0015171" "GO:0005887" "GO:0015297" "GO:0015179" "GO:0003333"

$WBGene00000004
[1] "GO:0005886" "GO:0016020" "GO:0015171" "GO:0005887" "GO:0015297" "GO:1902475" "GO:0003333"
[8] "GO:1990184" "GO:0016021"

$WBGene00000005
[1] "GO:0016020" "GO:0015171" "GO:0005887" "GO:0015297" "GO:0015179" "GO:0003333"

$WBGene00000006
[1] "GO:0016020" "GO:0015171" "GO:0005887" "GO:0015297" "GO:0015179" "GO:0003333"

sampleGOdata2 <- new('topGOdata', description = 'Simple session', ontology = 'BP', allGenes = geneList2, geneSel = topDiffGenes2, nodeSize = 10, annot = annFUN.db, affyLib = idGoMap)
Error in (function (cl, name, valueClass)  : 
  assignment of an object of class “numeric” is not valid for @‘geneSelectionFun’ in an object of class “topGOdata”; is(value, "function") is not TRUE

However, I am not sure how to use the new function in R to create my topGO object. Can anybody be of any help, please?

Thanks.

Best wishes,

C.

error go topgo bioconductor R • 2.6k views
ADD COMMENTlink modified 2.2 years ago by adnbps10 • written 2.6 years ago by cristian240
2
gravatar for e.rempel
2.6 years ago by
e.rempel800
Germany, Heidelberg, COS
e.rempel800 wrote:

Hi cristian,

I think you are using not the correct specifications for topGO object:

  1. geneSel should be a function, not a numeric vector,
  2. it seems that you using a mapping between your genes and GO terms. In this case you should use

    annot = annFUN.gene2GO

Have a look at topGO manual here and at my answer to another question regarding topGO here

ADD COMMENTlink written 2.6 years ago by e.rempel800

Hi,

based on your post, I did this:

sampleGOdata2 <- new('topGOdata', description = 'Simple session', ontology = 'BP', allGenes = geneList2, geneSel = topDiffGenes, nodeSize = 10, annot = annFUN.gene2GO, gene2GO = idGoMap)

Building most specific GOs .....
    ( 2898 GO terms found. )

Build GO DAG topology ..........
    ( 5286 GO terms and 11861 relations. )

Annotating nodes ...............
    ( 8666 genes annotated to the GO terms. )

do you think it's correct now?

Thanks. Best, C.

ADD REPLYlink written 2.6 years ago by cristian240
1

Yes, not you seem to have created the object correctly. Now you can use it to test the enrichment with runTest...

ADD REPLYlink written 2.6 years ago by Lluís R.890

Thanks, I am back on track with the tutorial now.

ADD REPLYlink written 2.6 years ago by cristian240

Hi,

I run into another problem now:

resultKS <- runTest(sampleGOdata2, algorithm = "classic", statistic = "ks")

             -- Classic Algorithm -- 

         the algorithm is scoring 5286 nontrivial nodes
         parameters: 
             test statistic: ks
             score order: increasing
Error in seq_len(N)[-x.a] : 
  only 0's may be mixed with negative subscripts
> resultKS.elim <- runTest(sampleGOdata2, algorithm = "elim", statistic = "ks")

             -- Elim Algorithm -- 

         the algorithm is scoring 5286 nontrivial nodes
         parameters: 
             test statistic: ks
             cutOff: 0.01
             score order: increasing

     Level 17:  4 nodes to be scored    (0 eliminated genes)
Error in ks.test(x.a, seq_len(N)[-x.a], alternative = "greater") : 
  not enough 'x' data

Do you know what it could be?

C.

ADD REPLYlink written 2.6 years ago by cristian240

Error messages are sometimes difficult to interpret. I assume something is still wrong with sampleGOdata2 Could you post the output of

sampleGOdata2
ADD REPLYlink written 2.6 years ago by e.rempel800
sampleGOdata2

------------------------- topGOdata object -------------------------

 Description:
   -  Simple session 

 Ontology:
   -  BP 

 46739 available genes (all genes from the array):
   - symbol:  WBGene00000001 WBGene00000002 WBGene00000003 WBGene00000004 WBGene00000005  ...
   - score :  0.742580347299 NA NA NA NA  ...
   - NA  significant genes. 

 8666 feasible genes (genes that can be used in the analysis):
   - symbol:  WBGene00000001 WBGene00000002 WBGene00000003 WBGene00000004 WBGene00000005  ...
   - score :  0.742580347299 NA NA NA NA  ...
   - NA  significant genes. 

 GO graph (nodes with at least  1  genes):
   - a graph with directed edges
   - number of nodes = 5286 
   - number of edges = 11861 

------------------------- topGOdata object -------------------------
ADD REPLYlink written 2.6 years ago by cristian240
1

There is something wrong: there should be some significant genes instead of NA. Possible explanations:

  1. topDiffGenes is not a function
  2. topDiffGenes/You should take care of NAs in gene list, e.g. by replacing them with 1.
ADD REPLYlink written 2.6 years ago by e.rempel800
topDiffGenes
function (allScore) 
{
    return(allScore < 0.01)
}
<bytecode: 0x1290c3f80>
attr(,"source")
[1] "function(allScore) {"      "  return(allScore < 0.01)" "}"
ADD REPLYlink written 2.6 years ago by cristian240
geneList2[is.na(geneList2)] <- 1
> 
> sampleGOdata2 <- new('topGOdata', description = 'Simple session', ontology = 'BP', allGenes = geneList2, geneSel = topDiffGenes, nodeSize = 1, annot = annFUN.gene2GO, gene2GO = idGoMap)

Building most specific GOs .....
    ( 2898 GO terms found. )

Build GO DAG topology ..........
    ( 5286 GO terms and 11861 relations. )

Annotating nodes ...............
    ( 8666 genes annotated to the GO terms. )
> sampleGOdata2

------------------------- topGOdata object -------------------------

 Description:
   -  Simple session 

 Ontology:
   -  BP 

 46739 available genes (all genes from the array):
   - symbol:  WBGene00000001 WBGene00000002 WBGene00000003 WBGene00000004 WBGene00000005  ...
   - score :  0.742580347299 1 1 1 1  ...
   - 27  significant genes. 

 8666 feasible genes (genes that can be used in the analysis):
   - symbol:  WBGene00000001 WBGene00000002 WBGene00000003 WBGene00000004 WBGene00000005  ...
   - score :  0.742580347299 1 1 1 1  ...
   - 5  significant genes. 

 GO graph (nodes with at least  1  genes):
   - a graph with directed edges
   - number of nodes = 5286 
   - number of edges = 11861 

------------------------- topGOdata object -------------------------
ADD REPLYlink written 2.6 years ago by cristian240

Starting to look better, I imagine the NA's arise from genes with zero expression?? In any case, replacing them by 1 seems like a reasonable idea. Thanks. Best, C.

ADD REPLYlink written 2.6 years ago by cristian240

Yes, it looks ok now. Thank you for accepting my answer.

ADD REPLYlink written 2.6 years ago by e.rempel800
1
gravatar for halo22
2.6 years ago by
halo22140
Indianapolis, IN
halo22140 wrote:

Take a look at this too: https://www.rdocumentation.org/packages/topGO/versions/2.24.0/topics/topGOdata-class

ADD COMMENTlink written 2.6 years ago by halo22140
0
gravatar for halo22
2.6 years ago by
halo22140
Indianapolis, IN
halo22140 wrote:

Why does geneList2 have NA and not actual pvalues? I am assuming you ran a comparison between genes, so you should have p-values.
Also try class(geneList2)

ADD COMMENTlink modified 2.6 years ago • written 2.6 years ago by halo22140

Hi,

class(geneList2)
[1] "numeric"

geneList2 has p-values along with NA's for genes for which it could not compute a p-value (maybe genes without any expression in the conditions?).

ADD REPLYlink written 2.6 years ago by cristian240
0
gravatar for adnbps
2.2 years ago by
adnbps10
adnbps10 wrote:

The topGO documentation is impossible to follow. You might try using "enrichr" instead: http://amp.pharm.mssm.edu/Enrichr/

There is also an R version available. I got this up and running in a few minutes, versus fumbling around with topGO for hours.

ADD COMMENTlink written 2.2 years ago by adnbps10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1227 users visited in the last hour