Question

Problem with topGO: GO term enrichment analysis

0

Entering edit mode

6.7 years ago

biomagician ▴ 410

Hi,

I am trying to use topGO to do some GO term enrichment analysis.

I believe that I have everything I need to do the analysis: the full list of genes with p-values from a differential expression test, a sublist of that containing the significant genes, a map between gene IDs and GO terms.

library(GO.db)
library(topGO)

head(geneList2)
WBGene00000001 WBGene00000002 WBGene00000003 WBGene00000004 WBGene00000005 WBGene00000006 
     0.7425803             NA             NA             NA             NA             NA 

head(topDiffGenes2)
WBGene00000608 WBGene00000609 WBGene00000657 WBGene00000668 WBGene00000677 WBGene00000680 
   0.013863506    0.037929727    0.004536303    0.001591211    0.019982714    0.044952727

head(idGoMap)
$WBGene00000001
 [1] "GO:0019901" "GO:0040024" "GO:0008340" "GO:0046854" "GO:0005942" "GO:0016301" "GO:0008286"
 [8] "GO:0043551" "GO:0046935" "GO:0035014"

$WBGene00000002
[1] "GO:0005886" "GO:0016020" "GO:0015171" "GO:0005887" "GO:0015297" "GO:1902475" "GO:0003333"
[8] "GO:1990184"

$WBGene00000003
[1] "GO:0016020" "GO:0015171" "GO:0005887" "GO:0015297" "GO:0015179" "GO:0003333"

$WBGene00000004
[1] "GO:0005886" "GO:0016020" "GO:0015171" "GO:0005887" "GO:0015297" "GO:1902475" "GO:0003333"
[8] "GO:1990184" "GO:0016021"

$WBGene00000005
[1] "GO:0016020" "GO:0015171" "GO:0005887" "GO:0015297" "GO:0015179" "GO:0003333"

$WBGene00000006
[1] "GO:0016020" "GO:0015171" "GO:0005887" "GO:0015297" "GO:0015179" "GO:0003333"

sampleGOdata2 <- new('topGOdata', description = 'Simple session', ontology = 'BP', allGenes = geneList2, geneSel = topDiffGenes2, nodeSize = 10, annot = annFUN.db, affyLib = idGoMap)
Error in (function (cl, name, valueClass)  : 
  assignment of an object of class “numeric” is not valid for @‘geneSelectionFun’ in an object of class “topGOdata”; is(value, "function") is not TRUE

However, I am not sure how to use the new function in R to create my topGO object. Can anybody be of any help, please?

Thanks.

Best wishes,

C.

go topgo bioconductor r error • 7.9k views

ADD COMMENT • link updated 6.2 years ago by adnbps ▴ 10 • written 6.7 years ago by biomagician ▴ 410

1

Entering edit mode

6.7 years ago

halo22 ▴ 300

Take a look at this too: https://www.rdocumentation.org/packages/topGO/versions/2.24.0/topics/topGOdata-class

ADD COMMENT • link 6.7 years ago by halo22 ▴ 300

0

Entering edit mode

6.7 years ago

halo22 ▴ 300

Why does geneList2 have NA and not actual pvalues? I am assuming you ran a comparison between genes, so you should have p-values.
Also try class(geneList2)

ADD COMMENT • link 6.7 years ago by halo22 ▴ 300

0

Entering edit mode

Hi,

class(geneList2)
[1] "numeric"

geneList2 has p-values along with NA's for genes for which it could not compute a p-value (maybe genes without any expression in the conditions?).

ADD REPLY • link 6.7 years ago by biomagician ▴ 410

0

Entering edit mode

6.2 years ago

adnbps ▴ 10

The topGO documentation is impossible to follow. You might try using "enrichr" instead: http://amp.pharm.mssm.edu/Enrichr/

There is also an R version available. I got this up and running in a few minutes, versus fumbling around with topGO for hours.

ADD COMMENT • link 6.2 years ago by adnbps ▴ 10

score 2 · Accepted Answer · 2017-08-16

2

Entering edit mode

6.7 years ago

e.rempel ★ 1.1k

Hi cristian,

I think you are using not the correct specifications for topGO object:

geneSel should be a function, not a numeric vector,
it seems that you using a mapping between your genes and GO terms. In this case you should use

annot = annFUN.gene2GO

Have a look at topGO manual here and at my answer to another question regarding topGO here

ADD COMMENT • link 6.7 years ago by e.rempel ★ 1.1k

0

Entering edit mode

Hi,

based on your post, I did this:

sampleGOdata2 <- new('topGOdata', description = 'Simple session', ontology = 'BP', allGenes = geneList2, geneSel = topDiffGenes, nodeSize = 10, annot = annFUN.gene2GO, gene2GO = idGoMap)

Building most specific GOs .....
    ( 2898 GO terms found. )

Build GO DAG topology ..........
    ( 5286 GO terms and 11861 relations. )

Annotating nodes ...............
    ( 8666 genes annotated to the GO terms. )

do you think it's correct now?

Thanks. Best, C.

ADD REPLY • link 6.7 years ago by biomagician ▴ 410

1

Entering edit mode

Yes, not you seem to have created the object correctly. Now you can use it to test the enrichment with runTest...

ADD REPLY • link 6.7 years ago by Lluís R. ★ 1.2k

0

Entering edit mode

Thanks, I am back on track with the tutorial now.

ADD REPLY • link 6.7 years ago by biomagician ▴ 410

0

Entering edit mode

Hi,

I run into another problem now:

resultKS <- runTest(sampleGOdata2, algorithm = "classic", statistic = "ks")

             -- Classic Algorithm -- 

         the algorithm is scoring 5286 nontrivial nodes
         parameters: 
             test statistic: ks
             score order: increasing
Error in seq_len(N)[-x.a] : 
  only 0's may be mixed with negative subscripts
> resultKS.elim <- runTest(sampleGOdata2, algorithm = "elim", statistic = "ks")

             -- Elim Algorithm -- 

         the algorithm is scoring 5286 nontrivial nodes
         parameters: 
             test statistic: ks
             cutOff: 0.01
             score order: increasing

     Level 17:  4 nodes to be scored    (0 eliminated genes)
Error in ks.test(x.a, seq_len(N)[-x.a], alternative = "greater") : 
  not enough 'x' data

Do you know what it could be?

C.

ADD REPLY • link 6.7 years ago by biomagician ▴ 410

0

Entering edit mode

Error messages are sometimes difficult to interpret. I assume something is still wrong with sampleGOdata2 Could you post the output of

sampleGOdata2

ADD REPLY • link 6.7 years ago by e.rempel ★ 1.1k

0

Entering edit mode

sampleGOdata2

------------------------- topGOdata object -------------------------

 Description:
   -  Simple session 

 Ontology:
   -  BP 

 46739 available genes (all genes from the array):
   - symbol:  WBGene00000001 WBGene00000002 WBGene00000003 WBGene00000004 WBGene00000005  ...
   - score :  0.742580347299 NA NA NA NA  ...
   - NA  significant genes. 

 8666 feasible genes (genes that can be used in the analysis):
   - symbol:  WBGene00000001 WBGene00000002 WBGene00000003 WBGene00000004 WBGene00000005  ...
   - score :  0.742580347299 NA NA NA NA  ...
   - NA  significant genes. 

 GO graph (nodes with at least  1  genes):
   - a graph with directed edges
   - number of nodes = 5286 
   - number of edges = 11861 

------------------------- topGOdata object -------------------------

ADD REPLY • link 6.7 years ago by biomagician ▴ 410

1

Entering edit mode

There is something wrong: there should be some significant genes instead of NA. Possible explanations:

topDiffGenes is not a function
topDiffGenes/You should take care of NAs in gene list, e.g. by replacing them with 1.

ADD REPLY • link 6.7 years ago by e.rempel ★ 1.1k

0

Entering edit mode

topDiffGenes
function (allScore) 
{
    return(allScore < 0.01)
}
<bytecode: 0x1290c3f80>
attr(,"source")
[1] "function(allScore) {"      "  return(allScore < 0.01)" "}"

ADD REPLY • link 6.7 years ago by biomagician ▴ 410

0

Entering edit mode

geneList2[is.na(geneList2)] <- 1
> 
> sampleGOdata2 <- new('topGOdata', description = 'Simple session', ontology = 'BP', allGenes = geneList2, geneSel = topDiffGenes, nodeSize = 1, annot = annFUN.gene2GO, gene2GO = idGoMap)

Building most specific GOs .....
    ( 2898 GO terms found. )

Build GO DAG topology ..........
    ( 5286 GO terms and 11861 relations. )

Annotating nodes ...............
    ( 8666 genes annotated to the GO terms. )
> sampleGOdata2

------------------------- topGOdata object -------------------------

 Description:
   -  Simple session 

 Ontology:
   -  BP 

 46739 available genes (all genes from the array):
   - symbol:  WBGene00000001 WBGene00000002 WBGene00000003 WBGene00000004 WBGene00000005  ...
   - score :  0.742580347299 1 1 1 1  ...
   - 27  significant genes. 

 8666 feasible genes (genes that can be used in the analysis):
   - symbol:  WBGene00000001 WBGene00000002 WBGene00000003 WBGene00000004 WBGene00000005  ...
   - score :  0.742580347299 1 1 1 1  ...
   - 5  significant genes. 

 GO graph (nodes with at least  1  genes):
   - a graph with directed edges
   - number of nodes = 5286 
   - number of edges = 11861 

------------------------- topGOdata object -------------------------

ADD REPLY • link 6.7 years ago by biomagician ▴ 410

0

Entering edit mode

Starting to look better, I imagine the NA's arise from genes with zero expression?? In any case, replacing them by 1 seems like a reasonable idea. Thanks. Best, C.