Question

automated gene ontology enrichment for simple gene list (not microarray data)

9.4 years ago

ruth.stoney ▴ 10

Hi,

I need to find an automated way to do GO enrichment for 3000 sets of genes. I've been working in R, but the problem I'm having is that most of the tools (topGO, goseq) take microarray data and don't work with simple gene lists.

DAVIDWebService seems like a perfect solution; however, I can't find a function that actually runs the enrichment analysis. It only seems to analyse and visualise existing enrichment files.

I am comfortable writing R and python (and could possibly get a Matlab licence) and would be willing to branch out if other tools are simple to use.

Thanks for any advice!

Ruth

gene • 3.5k views

updated 3.0 years ago by Ram 45k • written 9.4 years ago by ruth.stoney ▴ 10

goseq works with lists of genes, not with microarrays!

9.4 years ago by Benn 8.4k
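For reference, goseq's nullp() takes its input as a named 0/1 vector over the whole gene universe rather than a bare list, so a simple gene list only needs reshaping first. A minimal base-R sketch; the gene IDs are made up, and goseq additionally needs gene lengths (or a supported genome and ID type) to fit its length-bias model, which this snippet does not cover.

```r
# Hypothetical inputs: the background (all genes that could have been
# picked) and one gene list of interest; IDs are made up for illustration.
universe <- c("ENSG01", "ENSG02", "ENSG03", "ENSG04", "ENSG05")
my_genes <- c("ENSG02", "ENSG05")

# Named 0/1 vector over the universe: 1 = gene is in the list, 0 = it is not.
de_vector <- as.integer(universe %in% my_genes)
names(de_vector) <- universe
print(de_vector)
```

A vector built this way would then be passed to nullp() together with the genome/ID (or bias) information.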

Ram · Answer 1 · 2016-02-11

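Have a look at the clusterProfiler package in Bioconductor. It accepts a vector of Entrez Gene IDs as input and can compute both a simple over-representation enrichment and a GSEA, using Gene Ontology and other databases. Note that in current clusterProfiler releases enrichGO() also requires an OrgDb argument (e.g. org.Hs.eg.db), and as.data.frame() replaces summary() for tabulating results; the example below uses the older interface.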

> library(clusterProfiler)
> m = enrichGO(as.character(c(1,2,3,4,5)))
> summary(m)
                   ID                                  Description GeneRatio   BgRatio
GO:0019966 GO:0019966                        interleukin-1 binding       1/2   6/18679
GO:0019958 GO:0019958                      C-X-C chemokine binding       1/2   7/18679
GO:0019956 GO:0019956                            chemokine binding       1/2  15/18679
GO:0048306 GO:0048306            calcium-dependent protein binding       1/2  60/18679
GO:0019955 GO:0019955                             cytokine binding       1/2  83/18679
GO:0004867 GO:0004867 serine-type endopeptidase inhibitor activity       1/2  94/18679
GO:0002020 GO:0002020                             protease binding       1/2 103/18679
GO:0019838 GO:0019838                        growth factor binding       1/2 116/18679
GO:0004866 GO:0004866             endopeptidase inhibitor activity       1/2 168/18679
GO:0061135 GO:0061135             endopeptidase regulator activity       1/2 173/18679
GO:0030414 GO:0030414                 peptidase inhibitor activity       1/2 177/18679
GO:0061134 GO:0061134                 peptidase regulator activity       1/2 212/18679
                 pvalue    p.adjust      qvalue geneID Count
GO:0019966 0.0006423467 0.007493844 0.002760890      2     1
GO:0019958 0.0007493844 0.007493844 0.002760890      2     1
GO:0019956 0.0016054798 0.010703199 0.003943284      2     1
GO:0048306 0.0064141802 0.030955323 0.011404593      2     1
GO:0019955 0.0088674776 0.030955323 0.011404593      2     1
GO:0004867 0.0100397218 0.030955323 0.011404593      2     1
GO:0002020 0.0109983147 0.030955323 0.011404593      2     1
GO:0019838 0.0123821292 0.030955323 0.011404593      2     1
GO:0004866 0.0179076991 0.034295408 0.012635150      2     1
GO:0061135 0.0184381870 0.034295408 0.012635150      2     1
GO:0030414 0.0188624742 0.034295408 0.012635150      2     1

Ram · Answer 2 · 2016-02-11

9.4 years ago

Kamil ★ 2.3k

Consider the goana function in the limma package; see the examples in its documentation. It can run an enrichment test even if you provide only a vector of Entrez Gene IDs, with no other inputs.
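For the 3000-list case in the question, the same kind of over-representation test is simply repeated once per list. A minimal base-R sketch of that loop, assuming a hypergeometric test with Benjamini-Hochberg adjustment; this is not goana itself, and `toy_go`, `enrich_one` and the gene sets are hypothetical. In practice the term-to-gene mapping comes from an annotation package, which goana and enrichGO handle for you (goana's documentation also describes accepting a list of Entrez ID vectors directly).

```r
# Hypothetical toy annotation: GO term -> member genes (made-up Entrez IDs).
toy_go <- list(
  "GO:A" = c("1", "2", "3", "4"),
  "GO:B" = c("3", "5", "6"),
  "GO:C" = c("7", "8", "9", "10", "11")
)
universe <- unique(unlist(toy_go))  # background = all annotated genes

# Hypothetical helper: one-sided hypergeometric test of every term for one
# gene list, then BH adjustment across the terms tested.
enrich_one <- function(genes, annotation, universe) {
  genes <- intersect(genes, universe)  # keep annotated genes only
  n <- length(genes)
  N <- length(universe)
  res <- data.frame(
    term = names(annotation),
    k = unname(vapply(annotation, function(s) length(intersect(s, genes)), integer(1))),
    M = unname(vapply(annotation, length, integer(1)))
  )
  res$pvalue <- phyper(res$k - 1, res$M, N - res$M, n, lower.tail = FALSE)
  res$p.adjust <- p.adjust(res$pvalue, method = "BH")
  res[order(res$pvalue), ]
}

# Many gene lists are just a named list; lapply runs the test on each one.
gene_sets <- list(set1 = c("1", "2", "3"), set2 = c("7", "8", "9", "99"))
results <- lapply(gene_sets, enrich_one, annotation = toy_go, universe = universe)
print(results$set1)
```

Keeping one result table per list makes it easy to filter each list on p.adjust afterwards; with 3000 lists the lapply call is unchanged.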

See also the other gene set enrichment packages available on Bioconductor.

updated 3.0 years ago by Ram 45k • written 9.4 years ago by Kamil ★ 2.3k