Question: Bioconductor - Quickly Look Up the Aspect of a GO Term
Asked by cclark, 5.7 years ago:

I am working on a project using Bioconductor that requires me to look up which GO ontology a given GO term belongs to (i.e., Molecular Function, Biological Process, or Cellular Component). I need to do this tens of thousands of times, in the inner loop of a larger program. My current solution is to use the GO.db Bioconductor package to create three predicates like this one:

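library(GO.db)

# TRUE when `term` has an entry in the Molecular Function parents map.
# isBP() and isCC() are analogous, using GOBPPARENTS and GOCCPARENTS.
isMF <- function(term) {
  !is.null(GOMFPARENTS[[term]])
}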

Unsurprisingly, however, this is prohibitively slow when invoked tens of thousands of times. Is there a Bioconductor package that would give me a faster way to look up this information, or will I need to implement a faster data structure for this purpose myself? I'm just learning R, so I'd prefer to use an existing function if possible.

Tags: R, bioconductor, go

Comment by pld, 5.7 years ago: If you're just learning, you might want to explore a bit more. This is a perfect place to use a hash table.
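A minimal sketch of the hash-table suggestion, assuming the GO ID to ontology mapping has already been pulled into a data frame. Here six rows copied from the select() output in the answer below stand in for the full table, and `go`, `ontology_table`, `ontology_of()` and the rewritten `isMF()` are illustrative names, not part of GO.db.

```r
# Minimal sketch: an R environment used as a hash table from GO ID to ontology
# ("BP", "MF" or "CC"). The six rows below are copied from the select(GO.db, ...)
# output shown in this thread; with GO.db installed you would build `go`
# from the full select() result instead.
go <- data.frame(
  GOID = c("GO:0000001", "GO:0000002", "GO:0000003",
           "GO:0000006", "GO:0000007", "GO:0000009"),
  ONTOLOGY = c("BP", "BP", "BP", "MF", "MF", "MF"),
  stringsAsFactors = FALSE
)

# Build the table once, outside the inner loop: one binding per GO ID.
ontology_table <- list2env(as.list(setNames(go$ONTOLOGY, go$GOID)), hash = TRUE)

# Hashed lookup of a single term; NA for IDs that are not in the table.
ontology_of <- function(term) {
  get0(term, envir = ontology_table, inherits = FALSE, ifnotfound = NA_character_)
}

# Hypothetical drop-in replacement for the GOMFPARENTS-based isMF().
isMF <- function(term) identical(ontology_of(term), "MF")

ontology_of("GO:0000006")  # "MF"
isMF("GO:0000001")         # FALSE
```

The mapping is built once; each call inside the loop is then a single in-memory hashed lookup rather than a lookup through the GO.db map object.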

Answer by Martin Morgan (United States), 5.7 years ago:

Please ask questions about Bioconductor packages on the Bioconductor mailing list (no subscription required). As with most things in R, it's better to use vectorized operations than to iterate. Also, the interface to GO and other annotation databases has been simplified. You could instead do:

> vals = select(GO.db, keys(GO.db, "GOID"), c("TERM", "ONTOLOGY"))
> dim(vals)
[1] 37391     3
> head(vals)
        GOID                                                         TERM ONTOLOGY
1 GO:0000001                                    mitochondrion inheritance       BP
2 GO:0000002                             mitochondrial genome maintenance       BP
3 GO:0000003                                                 reproduction       BP
4 GO:0000006 high affinity zinc uptake transmembrane transporter activity       MF
5 GO:0000007     low-affinity zinc ion transmembrane transporter activity       MF
6 GO:0000009                       alpha-1,6-mannosyltransferase activity       MF

and then use standard R operations, e.g., vals[vals$ONTOLOGY == "MF", ]. The Bioconductor Annotation workflow provides additional material.
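Building on the select() result above, a minimal sketch of the vectorized lookup: instead of calling a predicate once per term inside a loop, classify a whole batch of terms in one call. So the sketch runs without GO.db, `vals` is rebuilt from the six rows printed by head(vals) (TERM column omitted); `ontology`, `my_terms`, `is_mf` and `by_ontology` are illustrative names.

```r
# Minimal sketch of the vectorized approach. `vals` stands in for the data frame
# returned by select(GO.db, ...) above; only the six rows printed by head(vals)
# are reproduced (TERM column omitted) so the sketch runs without GO.db.
vals <- data.frame(
  GOID = c("GO:0000001", "GO:0000002", "GO:0000003",
           "GO:0000006", "GO:0000007", "GO:0000009"),
  ONTOLOGY = c("BP", "BP", "BP", "MF", "MF", "MF"),
  stringsAsFactors = FALSE
)

# Named character vector: names are GO IDs, values are ontologies.
ontology <- setNames(vals$ONTOLOGY, vals$GOID)

# Hypothetical batch of terms collected by the larger program (last one unknown).
my_terms <- c("GO:0000006", "GO:0000001", "GO:0000009", "GO:9999999")

# One call looks up every term; IDs not in `vals` give NA.
term_ontology <- unname(ontology[my_terms])  # "MF" "BP" "MF" NA

# Vectorized replacement for calling isMF() once per term.
is_mf <- my_terms %in% vals$GOID[vals$ONTOLOGY == "MF"]  # TRUE FALSE TRUE FALSE

# Or group all terms by ontology in one step (NA entries are dropped).
by_ontology <- split(my_terms, term_ontology)
```

If you also need other columns such as TERM, match(my_terms, vals$GOID) gives the matching row positions in the same single vectorized call.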
