How to extract genes based on a list of GO terms or their child terms?
18 months ago
Macspider ★ 3.5k

In an R session, I have a data frame with all the Escherichia coli genes and their associated GO terms. Each gene is annotated with one GO term only, representing the deepest annotation level. I then have a character vector of specific GO terms that our collaborators are interested in for their work.

I would like to extract all the genes from the first data frame that are associated with the GO terms in the character vector.

By "associated" I mean a gene that carries either a GO term found in the vector or a child (at any depth) of such a term. For example, one of the GO terms in the vector is "cell death", but a gene is likely to be annotated with something much more specific that is a child term of "cell death".

I have GO.db installed, but I'm not at all proficient with it, since this is the first time I've done this. How do I properly carry out this task?

Currently, my strategy would be the following:

1. For each GO term in the character vector, extract all its child terms using the GO.db package.
2. unlist() the results into a single character vector containing all initial GO terms and their children.
3. Extract all genes from the data frame whose associated GO term matches any of the initial or child GO terms.

Would this be the best approach? There are ~30 GO terms, and for each I have to extract all its child terms. It sounds like the output list is going to be huge.
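For reference, the three steps above could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the vector holds GO IDs (not term names), the data frame has a column called `go_id`, and the names `expand_go_terms`, `filter_genes_by_go`, `load_go_offspring`, `ecoli_genes` and `terms_of_interest` are all made up for illustration. It uses the `GO*OFFSPRING` maps from GO.db, which list descendants at every depth, rather than the `GO*CHILDREN` maps, which list direct children only. The toy IDs are placeholders, not real GO terms.

```r
# Expand a set of GO IDs to include all of their descendants.
# `offspring_map` is a named list: GO ID -> character vector of descendant IDs;
# terms without descendants map to NA (this mirrors as.list(GOBPOFFSPRING)).
expand_go_terms <- function(go_ids, offspring_map) {
  hits <- offspring_map[intersect(go_ids, names(offspring_map))]
  expanded <- unique(c(go_ids, unlist(hits, use.names = FALSE)))
  expanded[!is.na(expanded)]
}

# Keep the rows of `genes` whose GO column is in the expanded set.
# `go_col` is an assumed column name; adjust it to the real data frame.
filter_genes_by_go <- function(genes, go_ids, offspring_map, go_col = "go_id") {
  wanted <- expand_go_terms(go_ids, offspring_map)
  genes[genes[[go_col]] %in% wanted, , drop = FALSE]
}

# Toy data with placeholder IDs: GO:A is the parent of GO:B, which is the
# parent of GO:C; GO:D is unrelated.
toy_offspring <- list(
  "GO:A" = c("GO:B", "GO:C"),
  "GO:B" = "GO:C",
  "GO:C" = NA_character_
)
toy_genes <- data.frame(
  gene  = c("g1", "g2", "g3", "g4"),
  go_id = c("GO:A", "GO:C", "GO:D", "GO:B"),
  stringsAsFactors = FALSE
)
toy_result <- filter_genes_by_go(toy_genes, "GO:A", toy_offspring)
print(toy_result$gene)

# Hypothetical helper that builds the real lookup from GO.db (not called here,
# so the sketch also runs on a machine without GO.db installed).
load_go_offspring <- function() {
  suppressPackageStartupMessages(library(GO.db))
  c(as.list(GOBPOFFSPRING), as.list(GOMFOFFSPRING), as.list(GOCCOFFSPRING))
}
# selected <- filter_genes_by_go(ecoli_genes, terms_of_interest, load_go_offspring())
```

The expanded vector is only ever used on the right-hand side of `%in%`, so even if ~30 terms expand to many thousands of descendants, it stays a single character vector and the filter is one vectorised lookup. If the vector contains term names such as "cell death" instead of IDs, they would first have to be mapped to GO IDs, for example via `AnnotationDbi::select()` on `GO.db` with `keytype = "TERM"`.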

GOterms Gene Ontology Children Match