Question: How to extract genes based on a list of GO terms or their child terms?
Macspider (Vienna - BOKU) wrote, 21 days ago:

In an R session, I have a data frame with all the Escherichia coli genes and their associated GO terms. Each gene is annotated with one GO term only, representing the deepest annotation level. I then have a character vector of specific GO terms that our collaborators are interested in for their work.

I would like to extract all the genes from the first data frame that are associated with the GO terms in the character vector.

When I say "associated" I mean either carrying a GO term that is found in the vector, or a child of one of those terms. For example, one of the GO terms in the vector is "cell death", but a gene is likely to be annotated with something much more specific that is a child term of "cell death".

I have GO.db installed but I'm not at all proficient with it, since this is the first time I've done this. How do I properly carry out this task?

Currently, my strategy would be the following:

  1. For each GO term in the character vector, extract all its child terms using the GO.db package.
  2. unlist() the results into a single character vector containing all the initial GO terms and their child terms.
  3. Extract all genes from the data frame whose associated GO term matches any of the initial GO terms or their child terms.
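
A minimal sketch of these three steps, written in base R so it runs without Bioconductor. The offspring map, the GO IDs, the data frame, and the helper names expand_terms() and filter_genes() are all made up for illustration; they are not part of GO.db.

```r
# Toy stand-in for a GO offspring map: each name is a GO ID, each value is
# the character vector of all its descendants (NA when a term has none).
# These IDs are placeholders, not real GO identifiers.
toy_offspring <- list(
  "GO:A" = c("GO:A1", "GO:A2", "GO:A1a"),
  "GO:B" = c("GO:B1", "GO:A1a"),
  "GO:C" = NA_character_
)

# Hypothetical gene table: one GO term per gene, as in the question.
genes <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5"),
  go   = c("GO:A1a", "GO:B", "GO:Z", "GO:A2", "GO:C"),
  stringsAsFactors = FALSE
)

# Steps 1 and 2: look up the descendants of each term of interest, then
# flatten everything into one deduplicated vector that also keeps the
# query terms themselves.
expand_terms <- function(go_ids, offspring_map) {
  found <- offspring_map[intersect(go_ids, names(offspring_map))]
  all_terms <- unique(c(go_ids, unlist(found, use.names = FALSE)))
  all_terms[!is.na(all_terms)]
}

# Step 3: keep the genes whose single GO term is in the expanded set.
filter_genes <- function(df, terms, go_col = "go") {
  df[df[[go_col]] %in% terms, , drop = FALSE]
}

terms_of_interest <- c("GO:A", "GO:C")
expanded <- expand_terms(terms_of_interest, toy_offspring)
selected <- filter_genes(genes, expanded)
print(selected$gene)  # g1, g4, g5
```

With GO.db attached via library(GO.db), as.list(GOBPOFFSPRING) should give a list of the same shape for biological process terms (GOMFOFFSPRING and GOCCOFFSPRING cover the other two ontologies), so it could be passed as offspring_map. Note that the *CHILDREN maps such as GOBPCHILDREN return only direct children, while the *OFFSPRING maps return every descendant, which is probably what "a much more specific term" needs.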

Would this be the most sensible approach? There are ~30 GO terms, and for each one I have to extract all its child terms. Sounds like it's gonna be a huge output list.
