Question: How to extract genes based on a list of GO terms or their child terms?
Asked 3 months ago by Macspider (Vienna - BOKU):

In an R session, I have a data frame with all the Escherichia coli genes and their associated GO terms. Each gene is annotated with one GO term only, representing the deepest annotation level. I then have a character vector of specific GO terms that our collaborators are interested in for their work.

I would like to extract all the genes from the first data frame that are associated with the GO terms in the character vector.

When I say "associated" I mean either carrying a GO term that is found in the vector, or a child of that term. For example, one of the GO terms in the vector is "cell death", but a gene is likely to be annotated with something much more specific that is a child term of "cell death".

I have GO.db installed but I'm not at all proficient with it, since this is the first time I've done this. How do I properly carry out this task?

Currently, my strategy would be the following:

  1. For each GO term in the character vector, extract all its child terms using the GO.db package.
  2. unlist() the results into a single character vector containing all initial GO terms and their child terms.
  3. Extract all genes from the data frame whose associated GO term matches any of the initial GO terms or their child terms.
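
A minimal sketch of these three steps, under some assumptions: the character vector holds GO IDs rather than term names, the data frame has columns named `gene` and `go_id` (both names are hypothetical), and the offspring map is a plain named list. The toy map and the gene assignments below are made up for illustration so the snippet runs without GO.db. With GO.db installed, a real map for the biological process ontology could probably be built with `as.list(GOBPOFFSPRING)`, which holds all descendants of a term rather than only its direct children (`GOMFOFFSPRING` and `GOCCOFFSPRING` cover the other two ontologies).

```r
# Toy stand-in for an offspring map: names are GO IDs, values are descendant IDs.
# Assumption: with GO.db, a real map could be obtained with
#   library(GO.db); offspring_map <- as.list(GOBPOFFSPRING)
offspring_map <- list(
  "GO:0008219" = c("GO:0012501", "GO:0006915"),
  "GO:0006950" = c("GO:0006979")
)

# Hypothetical gene table: one GO ID per gene (column names are assumptions).
genes <- data.frame(
  gene  = c("geneA", "geneB", "geneC", "geneD"),
  go_id = c("GO:0006915", "GO:0008219", "GO:0006979", "GO:0016020"),
  stringsAsFactors = FALSE
)

# GO IDs the collaborators are interested in.
terms_of_interest <- c("GO:0008219")

# Steps 1 and 2: collect the descendants of every term and flatten them,
# together with the initial terms, into one de-duplicated character vector.
expand_terms <- function(terms, offspring_map) {
  known <- intersect(terms, names(offspring_map))
  kids  <- unlist(offspring_map[known], use.names = FALSE)
  kids  <- kids[!is.na(kids)]  # terms without descendants may map to NA
  unique(c(terms, kids))
}

wanted <- expand_terms(terms_of_interest, offspring_map)

# Step 3: keep the genes whose GO ID is among the expanded terms.
hits <- genes[genes$go_id %in% wanted, ]
print(hits)
```

Since the result of the expansion is just a de-duplicated character vector, `%in%` does the matching in a single vectorised call, so the size of the expanded list should not be a practical problem for ~30 starting terms.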

Would this be the most sensible approach? There are ~30 GO terms, and for each one I have to extract all of its child terms. It sounds like the output list is going to be huge.
