Question: 'How To' On Gene Ontology Analysis
Sheila (Germany) wrote, 9.0 years ago:

Hi all,

I am trying to put together a 'HOW TO' on gene ontology analysis. If you know the answers to these questions, please help. I hope it will be useful to everyone who is new to gene ontology analysis and bioinformatics (like me :). Please post answers using one specific tool or language (preferably R, Python, or Perl).

Given a GO ID (e.g. GO:0090342), how do I:

  1. find all its children down to a specific depth (e.g. down to the 4th level)
  2. find all its parents, grandparents, etc. up to a specific height (e.g. up to the 4th level)
  3. get the name (definition) of the GO ID
  4. draw the tree (showing each term's name (definition) and GO ID)
  5. find all genes/proteins directly associated with the GO ID
  6. find all genes/proteins associated with the GO ID or any of its children (down to level n)

Given a gene/protein ID or name (e.g. a UniProtKB ID), how do I:

  1. find all associated GO IDs of a specific type (e.g. all GO IDs associated with a UniProtKB ID that belong to 'biological process')
  2. remove redundant GO IDs from the result of the previous query, keeping only the most specific terms (e.g. GO:0090342 and GO:0050793 are both associated with p53; since one is a descendant of the other, I want to keep only the more specific descendant and remove its ancestor)
  3. check whether a specific GO ID is associated with a given UniProtKB ID

(Some questions may repeat or extend earlier ones, but it is still good to have a direct answer to each.)
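The three gene/protein items above can be sketched in Python once the annotations and the ontology graph are in memory. This minimal sketch makes two assumptions. First, the annotations come from a GAF-format file, which is tab-separated: column 2 holds the UniProtKB accession, column 4 the qualifier, column 5 the GO ID and column 9 the aspect, where P means biological process. Second, the ontology is available as a child-to-parents map. The helper names and the toy accessions are made up for illustration. Item 2 drops every term that is an ancestor of another annotated term. Item 3 also counts a GO ID as associated when it is only implied by a more specific annotation.

```python
from collections import deque

def gaf_row(obj_id, go_id, aspect, qualifier=""):
    """Build a toy GAF line (hypothetical helper); only the columns read below are filled in."""
    cols = [""] * 17
    cols[0], cols[1], cols[3], cols[4], cols[8] = "UniProtKB", obj_id, qualifier, go_id, aspect
    return "\t".join(cols)

def go_terms_for(gaf_lines, protein_id, aspect):
    """Item 1: GO IDs annotated to protein_id in one aspect (P, F or C), skipping NOT annotations."""
    terms = set()
    for line in gaf_lines:
        if line.startswith("!"):  # GAF header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if cols[1] == protein_id and cols[8] == aspect and "NOT" not in cols[3].split("|"):
            terms.add(cols[4])
    return terms

def ancestors(term, parents):
    """All terms reachable by following child -> parent links (excluding term itself)."""
    seen, queue = set(), deque([term])
    while queue:
        for p in parents.get(queue.popleft(), ()):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen

def most_specific(terms, parents):
    """Item 2: drop every term that is an ancestor of another term in the set."""
    redundant = set()
    for t in terms:
        redundant |= ancestors(t, parents)
    return terms - redundant

def is_associated(go_id, terms, parents):
    """Item 3: annotated directly, or implied by a more specific annotated term."""
    return go_id in terms or any(go_id in ancestors(t, parents) for t in terms)

# Made-up toy data: GO:C is_a GO:B is_a GO:A.
PARENTS = {"GO:B": ["GO:A"], "GO:C": ["GO:B"], "GO:D": ["GO:A"]}
GAF = [
    "!gaf-version: 2.1",
    gaf_row("P00001", "GO:A", "P"),
    gaf_row("P00001", "GO:C", "P"),
    gaf_row("P00001", "GO:D", "F"),
    gaf_row("P00001", "GO:E", "P", qualifier="NOT"),
]

bp = go_terms_for(GAF, "P00001", "P")        # item 1: the BP terms GO:A and GO:C
print(sorted(most_specific(bp, PARENTS)))    # item 2: ['GO:C']
print(is_associated("GO:B", bp, PARENTS))    # item 3: True, implied by GO:C
```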

Girlwithglasses wrote, 9.0 years ago:

All of your questions can be answered by querying the GO database with GOOSE, the GO Online SQL Environment (http://berkeleybop.org/goose), or with AmiGO, the Gene Ontology's search/browse tool (http://amigo.geneontology.org). Both tools have help documentation. GOOSE also comes with a substantial list of example database queries, several of which cover your questions above.
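GOOSE runs SQL against the GO database. To illustrate the kind of query involved, here is a minimal Python sketch that uses sqlite3 and a recursive common table expression to list a term's descendants down to a given depth. The two-table schema and the toy rows are hypothetical and simplified: term, plus term2term with parent_acc/child_acc columns. The real GO database queried by GOOSE uses its own table and column names, so check the GOOSE help pages before adapting this.

```python
import sqlite3

# Hypothetical, simplified schema; not the real GO database layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (acc TEXT PRIMARY KEY, name TEXT);
CREATE TABLE term2term (parent_acc TEXT, child_acc TEXT);
INSERT INTO term VALUES ('GO:A', 'term A'), ('GO:B', 'term B'),
                        ('GO:C', 'term C'), ('GO:D', 'term D');
-- toy DAG: GO:C is reachable from GO:A at depth 1 and at depth 2
INSERT INTO term2term VALUES ('GO:A', 'GO:B'), ('GO:B', 'GO:C'),
                             ('GO:A', 'GO:C'), ('GO:C', 'GO:D');
""")

# Descendants of a term down to a maximum depth, each reported at its shortest distance.
DESCENDANTS_SQL = """
WITH RECURSIVE down(acc, depth) AS (
    SELECT ?, 0
    UNION
    SELECT t2t.child_acc, down.depth + 1
    FROM term2term AS t2t
    JOIN down ON t2t.parent_acc = down.acc
    WHERE down.depth < ?
)
SELECT term.acc, term.name, MIN(down.depth) AS min_depth
FROM down JOIN term ON term.acc = down.acc
WHERE down.depth > 0
GROUP BY term.acc, term.name
ORDER BY min_depth, term.acc
"""

rows = conn.execute(DESCENDANTS_SQL, ("GO:A", 2)).fetchall()
for acc, name, depth in rows:
    print(depth, acc, name)
```

The depth counter in the recursive part bounds the walk. MIN() collapses terms that are reachable by paths of different lengths.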

Guangchuang Yu (China/Guangzhou/Southern Medical University) wrote, 9.0 years ago:

1. You can define a function like the following:

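getGOLevel <- function(Node = "GO:0003674", Children = GOMFCHILDREN, level) {
    # Walk down (level - 1) steps and return the terms reached at that step;
    # level = 1 returns Node itself.
    for (i in seq_len(level - 1)) {
        Node <- mget(Node, Children, ifnotfound = NA)  # children of each current term
        Node <- unique(unlist(Node))
        Node <- as.vector(Node)      # plain character vector, no names
        Node <- Node[!is.na(Node)]   # terms without children give NA
    }
    return(Node)
}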

This function was adapted from getGOLevel in my package clusterProfiler. Load GO.db first; level = 2 returns the direct children:

library(GO.db)

getGOLevel(Node = "GO:0090342", Children = GOBPCHILDREN, level = 2)

[1] "GO:0090343" "GO:0090344" "GO:2000772"
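Note that getGOLevel returns only the terms at exactly that level, not every term down to it. If you want all descendants down to depth n, one way is a breadth-first search that records each term's shortest distance. The following minimal Python sketch assumes the ontology has already been loaded into a parent-to-children dictionary. The function name and toy IDs are made up.

```python
from collections import deque

def descendants_up_to(term, children, max_depth):
    """Map each descendant of term to its shortest distance, for distances 1..max_depth."""
    depth = {term: 0}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand past the requested depth
        for child in children.get(node, ()):
            if child not in depth:  # BFS: first visit = shortest distance
                depth[child] = depth[node] + 1
                queue.append(child)
    del depth[term]
    return depth

# Toy DAG (made-up IDs): GO:E is reachable at distance 2 and at distance 3.
CHILDREN = {
    "GO:A": ["GO:B", "GO:C"],
    "GO:B": ["GO:D", "GO:E"],
    "GO:D": ["GO:E", "GO:F"],
}

print(descendants_up_to("GO:A", CHILDREN, 2))
# {'GO:B': 1, 'GO:C': 1, 'GO:D': 2, 'GO:E': 2}
```

GO is a DAG rather than a tree, so a term can be reachable by paths of different lengths, as GO:E is here. This sketch reports such a term once, at its shortest distance.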

2. This works the same way as question 1, except that we walk up the graph with GOBPPARENTS instead of down with GOBPCHILDREN. The modified function:

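getGOLevel <- function(Node = "GO:0090342", Parent = GOBPPARENTS, level) {
    # Same walk, but upwards: (level - 1) steps towards the root.
    for (i in seq_len(level - 1)) {
        Node <- mget(Node, Parent, ifnotfound = NA)  # parents of each current term
        Node <- unique(unlist(Node))
        Node <- as.vector(Node)      # plain character vector, no names
        Node <- Node[!is.na(Node)]   # drop lookups that found nothing
    }
    return(Node)
}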

We can test it with:

getGOLevel(Node="GO:0090342", Parent=GOBPPARENTS, level=4)

[1] "GO:0032502" "GO:0008150" "GO:0065007"
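The same idea works upwards. This Python sketch groups ancestors by the height at which each one is first reached, which directly answers "parents, grandparents, ... up to level n". The function name and the toy child-to-parents map are hypothetical.

```python
def ancestors_by_level(term, parents, max_height):
    """Return a list of sets: ancestors first reached at height 1, 2, ..., max_height."""
    seen, frontier, levels = {term}, {term}, []
    for _ in range(max_height):
        # parents of everything on the current frontier, minus terms already reported
        frontier = {p for t in frontier for p in parents.get(t, ())} - seen
        if not frontier:  # reached the root(s)
            break
        seen |= frontier
        levels.append(frontier)
    return levels

# Toy child -> parents map (made-up IDs); GO:D has two parents.
PARENTS = {
    "GO:D": ["GO:B", "GO:C"],
    "GO:B": ["GO:A"],
    "GO:C": ["GO:A"],
    "GO:A": ["GO:ROOT"],
}

for height, terms in enumerate(ancestors_by_level("GO:D", PARENTS, 4), start=1):
    print(height, sorted(terms))
```

Unlike a plain level-by-level walk, this version removes terms already seen. A term reachable at several heights is therefore listed only at the lowest one.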

3. This can be answered directly with the function GO2Term from my package clusterProfiler:

clusterProfiler:::GO2Term("GO:0090342")

                GO:0090342
"regulation of cell aging"

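Outside R, names and definitions can be read straight from the OBO-format ontology file. The following minimal Python sketch assumes standard one-tag-per-line [Term] stanzas. The inline excerpt is abbreviated: GO:0090342's name is the one shown above, and GO:A is a made-up term. The parser also ignores escaped quotes inside definitions.

```python
OBO_TEXT = """format-version: 1.2

[Term]
id: GO:0090342
name: regulation of cell aging
namespace: biological_process

[Term]
id: GO:A
name: toy term
def: "A made-up definition." [TOY:1]

[Typedef]
id: part_of
name: part of
"""

def parse_obo_terms(text):
    """Return {GO ID: {'name': ..., 'def': ...}} for every [Term] stanza."""
    terms, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):  # new stanza: only [Term] stanzas are kept
            current = {} if line == "[Term]" else None
            continue
        if current is None or ": " not in line:
            continue  # header lines, blank lines, non-Term stanzas
        tag, value = line.split(": ", 1)
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "def":
            current["def"] = value.split('"')[1]  # text between the first pair of quotes
    return terms

terms = parse_obo_terms(OBO_TEXT)
print(terms["GO:0090342"]["name"])  # regulation of cell aging
```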
4. I am not familiar with drawing GO trees.
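One simple option is an indented text tree that prints each term's GO ID and name under its parent. The following minimal Python sketch works over a toy parent-to-children map and name table with made-up IDs. GO is a DAG, so a term with several parents is printed once under each of them.

```python
def tree_lines(term, children, names, max_depth, depth=0):
    """Indented 'GO ID  name' lines for term and its descendants down to max_depth."""
    lines = ["    " * depth + f"{term}  {names.get(term, '?')}"]
    if depth < max_depth:
        for child in sorted(children.get(term, ())):
            lines.extend(tree_lines(child, children, names, max_depth, depth + 1))
    return lines

# Toy data (made-up IDs and names); GO:D has two parents.
CHILDREN = {"GO:A": ["GO:B", "GO:C"], "GO:B": ["GO:D"], "GO:C": ["GO:D"]}
NAMES = {"GO:A": "toy root", "GO:B": "toy child 1", "GO:C": "toy child 2", "GO:D": "toy grandchild"}

print("\n".join(tree_lines("GO:A", CHILDREN, NAMES, max_depth=2)))
```

For a graphical version, the same parent/child pairs could be written out in Graphviz DOT format and rendered with Graphviz's dot tool.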

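5. For human, you can use the reverse GO mapping from the org.Hs.eg.db package. It returns the Entrez Gene IDs directly annotated to a GO ID (GOID, e.g. "GO:0090342"):

library(org.Hs.eg.db)
mget(GOID, org.Hs.egGO2EG, ifnotfound = NA)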
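Without R, the same direct lookup can be built from a GO annotation (GAF) file. The following minimal Python sketch assumes GAF's tab-separated layout: column 2 is the gene/protein ID, column 4 the qualifier and column 5 the GO ID. Annotations qualified with NOT are skipped. The inline rows are made-up toy data.

```python
from collections import defaultdict

# Toy GAF text (made-up IDs): 17 tab-separated columns, only columns 1, 2, 4 and 5 filled in.
TOY_GAF = "\n".join(
    "\t".join(["UniProtKB", obj, "", qual, go] + [""] * 12)
    for obj, qual, go in [("P1", "", "GO:B"), ("P2", "", "GO:B"),
                          ("P3", "", "GO:C"), ("P4", "NOT", "GO:B")]
)

def go_to_genes(gaf_text):
    """Map GO ID -> set of gene/protein IDs directly annotated to it."""
    index = defaultdict(set)
    for line in gaf_text.splitlines():
        if not line.strip() or line.startswith("!"):  # blank or header lines
            continue
        cols = line.split("\t")
        if "NOT" in cols[3].split("|"):  # negative annotation: not associated
            continue
        index[cols[4]].add(cols[1])  # column 5 = GO ID, column 2 = object ID
    return index

index = go_to_genes(TOY_GAF)
print(sorted(index["GO:B"]))  # ['P1', 'P2']
```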

6. This can also be answered directly with the function getGO2ExtID from my package clusterProfiler:

clusterProfiler:::getGO2ExtID("GO:0090342", organism="human")

$GO:0090342
 [1] "1029"  "2305"  "3159"  "4000"  "4282"  "5728"  "7471"  "8091"  "9891"
[10] "10783" "51343" "54708" "87178"
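To limit the search to n levels yourself, you can combine a depth-limited walk down the graph with a GO-to-genes map of direct annotations. The following minimal Python sketch uses made-up toy data and hypothetical function names. For the unlimited case, org.Hs.eg.db also provides org.Hs.egGO2ALLEGS, which maps a GO ID to the genes annotated to it or to any of its descendants.

```python
def terms_down_to(term, children, n):
    """term plus all of its descendants at distance <= n."""
    found, frontier = {term}, {term}
    for _ in range(n):
        frontier = {c for t in frontier for c in children.get(t, ())} - found
        found |= frontier
    return found

def genes_for_subtree(term, children, go2genes, n):
    """Union of genes directly annotated to term or to any descendant down to level n."""
    genes = set()
    for t in terms_down_to(term, children, n):
        genes |= go2genes.get(t, set())
    return genes

# Toy data (made-up IDs): GO:A -> GO:B -> GO:C.
CHILDREN = {"GO:A": ["GO:B"], "GO:B": ["GO:C"]}
GO2GENES = {"GO:A": {"g1"}, "GO:B": {"g2"}, "GO:C": {"g3", "g1"}}

print(sorted(genes_for_subtree("GO:A", CHILDREN, GO2GENES, 1)))  # ['g1', 'g2']
```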

gil.hornung (European Union) wrote, 3.3 years ago:

Regarding the first two items in your list: I just found the R package GO.db, which provides the following mappings:

  • GOxxPARENTS: the direct parents of the term.
  • GOxxANCESTOR: the parents, their parents, and so on up to the root.
  • GOxxCHILDREN: the direct children of the term.
  • GOxxOFFSPRING: the children, their children, and so on down to the leaves of the GO graph.

Replace xx with BP, MF, or CC depending on the ontology (Biological Process, Molecular Function, Cellular Component).

For example, to find the children of the Cellular Component term GO:0005886 (plasma membrane):

library(GO.db)
GOCCCHILDREN$"GO:0005886"

You can then loop over the results and look up their children, repeating for as many levels as you need.
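If you repeat that loop until no new terms appear, you get the full closure that GOxxOFFSPRING and GOxxANCESTOR provide precomputed. The following minimal Python sketch works over a toy children map with made-up IDs. It memoizes each term's result so that shared subtrees in the DAG are computed only once. The function names are hypothetical.

```python
def offspring_map(children):
    """Like GOxxOFFSPRING: term -> set of all descendants (children, their children, ...)."""
    memo = {}

    def visit(term):
        if term not in memo:
            result = set()
            for child in children.get(term, ()):
                result.add(child)
                result |= visit(child)  # reuse the child's cached descendants
            memo[term] = result
        return memo[term]

    for term in children:
        visit(term)
    return memo

def ancestor_map(offspring):
    """Like GOxxANCESTOR: invert the offspring map."""
    anc = {}
    for term, desc in offspring.items():
        for d in desc:
            anc.setdefault(d, set()).add(term)
    return anc

# Toy DAG (made-up IDs): GO:D has two parents.
CHILDREN = {"GO:A": ["GO:B", "GO:C"], "GO:B": ["GO:D"], "GO:C": ["GO:D"]}

offspring = offspring_map(CHILDREN)
ancestors = ancestor_map(offspring)
print(sorted(offspring["GO:A"]))  # ['GO:B', 'GO:C', 'GO:D']
print(sorted(ancestors["GO:D"]))  # ['GO:A', 'GO:B', 'GO:C']
```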

Cshao wrote, 9.0 years ago:

All of these questions can be answered in any well-known programming language (Java, C/C++, Python, Perl, ...), since they basically come down to plain-text parsing. Download the GO file (http://www.geneontology.org/GO.downloads.ontology.shtml) in your favorite format and parse it.
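Here is a minimal Python sketch of parsing the OBO-format ontology file. It keeps is_a and part_of links from [Term] stanzas as a child-to-parents map, then inverts that map to parent-to-children. This is a simplified parser that assumes the standard one-tag-per-line OBO layout. The inline text is a made-up two-term excerpt; in practice you would read the downloaded file instead. For fuller parsing, existing libraries can help, for example the obo_parser module of the goatools Python package.

```python
from collections import defaultdict

OBO_TEXT = """[Term]
id: GO:B
name: toy child
is_a: GO:A ! toy root

[Term]
id: GO:C
name: toy part
relationship: part_of GO:B ! toy child

[Typedef]
id: part_of
name: part of
"""

def parse_edges(text, relations=("is_a", "part_of")):
    """Return a child -> set(parents) map from the [Term] stanzas of an OBO file."""
    parents = defaultdict(set)
    term, in_term = None, False
    for raw in text.splitlines():
        line = raw.split(" ! ")[0].strip()  # drop trailing '! name' comments
        if line.startswith("["):
            in_term, term = line == "[Term]", None
        elif in_term and line.startswith("id: "):
            term = line[4:]
        elif in_term and term and line.startswith("is_a: ") and "is_a" in relations:
            parents[term].add(line[6:].split()[0])
        elif in_term and term and line.startswith("relationship: "):
            rel, target = line[len("relationship: "):].split()[:2]
            if rel in relations:
                parents[term].add(target)
    return parents

def invert(parents):
    """Turn a child -> parents map into a parent -> children map."""
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    return children

parents = parse_edges(OBO_TEXT)
children = invert(parents)
print(dict(children))
```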

In my opinion, if you plan to do a lot of work with GO, it is better to use a programming language than a specific tool.

Powered by Biostar version 2.3.0