How to interpret the result of GO analysis using Ontologizer / mapping GO IDs to GO TERMS ?
7.8 years ago
jack ▴ 950

I have done a gene ontology enrichment analysis using Ontologizer. The output looks like this:

ID            Pop.total    Pop.term    Study.total    Study.term    Pop.family    Study.family    nparents    is.trivial    p                     p.adjusted            p.min
GO:0000000    15117        15075       3743           3733          0             0               0           true          1.0                   1.0                   1.0
GO:0008800    15117        3           3743           1             4             1               1           false         0.7500000000000001    0.7500000000000001    0.25000000000000006
GO:0052547    15117        23          3743           6             670           208             3           false         0.7699531541574813    0.7699531541574813    3.789971590544467E-43
GO:0000003    15117        1           3743           1             11028         2874            1           false         0.26060935799429585   0.26060935799429585   9.067827348505459E-5
GO:0052548    15117        22          3743           5             280           100             2           false         0.9446003360272346    0.9446003360272346    3.811732507786439E-33

How can I translate the GO IDs into GO term names? And what does this table mean?

gene-ontology R RNA-Seq • 4.3k views
2
Entering edit mode
7.8 years ago
EagleEye 7.4k

You can use a simple bash script. I hope this works:

go_convert.sh

#!/bin/bash

GOlist=($(cat $1 | awk '!x[$0]++' | cut -f $3))
for i in "${GOlist[@]}"
do
cat $2 | grep "$GOlist" >> GO_mapped.txt
done

Run:

./go_convert.sh <YOUR_INPUT_FILE> <GO_DB_FILE_FROM_github> <YOUR_COLUMN_NUMBER_HAVING_GO_IDs>
1
Entering edit mode

Sorry, use this instead:

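#!/bin/bash
# Usage: ./go_convert.sh <input_file> <GO_db_file> <GO_ID_column_number>

# Collect the unique GO IDs from the requested column of the input file
GOlist=($(cut -f"$3" "$1" | awk '!x[$0]++'))

# Append every database line that mentions each GO ID to GO_mapped.txt
for i in "${GOlist[@]}"
do
    grep "$i" "$2" >> GO_mapped.txt
done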

I have run it on sample files and got the results. Check it out:

./go_convert.sh input_file.txt sample_go_db.txt 1

http://bioinformatics.kandurilab.org/biostars/files/mapping_ids.zip

0
Entering edit mode

Thanks. Which GO_DB_FILE_FROM_github should I use? There are a few files there. And can I ask how you generated the files that are on GitHub?

1
Entering edit mode

This file will have all biological_process, molecular_function and cellular_components.

1
Entering edit mode

Those files are generated from geneontology.org and are used by the tool GeneSCF.
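For a non-model organism, a comparable lookup table can be built directly from the ontology file published on geneontology.org. The following is a minimal sketch, not necessarily how the GeneSCF files were produced: it assumes a standard OBO file such as go-basic.obo, and obo_to_tsv is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: convert an OBO ontology file into a tab-separated table with
# three columns: GO ID, term name, namespace.
# Assumption: the input follows the usual OBO layout, where each term is a
# "[Term]" stanza containing "id: ", "name: " and "namespace: " lines.
obo_to_tsv() {
    awk '
        # Print the record collected so far (if any) and reset the fields
        function emit() { if (id != "") print id "\t" name "\t" ns; id = name = ns = "" }
        # A new stanza header closes the previous stanza; only [Term] stanzas are kept
        /^\[/                     { emit(); in_term = ($0 == "[Term]"); next }
        in_term && /^id: /        { id = substr($0, 5) }
        in_term && /^name: /      { name = substr($0, 7) }
        in_term && /^namespace: / { ns = substr($0, 12) }
        END                       { emit() }
    ' "$1"
}
```

Usage, assuming go-basic.obo has already been downloaded: obo_to_tsv go-basic.obo > go_terms.tsv. The output file name go_terms.tsv is only an example.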

1
Entering edit mode
7.8 years ago
EagleEye 7.4k

You can try this tool, which gives more detailed results. If you are working with human data on a Linux system, it will be useful for you: Gene Set Clustering based on Functional annotation (GeneSCF)

Or, if you still want to translate the IDs you already have, go to http://geneontology.org/ and search for your GO ID there.

Update: GeneSCF now supports all organisms/species from KEGG and Gene Ontology repository.

0
Entering edit mode

I want to translate them, but the question is how to do that in an automated manner. There are a lot of GO IDs for my genes (about 2000), and it is not feasible to copy and paste them into the Gene Ontology website to search for them individually.

0
Entering edit mode

If you are comfortable working with files, you can use these annotation files from GeneSCF to map the IDs: https://github.com/santhilalsubhash/geneSCF/tree/master/annotation

0
Entering edit mode

My organism is not a model organism and I had to parse everything myself. Now I have the enrichment results as GO IDs and I need to translate them, but I don't know exactly how to parse the files or which files I should use. Can you help a bit more?

1
Entering edit mode
7.8 years ago
SES 8.5k

This information is all in the documentation. Click "Help" and then "Help Contents..." Honestly, I'm confused how you got this far without knowing what these fields are, such as the population and study IDs. These would have to be created before the analysis, so you might want to think about whether these results are exactly what you want to test. From the docs:

GO id: The accession number of the GO term
Name: The name of the GO term
NSP: The namespace, or subontology: biological process (B), cellular component (C) or molecular function (F)
P-value: The nominal (uncorrected) P-value resulting from the observed overrepresentation of the GO term
Pop. Count: The number of genes in the population set that are annotated to the GO term in question
Study Count: The number of genes in the study set that are annotated to the GO term in question
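As a rough check of how the counts relate to the P-value, the sketch below computes a hypergeometric upper tail in awk. It is an inference from the table in the question, not a statement from the Ontologizer documentation: for rows GO:0008800 and GO:0000003, the p column is reproduced when Pop.family and Study.family (rather than Pop.total and Study.total) are used as the population and study sizes. The helper name hyper_upper_tail is hypothetical.

```shell
#!/bin/bash
# Sketch: P(X >= k) for a hypergeometric draw.
# Arguments: N = population size, K = population genes annotated to the term,
#            n = study size,      k = study genes annotated to the term.
# Assumption: for the table in the question, N and n are taken from the
# Pop.family and Study.family columns, K and k from Pop.term and Study.term.
hyper_upper_tail() {
    awk -v N="$1" -v K="$2" -v n="$3" -v k="$4" '
        # log(x!) as a plain sum of logs; adequate for counts of this size
        function lfact(x,   i, s) { s = 0; for (i = 2; i <= x; i++) s += log(i); return s }
        # log of the binomial coefficient C(a, b)
        function lchoose(a, b) { return lfact(a) - lfact(b) - lfact(a - b) }
        BEGIN {
            hi = (K < n) ? K : n          # largest possible overlap
            p = 0
            for (i = k; i <= hi; i++)
                if (n - i <= N - K)       # skip impossible configurations
                    p += exp(lchoose(K, i) + lchoose(N - K, n - i) - lchoose(N, n))
            printf "%.6f\n", p
        }'
}
```

With the GO:0008800 row, hyper_upper_tail 4 3 1 1 prints 0.750000, and with the GO:0000003 row, hyper_upper_tail 11028 1 2874 1 prints 0.260609. Both agree with the p column shown in the question.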

If you want to know the definition of your GO term, search it on QuickGO. For example, https://www.ebi.ac.uk/QuickGO/GSearch?q=GO:0008800

0
Entering edit mode

I know what the population set, study set, etc. are. What I need is an automated way to translate the GO IDs into their concepts, such as glycolysis. Because I have thousands of them, it doesn't make sense to search for them individually.

1
Entering edit mode

Did you try my script and file? Please let me know if you need more help with that.

0
Entering edit mode

It works, but it creates a messy file with unnecessary information. What I need is for the script to add just one line (the line that begins with the GO ID) from GO_mapped.txt as the last column of my YOUR_INPUT_FILE. Basically, the first column of my input file is the GO ID, and I want to add just the translation of the GO ID as the last column of my input file. For example, for GO:0016021 the last column would be integral component of membrane cellular_component. Can you help me with this?

1
Entering edit mode

You can try this new script which merges the output with your input file in the last column (Keep in mind all files should be TAB-separated):

Note: whenever you run this script, please delete the output created from last run... otherwise it will keep on appending into previously created file.
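A merge of this kind can be sketched with a two-file awk lookup. This is an illustration rather than the script referred to above. It assumes both files are tab-separated, that the GO ID is in the first column of each, and that the term name is in the second column of the lookup table; append_go_terms is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: append the GO term name as a new last column of every input row.
# Arguments: $1 = lookup table (GO ID <TAB> term name ...), $2 = input file.
# Assumptions: tab-separated files, GO ID in column 1 of both files.
# The result goes to stdout, so nothing is appended to an earlier output file.
append_go_terms() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { term[$1] = $2; next }               # first file: build the lookup
        { print $0, (($1 in term) ? term[$1] : "NA") }  # second file: add the term name
    ' "$1" "$2"
}
```

Usage: append_go_terms go_terms.tsv input_file.txt > input_with_terms.txt (both file names are examples). Rows whose first column is not found in the lookup table, including a header row, get NA in the new column.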

0
Entering edit mode

Thanks, but this does not add it to the last column of my input file, for example, one line of my input file is like this:

GO:0000000    15117    15075    3743    3733    0    0    0    true    1.0    1.0    1.0

and what I expect as output is

GO:0000000    15117    15075    3743    3733    0    0    0    true    1.0    1.0    transcription, DNA-templated
1
Entering edit mode

Yes. When I use the sample files provided along with the script, it gives output exactly like you wanted. You can check my sample inputs and the generated output file in the same compressed folder.

1
Entering edit mode

Sample Input file:

GO:0002040    dsrg    dg
GO:0006351    drfh    gjfj
GO:0008283    ksjhgk    skjrhgfl
GO:0032466    kjf    ksjgf
GO:0032877    öl    g
GO:0033301    fnbl    ksjg
GO:0045944    hfo    jgp
GO:0060707    jpgs    jge

Merged annotation to input:

GO:0002040    dsrg    dg    sprouting angiogenesis
GO:0006351    drfh    gjfj    transcription, DNA-templated
GO:0008283    ksjhgk    skjrhgfl    cell proliferation
GO:0032466    kjf    ksjgf    negative regulation of cytokinesis
GO:0032877    öl    g    positive regulation of DNA endoreduplication
GO:0033301    fnbl    ksjg    cell cycle comprising mitosis without cytokinesis
GO:0045944    hfo    jgp    positive regulation of transcription from RNA polymerase II promoter
GO:0060707    jpgs    jge    trophoblast giant cell differentiation
0
Entering edit mode

What exactly do you mean by input file? What I mean by input file is the one in my original post, which corresponds to <YOUR_INPUT_FILE> in your command. Am I right? :)

1
Entering edit mode

Your input file is the file you want to add annotation to, that is, the file you showed in your first post.

1
Entering edit mode

You don't have to search one by one. There is a link on the QuickGO page showing very simple ways of getting descriptions for terms in different programming languages. In Bash, it can be done with one line.

0
Entering edit mode

@SES How did you get this information? I'm running it on Linux, and the header of my file after the run is this:

ID    Pop.total    Pop.term    Study.total    Study.term    Pop.family    Study.family    nparents    is.trivial    p    p.adjusted    p.min
1
Entering edit mode

In your original post you asked what that table means and I explained it, and also showed how you could get this information from the documentation. Then, you answered and said you know what that information means but your main interest is in the GO definitions. Now, you are asking what the table means again? This is obviously confusing. Please refer to the documentation or my post for a description of the results.

For getting the GO definitions, see the QuickGO WebServices page. There are examples for numerous languages on that page and if you read the documentation you'll see that you can come up with a Bash or Perl script for your task in no time.
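A Bash lookup against QuickGO could look roughly like the sketch below. The endpoint path and the position of the "name" field in the JSON reply are assumptions based on the QuickGO REST service as it exists at the time of editing, so check them against the QuickGO documentation before relying on this. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch: fetch a GO term name from the QuickGO REST service.
# Assumptions (verify against the QuickGO docs): the ontology endpoint below
# exists, and the first "name" field in the JSON reply is the term name.

# Build the request URL for one GO ID; the colon is percent-encoded.
quickgo_url() {
    id=$(printf '%s' "$1" | sed 's/:/%3A/')
    printf 'https://www.ebi.ac.uk/QuickGO/services/ontology/go/terms/%s\n' "$id"
}

# Read a JSON reply on stdin and print the value of its first "name" field.
go_name_from_json() {
    grep -o '"name" *: *"[^"]*"' | head -n 1 | sed 's/^"name" *: *"//; s/"$//'
}

# Print "<GO ID><TAB><term name>" for one ID (needs curl and network access).
fetch_go_name() {
    name=$(curl -s -H 'Accept: application/json' "$(quickgo_url "$1")" | go_name_from_json)
    printf '%s\t%s\n' "$1" "$name"
}

# Example (not run here):
#   cut -f1 input_file.txt | grep '^GO:' | while read -r id; do fetch_go_name "$id"; done
```

For thousands of IDs, a local lookup table is usually faster and gentler on the service than one request per term.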

1
Entering edit mode

Hi Jack, please let me know whether you managed to add the terms to your file. I want to know whether it worked so that I can decide to keep the script or remove it. That way, other people will know in future whether to use it.

And as SES says, please change the post title from

How to interpret the result of GO analysis using Ontologizer? to How to interpret the result of GO analysis using Ontologizer / mapping GO IDs to GO TERMS.

because you are asking two different questions in the same post.

0
Entering edit mode

It worked, thanks
