Question: How to interpret the result of GO analysis using Ontologizer / mapping GO IDs to GO TERMS?
Asked 3.9 years ago by jack (Germany):
I have done a gene ontology enrichment analysis using Ontologizer. The output looks like this:

ID    Pop.total    Pop.term    Study.total    Study.term    Pop.family    Study.family    nparents    is.trivial    p    p.adjusted    p.min
GO:0000000    15117    15075    3743    3733    0    0    0    true    1.0    1.0    1.0
GO:0008800    15117    3    3743    1    4    1    1    false    0.7500000000000001    0.7500000000000001    0.25000000000000006
GO:0052547    15117    23    3743    6    670    208    3    false    0.7699531541574813    0.7699531541574813    3.789971590544467E-43
GO:0000003    15117    1    3743    1    11028    2874    1    false    0.26060935799429585    0.26060935799429585    9.067827348505459E-5
GO:0052548    15117    22    3743    5    280    100    2    false    0.9446003360272346    0.9446003360272346    3.811732507786439E-33

How can I translate the GO IDs into GO term names? And what does this table mean?

(written 3.9 years ago by jack; modified 3.9 years ago)

Answer by EagleEye (Sweden), 3.9 years ago:

You can use a simple bash script; I hope this works:

go_convert.sh

----------------------------------------

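#!/bin/bash
# Usage: ./go_convert.sh <input_file> <go_db_file> <column_with_GO_IDs>

# Drop duplicate lines from the input, then keep only the GO ID column.
GOlist=(`cat $1 | awk '!x[$0]++' | cut -f $3`)

for i in "${GOlist[@]}"
do

# Bug: "$GOlist" expands to the first array element only, so the same ID is
# searched on every pass; it should be "$i" (corrected in the reply below).
cat $2 | grep "$GOlist" >> GO_mapped.txt

done

----------------------------------------------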

Run it like this (after making it executable with chmod +x go_convert.sh):

./go_convert.sh <YOUR_INPUT_FILE> <GO_DB_FILE_FROM_github> <YOUR_COLUMN_NUMBER_WITH_GO_IDS>

(written 3.9 years ago by EagleEye)

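Sorry, use this corrected version instead:

#!/bin/bash
# Usage: ./go_convert.sh <input_file> <go_db_file> <column_with_GO_IDs>

# Take the GO ID column, skip non-GO lines (e.g. a header), and drop duplicates.
GOlist=( $(cut -f "$3" "$1" | grep '^GO:' | awk '!x[$0]++') )

for i in "${GOlist[@]}"
do
    # Append every line of the GO database file that contains this GO ID.
    grep -F -- "$i" "$2" >> GO_mapped.txt
done
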
I ran it on sample files (./go_convert.sh input_file.txt sample_go_db.txt 1) and got results; you can check them here:

bioinformatics.kandurilab.org/biostars/files/mapping_ids.zip
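A possible variation, not part of the zip file: rather than calling grep once per GO ID, write the unique IDs to a temporary pattern file and let grep -F -f scan the GO database file in a single pass, which should be noticeably faster for thousands of IDs. This is a minimal sketch assuming the same three arguments as go_convert.sh; the function name map_go_ids is hypothetical.

```shell
#!/bin/bash
# Hypothetical one-pass variant of go_convert.sh (not the original script).
# Arguments: <input_file> <go_db_file> <column_with_GO_IDs>
map_go_ids() {
    local input="$1" db="$2" col="$3"
    local ids
    ids="$(mktemp)"
    # Unique GO IDs from the chosen column; lines not starting with GO: are skipped.
    cut -f "$col" "$input" | grep '^GO:' | sort -u > "$ids"
    # Print every annotation line containing any of the IDs (fixed-string match).
    grep -F -f "$ids" "$db"
    rm -f "$ids"
}
```

For example, map_go_ids input_file.txt sample_go_db.txt 1 > GO_mapped.txt. Because the result is written with > instead of appended, there is no output from a previous run to delete first.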

(written 3.9 years ago by EagleEye; modified 3.9 years ago)

Thanks. Which GO_DB_FILE_FROM_github should I use? There are a few files there. And may I ask how you generated the files that are on GitHub?

(written 3.9 years ago by jack)

This file has all biological_process, molecular_function and cellular_component annotations: gene_association.grouped.annotated140122_new.txt

(written 3.9 years ago by EagleEye; modified 3.9 years ago)

Those files were generated from geneontology.org data and are used by the tool GeneSCF.

(written 3.9 years ago by EagleEye; modified 3.9 years ago)

Answer by EagleEye (Sweden), 3.9 years ago:

You can try this tool, which gives more detailed results. If you are working with human data on a Linux system, it should be useful for you: Gene Set Clustering based on Functional annotation (GeneSCF)

Or, if you still want to translate the IDs you got, go to http://geneontology.org/ and search for your GO ID there.

Update: GeneSCF now supports all organisms/species from the KEGG and Gene Ontology repositories.

(written 3.9 years ago by EagleEye; modified 5 months ago)

I want to translate them, but the question is how to do that in an automated manner. There are lots of GO IDs for my genes (2000), and it's not feasible to copy and paste them into the Gene Ontology website and search for them individually.

(written 3.9 years ago by jack)

If you are comfortable working with files, you can use these annotation files from GeneSCF for the mapping: https://github.com/santhilalsubhash/geneSCF/tree/master/annotation

(written 3.9 years ago by EagleEye; modified 3.9 years ago)

My organism is not a model organism and I had to parse everything myself. Now I have the enriched GO IDs and I need to translate them, but I don't know exactly how to parse them or which files I should use. Can you help a bit more?

(written 3.9 years ago by jack)

Answer by SES (Vancouver, BC), 3.9 years ago:

This information is all in the documentation: click "Help" and then "Help Contents...". Honestly, I'm confused how you got this far without knowing what these fields are, such as the population and study IDs. These would have to be created before the analysis, so you might want to think about whether these results test exactly what you intended. From the docs:

GO id: The accession number of the GO term
Name: The name of the GO term
NSP: The namespace, or subontology: biological process (B), cellular component (C) or molecular function (F)
P-value: The nominal (uncorrected) P-value resulting from the observed overrepresentation of the GO term
Adj. P-Value: The adjusted P-Value (adjusted by the MTC procedure chosen by the user)
Pop. Count: The number of genes in the population set that are annotated to the GO term in question
Study Count: The number of genes in the study set that are annotated to the GO term in question
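These are the column names shown in the GUI; the command-line table in the question uses different headers, where Pop.term and Study.term appear to correspond to Pop. Count and Study Count, and p.adjusted to Adj. P-Value. As a rough sketch, assuming a tab-separated file with the 12 columns shown in the question, the awk function below keeps the non-trivial terms (column 9, is.trivial, equal to false) whose p.adjusted (column 11) is below a cutoff, and appends a simple fold-enrichment column, (Study.term / Study.total) / (Pop.term / Pop.total). The function name filter_go_table, the default cutoff of 0.05 and the fold-enrichment column are illustrative additions, not Ontologizer output.

```shell
#!/bin/bash
# Sketch: filter an Ontologizer result table (tab-separated, header on line 1).
# Assumed columns, as in the table in the question:
#   1 ID, 2 Pop.total, 3 Pop.term, 4 Study.total, 5 Study.term,
#   9 is.trivial, 11 p.adjusted
# Usage: filter_go_table <table.txt> [cutoff]   (cutoff defaults to 0.05)
filter_go_table() {
    awk -F'\t' -v OFS='\t' -v alpha="${2:-0.05}" '
        NR == 1 { print $0, "fold.enrichment"; next }
        $9 == "false" && $11 + 0 < alpha + 0 {
            # Fraction of study genes on the term divided by the population fraction.
            fold = ($2 > 0 && $3 > 0 && $4 > 0) ? ($5 / $4) / ($3 / $2) : 0
            print $0, sprintf("%.3f", fold)
        }' "$1"
}
```

For example, filter_go_table results.txt 0.05 > significant.txt. With the five rows in the question nothing would pass a 0.05 cutoff, since every adjusted P-value there is 0.26 or higher.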

If you want to know the definition of your GO term, search for it on QuickGO. For example, https://www.ebi.ac.uk/QuickGO/GSearch?q=GO:0008800

(written 3.9 years ago by SES)

I know what the population set, study set, etc. are. What I need is an automated way to translate the GO IDs into their concepts, like glycolysis. Because I have thousands of them, it doesn't make sense to search for them individually.

(written 3.9 years ago by jack; modified 3.9 years ago)

Did you try my script and file? Please let me know if you need more help with that.

(written 3.9 years ago by EagleEye; modified 3.9 years ago)

It works, but it creates a messy file with unnecessary information. What I need is for the script to add just one line of GO_mapped.txt (the line that begins with the GO ID) as the last column of YOUR_INPUT_FILE. The first column of my input file is the GO ID, and I want to add only the translation of that GO ID as the last column. For example, for GO:0016021 the last column would be "integral component of membrane    cellular_component". Can you help me with this?

(written 3.9 years ago by jack)

You can try this new script, which merges the output into your input file as the last column (keep in mind that all files should be TAB-separated):

Note: whenever you run this script, please delete the output created by the previous run first; otherwise it will keep appending to the previously created file.

bioinformatics.kandurilab.org/biostars/files/mapping_ids_mergingWithInput.zip

(written 3.9 years ago by EagleEye; modified 3.9 years ago)

Thanks, but this does not add it to the last column of my input file. For example, one line of my input file looks like this:

GO:0000000    15117    15075    3743    3733    0    0    0    true    1.0    1.0    1.0

and what I expect as output is:

GO:0000000    15117    15075    3743    3733    0    0    0    true    1.0    1.0    transcription, DNA-templated

(written 3.9 years ago by jack; modified 3.9 years ago)

Yes, when I use the sample files that come with the script, it gives exactly the output you want. You can check my sample input and the generated output file in the same compressed folder.

(written 3.9 years ago by EagleEye)

Sample Input file:


GO:0002040    dsrg    dg
GO:0006351    drfh    gjfj
GO:0008283    ksjhgk    skjrhgfl
GO:0032466    kjf    ksjgf
GO:0032877    öl    g
GO:0033301    fnbl    ksjg
GO:0045944    hfo    jgp
GO:0060707    jpgs    jge


Merged annotation to input:


GO:0002040    dsrg    dg    sprouting angiogenesis
GO:0006351    drfh    gjfj    transcription, DNA-templated
GO:0008283    ksjhgk    skjrhgfl    cell proliferation
GO:0032466    kjf    ksjgf    negative regulation of cytokinesis
GO:0032877    öl    g    positive regulation of DNA endoreduplication
GO:0033301    fnbl    ksjg    cell cycle comprising mitosis without cytokinesis
GO:0045944    hfo    jgp    positive regulation of transcription from RNA polymerase II promoter
GO:0060707    jpgs    jge    trophoblast giant cell differentiation
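The merging script itself is only in the zip file. As an independent sketch (not EagleEye's script), output like the above can also be produced with awk from the Gene Ontology OBO file, assuming you have downloaded go-basic.obo from geneontology.org; in that format each [Term] block has "id: GO:...", "name: ..." and "namespace: ..." lines. The function name add_go_names is hypothetical. Only primary IDs are matched, so alternative or unknown IDs, and a header line, get NA.

```shell
#!/bin/bash
# Sketch (not the script from the zip): append the GO term name and namespace
# to each line of a tab-separated file, using an OBO file such as go-basic.obo.
# Usage: add_go_names <go.obo> <input.txt> [column_with_GO_IDs, default 1]
add_go_names() {
    awk -F'\t' -v OFS='\t' -v col="${3:-1}" '
        # First file (the OBO): remember name and namespace for each term id.
        FNR == NR {
            if ($0 == "[Term]")                       { id = "" }
            else if ($0 ~ /^id: /)                    { id = substr($0, 5) }
            else if (id != "" && $0 ~ /^name: /)      { name[id] = substr($0, 7) }
            else if (id != "" && $0 ~ /^namespace: /) { ns[id] = substr($0, 12) }
            next
        }
        # Second file (the input): append name and namespace, or NA if unknown.
        {
            g = $col
            print $0, ((g in name) ? name[g] OFS ns[g] : "NA" OFS "NA")
        }' "$1" "$2"
}
```

For example, add_go_names go-basic.obo input_file.txt 1 > input_annotated.txt appends the term name and namespace as two extra columns, the layout asked for above (for GO:0016021, "integral component of membrane" followed by "cellular_component").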

(written 3.9 years ago by EagleEye)

What exactly do you mean by input file? By input file I mean the one in my original post, which corresponds to <YOUR_INPUT_FILE> in your command. Am I right? :)

(written 3.9 years ago by jack)

Your input file is the file you want to add the annotation to, i.e. the file you mentioned in your first post.

(written 3.9 years ago by EagleEye)

But please make sure that your input file is TAB-separated.
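A quick way to check this, as a sketch with hypothetical function names: count the tab-separated fields on each line (a consistent table gives a single count), and if the file was saved with spaces, rebuild it with tabs. The conversion treats any run of spaces or tabs as one separator, so it is only safe when no field contains spaces itself; that holds for the Ontologizer table but not for a file that already contains GO term names.

```shell
#!/bin/bash
# Print each distinct number of tab-separated fields, with how many lines have it.
# A single output line means every line has the same number of columns.
field_counts() {
    awk -F'\t' '{ print NF }' "$1" | sort -n | uniq -c
}

# Rebuild each line with tabs between fields (runs of blanks count as one separator).
spaces_to_tabs() {
    awk -v OFS='\t' '{ $1 = $1; print }' "$1"
}
```

For example, field_counts results.txt, and if needed spaces_to_tabs results.txt > results_tab.txt.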

(written 3.9 years ago by EagleEye)

You don't have to search one by one. There is a link on the QuickGO page showing very simple ways of getting term descriptions in different programming languages; in Bash, it can be done in one line.

(written 3.9 years ago by SES)

@SES How did you get this information? I'm using it on Linux, and after running it the header of my output file is:

ID    Pop.total    Pop.term    Study.total    Study.term    Pop.family    Study.family    nparents    is.trivial    p    p.adjusted    p.min

(written 3.9 years ago by jack; modified 3.9 years ago)

In your original post you asked what that table means and I explained it, and also showed how you could get this information from the documentation. Then, you answered and said you know what that information means but your main interest is in the GO definitions. Now, you are asking what the table means again? This is obviously confusing. Please refer to the documentation or my post for a description of the results.

For getting the GO definitions, see the QuickGO WebServices page. There are examples for numerous languages on that page and if you read the documentation you'll see that you can come up with a Bash or Perl script for your task in no time.
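As a rough sketch of that approach in Bash: this assumes the QuickGO REST endpoint https://www.ebi.ac.uk/QuickGO/services/ontology/go/terms/, which accepts comma-separated GO IDs and returns JSON, and uses the jq tool to extract fields. The field names (results, id, name, aspect) are my reading of the QuickGO response and should be checked against the QuickGO web services documentation, as should any limit on the number of IDs per request. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch: look up GO term names in batches via the QuickGO REST API.
# Assumptions: the endpoint below and the JSON fields .results[].id/.name/.aspect;
# requires curl and jq to be installed.
QUICKGO_TERMS="https://www.ebi.ac.uk/QuickGO/services/ontology/go/terms"

# Join the given GO IDs with commas into one request URL.
quickgo_url() {
    local IFS=,
    printf '%s/%s\n' "$QUICKGO_TERMS" "$*"
}

# Print "id<TAB>name<TAB>aspect" for each GO ID given as an argument.
quickgo_names() {
    curl -s -H 'Accept: application/json' "$(quickgo_url "$@")" |
        jq -r '.results[] | [.id, .name, .aspect] | @tsv'
}
```

For example, quickgo_names GO:0008800 GO:0052547 should print one tab-separated line per term. For a long list, pass the IDs in batches, e.g. quickgo_names $(cut -f1 results.txt | grep '^GO:' | head -n 50).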

(written 3.9 years ago by SES; modified 3.9 years ago)

Hi Jack, please let me know whether you managed to add the terms to your file. I want to know whether it worked so that I can decide whether to keep or remove the script, and other people in the future will know whether to use it.

And, as SES says, please change the post title from

"How to interpret the result of GO analysis using Ontologizer?" to "How to interpret the result of GO analysis using Ontologizer / mapping GO IDs to GO TERMS",

because you are asking two different questions in the same post.

(written 3.9 years ago by EagleEye; modified 3.9 years ago)

It worked, thanks.

(written 3.9 years ago by jack)

Santhilal Subhash, I have run into another problem; can you help me with it? How to create your own association file for gene ontology enrichment analysis?

(written 3.9 years ago by jack)