GO Analysis Clarification using GOstats with hyperGTest()
2.8 years ago
brismiller

Hey everybody,

I have a question about how the hyperGTest() works with regard to the GO terms tested. From one of my results tables, some of the GO terms returned are not in my GO universe (the GO term is not in my organism's obo file).

For example, this GO term was shown to be significantly enriched but if I look in the universe it is not there.

"GO:0008483" %in% GO_Tet_universe$frame.go_id

[1] FALSE

Finally, you should know that I am using the GO annotation file for Tetrahymena thermophila SB210, which presumably does not have every GO term annotated.

My question is: where are these GO terms coming from, and how are they being called enriched when there are no genes known for that term? From my understanding, every parent GO term includes all the genes of its children, so is that why these terms are enriched, because their child GO terms are enriched?
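To make the parent/child idea in the question concrete, here is a minimal sketch of annotation propagation, assuming the enrichment tool assigns each gene to its directly annotated terms and to all of their ancestors. This is not GOstats code: the term IDs, gene names, and the helpers `ancestors` and `propagate` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical toy ontology: term -> set of direct parent terms.
# Term IDs and gene names are invented for illustration only.
PARENTS = {
    "GO:child_a": {"GO:parent"},
    "GO:child_b": {"GO:parent"},
    "GO:parent": {"GO:root"},
    "GO:root": set(),
}

# Direct annotations only, as they might appear in an annotation file.
DIRECT = {
    "gene1": {"GO:child_a"},
    "gene2": {"GO:child_b"},
    "gene3": {"GO:child_a"},
}


def ancestors(term, parents):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct, parents):
    """Map each term to the genes annotated to it or to any descendant."""
    term_to_genes = defaultdict(set)
    for gene, terms in direct.items():
        for term in terms:
            term_to_genes[term].add(gene)
            for anc in ancestors(term, parents):
                term_to_genes[anc].add(gene)
    return dict(term_to_genes)


# Terms that appear in the raw annotation file (the "direct" view).
directly_annotated = set().union(*DIRECT.values())
propagated = propagate(DIRECT, PARENTS)

print("GO:parent" in directly_annotated)  # False
print(sorted(propagated["GO:parent"]))    # ['gene1', 'gene2', 'gene3']
```

In this toy case the parent term never appears in the direct annotations, yet it collects three genes after propagation. If a tool works this way, a term missing from the annotation file can still be tested and reported as enriched.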

This is my first time doing a GO analysis, so any help would be great.

2.8 years ago
EagleEye

It looks like geneOntology.org no longer supports “Tetrahymena thermophila” specifically, but it does provide “jcvi” (a multispecies microbial annotation). Try GeneSCF, which can use the current annotation from geneOntology.org.

Step 1: Preparing the database for your organism

./prepare_database -org=jcvi -db=GO


Step 2: Performing enrichment analysis for your list of genes

./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_BP -o=/ExistingOUTPUTfolder/ -org=jcvi --plot=yes --background=15000

Yes, there is no Tetrahymena-specific file, but the file used to generate the universe for my analysis was downloaded from geneontology.org's annotation download page with the filter "+ taxon_subset_closure_label: Tetrahymena thermophila SB210", which returns all 34679 Tetrahymena annotations. Would this be any different from what you proposed above?

I suggested checking with a different tool because it is worth verifying whether you get the same issue elsewhere. That way you will know whether GOstats has a problem processing this annotation (a GO term with no genes should not appear in the enriched list). Personally, I always like to verify my results with more than one tool.
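The point that a term with no genes should not come out as enriched can be illustrated with a generic one-sided hypergeometric over-representation test. This is a sketch under stated assumptions, not the GOstats implementation: the function name `hypergeom_upper_tail` and all the counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, universe_size, term_size, selected_size):
    """P(X >= k) when drawing `selected_size` genes without replacement
    from `universe_size` genes, of which `term_size` carry the term."""
    total = comb(universe_size, selected_size)
    upper = min(term_size, selected_size)
    favourable = sum(
        comb(term_size, i) * comb(universe_size - term_size, selected_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# A term with no genes in the universe can only overlap the gene list
# zero times, so its p-value is 1 and it cannot be called enriched.
print(hypergeom_upper_tail(0, 100, 0, 10))  # 1.0

# A small term fully contained in the selected list gives a small p-value.
print(hypergeom_upper_tail(5, 10, 5, 5))    # 1/252, about 0.00397
```

So if a term is reported as significant, the tool must be counting genes for it from somewhere, for example through ancestor propagation or a different identifier mapping.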

2.8 years ago
EagleEye

Hi, I quickly checked with GeneSCF, and the term you specified is present in 'jcvi' (the result for the term is below). My guess, though I am not completely sure, is that the problem lies in how GOstats processes the annotation: it looks like GOstats is trying to convert your gene names to Entrez IDs (or some other numeric representation) and map them to an annotation that has no Entrez ID support. That would explain why there are no genes in the annotation.

GO:0008483 result from GO molecular function using GeneSCF prepare_database:

GO:0008483~transaminase activity    BA_1341,BA_2294,BA_2737,BA_2899,BA_3062,BA_3312,BA_3886,BA_4225,BA_4254,BA_4626,BA_4663,BA_4900,BA_5133,BA_5138,CHY_0011,CHY_1173,CHY_1436,CJE_0146,CJE_0882,CJE_1486,CJE_1514,CPF_0060,CPF_0325,CPF_0356,CPF_0707,CPF_0845,CPF_0911,CPF_1258,CPF_1623,CPF_1667,CPF_1720,CPF_2163,CPF_2212,CPS_0838,CPS_2054,CPS_2190,CPS_3232,CPS_3390,CPS_4612,CPS_4663,CPS_4878,DET_0576,DET_0739,GSU_0018,GSU_0084,GSU_0117,GSU_0162,GSU_1868,HNE_0095,HNE_0652,HNE_0889,HNE_1171,HNE_2243,HNE_2311,HNE_2357,HNE_2367,HNE_2507,HNE_2588,HNE_2594,LMOf2365_0306,LMOf2365_1615,LMOf2365_2132,LMOf2365_2341,MCA_0399,MCA_0598,MCA_1021,MCA_1491,MCA_2053,MCA_2125,MCA_2288,MCA_2997,PFL_0306,PFL_0754,PFL_1309,PFL_1609,PFL_1655,PFL_1824,PFL_1867,PFL_2045,PFL_2138,PFL_2406,PFL_2461,PFL_2868,PFL_3043,PFL_3219,PFL_3222,PFL_3354,PFL_3470,PFL_3521,PFL_4112,PFL_4152,PFL_4247,PFL_4362,PFL_4578,PFL_4657,PFL_4884,PFL_4949,PFL_5269,PFL_5681,PFL_5927,PFL_5960,PFL_6043,PSPTO_0096,PSPTO_1072,PSPTO_1440,PSPTO_1531,PSPTO_1779,PSPTO_1920,PSPTO_2136,PSPTO_5395,PSPPH_0218,PSPPH_0459,PSPPH_0862,PSPPH_1325,PSPPH_1931,PSPPH_2983,PSPPH_3631,PSPPH_4896,PSPPH_5053,SO_2483,SO_3497,SO_3789,SO_4343,SPO_0388,SPO_1136,SPO_1166,SPO_1295,SPO_1370,SPO_1401,SPO_1468,SPO_1567,SPO_1697,SPO_1916,SPO_2005,SPO_2024,SPO_2144,SPO_2589,SPO_2795,SPO_3027,SPO_3220,SPO_3230,SPO_3417,SPO_3471,SPO_A0113,SPO_A0352,SPO_A0354,VC_0392,VC_0748,VC_1184,VC_1625,VC_2309,VC_A0513,VC_A0523,VC_A0605,VC_A0824
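A quick way to test the identifier-mismatch guess is to measure how many of the input gene IDs actually occur in the annotation. The sketch below assumes both sides are available as plain lists of strings. The helper `id_overlap` and the `TTHERM_EXAMPLE_*` IDs are hypothetical placeholders, while the `BA_*` and `CHY_*` IDs are taken from the result line above.

```python
def id_overlap(gene_list, annotation_ids):
    """Report which input gene IDs are found in the annotation IDs."""
    genes = set(gene_list)
    matched = genes & set(annotation_ids)
    fraction = len(matched) / len(genes) if genes else 0.0
    return {
        "matched": sorted(matched),
        "unmatched": sorted(genes - matched),
        "fraction_matched": fraction,
    }


# A few IDs from the GO:0008483 result above.
annotation_ids = ["BA_1341", "BA_2294", "CHY_0011", "CHY_1173"]

# Hypothetical input list: two placeholder IDs in a different naming
# scheme plus one ID that does occur in the annotation.
gene_list = ["TTHERM_EXAMPLE_1", "TTHERM_EXAMPLE_2", "BA_1341"]

report = id_overlap(gene_list, annotation_ids)
print(report["matched"])    # ['BA_1341']
print(report["unmatched"])  # ['TTHERM_EXAMPLE_1', 'TTHERM_EXAMPLE_2']
```

A matched fraction near zero would suggest that the gene list and the annotation use different identifier schemes, which is worth ruling out before interpreting any enrichment result.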