Question

Go - Statistical Test

0


10.5 years ago

jprmachado ▴ 80

Hi,

I have done a Gene Ontology enrichment analysis in REVIGO, which returned values such as these:

term_ID     description                         frequency  plot_X  plot_Y  plot_size  log10 p-value
GO:0003674  molecular_function                  100.00%    -1.661  -0.407  7.295      -4.816
GO:0003735  structural constituent of ribosome  2.53%      -5.484  -2.9    5.698      -42.8994
GO:0004459  L-lactate dehydrogenase activity    0.03%      0.82    6.074   3.764      -2.2153

I have different values for more than 10 species. I want to know whether it is possible to apply a statistical test to find which GO terms contribute to the differences between the species.

Thank you in advance

test go • 3.2k views

ADD COMMENT • link updated 10.5 years ago by Charles Warden 8.2k • written 10.5 years ago by jprmachado ▴ 80

0


Hi,

I tried a solution for my problem.

I summed the pairwise differences between the p-values (=EXP(log10 p-value)) for each GO term. When a specific GO term was not present for a given species, I assigned a value of 1 for that species.

A higher sum means larger differences between the p-values, and therefore a more asymmetrical representation of the GO term's p-value across species.

Does anyone agree or, more importantly, does anyone think this is statistically wrong?

Thank you.

ADD REPLY • link updated 10.5 years ago by Devon Ryan 104k • written 10.5 years ago by jprmachado ▴ 80

0


While writing my original comment (below), I see you spun this part off as its own comment. I guess this clarifies that you want to do some sort of pair-wise comparison (or maybe not, since you then sum everything). However, it's still unclear how the original data was derived and what your actual biological question is.

ADD REPLY • link 10.5 years ago by Devon Ryan 104k

0


It's a bit difficult to parse your question. Do you want to compare one species against all of the others or do pair-wise comparisons? Also, exactly how were the original values (prior to calculating GO enrichment) derived? Normally, one does GO enrichment on a gene list after performing a differential expression analysis. You could simply do that. BTW, I assume you mean 10^(log10 p-value) rather than what you wrote. Otherwise, you're not actually getting the original p-values (as an aside, it's almost always a bad idea to compare p-values and try to derive a difference metric from that!).
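To illustrate the point about inverting the logarithm, here is a minimal Python sketch using the three log10 p-values from the table in the question. The helper name `p_from_log10` is made up for this example. It shows that EXP(x) (the natural-base inverse) does not return the original p-value, while 10^x does.

```python
import math

# log10 p-values copied from the REVIGO table in the question
log10_p = {
    "GO:0003674": -4.816,
    "GO:0003735": -42.8994,
    "GO:0004459": -2.2153,
}

def p_from_log10(value):
    """Invert a base-10 logarithm to recover the p-value (hypothetical helper)."""
    return 10.0 ** value

for term, value in log10_p.items():
    correct = p_from_log10(value)
    wrong = math.exp(value)  # natural-base inverse, which is what EXP() computes
    print(f"{term}: 10^x = {correct:.3e}, EXP(x) = {wrong:.3e}")
```

For negative inputs EXP(x) is always larger than 10^x, so the spreadsheet formula would overstate every p-value.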

ADD REPLY • link 10.5 years ago by Devon Ryan 104k

0


Hi dpryan79,

Yes, I meant 10^(log10 p-value). Across all the GO terms obtained for each species, I want to see which GO terms show the most asymmetrical enrichment between species. My focus is on the GO terms, not on the species.

I have a list of pseudogenes, which I submitted to BiNGO in Cytoscape. I then looked at the molecular function terms in REVIGO.

Once more thank you.

ADD REPLY • link updated 10.5 years ago by Devon Ryan 104k • written 10.5 years ago by jprmachado ▴ 80

0


"asymmetrical enrichment" isn't a coherent concept. There's either an enrichment of something or not.

BiNGO is intended to perform GO analysis on a network in Cytoscape, which raises the question of how your networks were created and whether they're even meaningful to begin with (given how little context you've provided, I suspect that the networks aren't meaningful). Continuing from that train of thought, I'm not entirely sure what it would even mean for a pseudogene to have a molecular function (since you mentioned species, I assume that you're not working on cancer, where this might make sense). That sort of negates the "pseudo" part of the term. I get the feeling that this question isn't exactly well thought out.

ADD REPLY • link 10.5 years ago by Devon Ryan 104k

0


I came here with a question. Right or wrong, asking is acceptable; it is called the process of learning. I didn't come here to be mistreated.

I leave you with one reference. Thanks for all the rest.

http://genomebiology.com/content/11/3/R26

ADD REPLY • link 10.5 years ago by jprmachado ▴ 80

score 1 · Answer 1 · 2013-11-04

Between any pair of gene lists, you can do a Fisher's exact test (where you compare the proportion of GO category genes in one sample/species to another sample/species instead of to a background frequency).
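As a rough sketch of that pairwise comparison, the Python below implements a two-sided Fisher's exact test on a 2x2 table using only the standard library. The gene counts are invented for illustration, and the function name is my own; in practice a library routine such as `scipy.stats.fisher_exact` would be the usual choice.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 annotated genes fall in row 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)

# Hypothetical counts for one GO term: (annotated, not annotated)
species_a = (12, 188)  # 12 of 200 pseudogenes carry the term
species_b = (3, 247)   # 3 of 250 pseudogenes carry the term
p_value = fisher_exact_two_sided(*species_a, *species_b)
print(f"p = {p_value:.4g}")
```

Running this once per GO term and per species pair produces many tests, so the resulting p-values would still need a multiple-testing correction.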

I think it would be hard to interpret a single statistical test that compared the results for 10 species simultaneously. I would probably recommend either binning the categories into significant or non-significant (so, category A was significantly enriched with FDR < 0.05 in X / 10 species) or prioritizing based upon the relative p-values (so, species X showed the lowest p-value for enrichment for category A). The pairwise Fisher's exact test described above could also be acceptable.
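The binning and ranking ideas can be sketched in a few lines of Python. All species names and adjusted p-values below are hypothetical, and treating a term that is missing for a species as non-significant (p = 1) is an assumption borrowed from the original poster's convention, not something REVIGO or BiNGO prescribes.

```python
# Hypothetical FDR-adjusted p-values per GO term and species.
# A missing species means the term was not reported for it.
adjusted_p = {
    "GO:0003735": {"sp1": 1e-40, "sp2": 3e-12, "sp3": 0.2},
    "GO:0004459": {"sp1": 0.006, "sp3": 0.04},
    "GO:0003674": {"sp1": 0.3, "sp2": 0.6, "sp3": 0.08},
}
all_species = ["sp1", "sp2", "sp3"]
FDR_CUTOFF = 0.05

def summarize(term_values, species, cutoff=FDR_CUTOFF):
    """Count species where the term is significant and find the lowest p-value."""
    # Assumption: a term absent for a species counts as p = 1 (not significant)
    significant = [s for s in species if term_values.get(s, 1.0) < cutoff]
    best = min(term_values, key=term_values.get) if term_values else None
    return len(significant), best

summary = {term: summarize(vals, all_species) for term, vals in adjusted_p.items()}
for term, (n_sig, best) in summary.items():
    print(f"{term}: significant in {n_sig}/{len(all_species)} species, "
          f"lowest p-value in {best}")
```

This gives a per-term description (how many species, and which one stands out) rather than a formal test of differences between species.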

Understanding the overall goals may also be important. For example, you mention you have a list of pseudogenes, but why are these pseudogenes coming from >10 different species? GO categories are either defined independently for each species or defined by homology to a single, commonly studied species. For example, the human genome has GO categories, but I doubt the chimp genome has an independently curated list of GO categories. If that were the case, I don't think it would make sense to compare the relative contributions of pseudogenes that happen to have the closest homology to a known human gene versus a known chimp gene (if the assumption was correct that all chimp GO definitions were really coming from human homologs). In this case, you would say all genes belong to a single species (whatever primate is being studied), and the GO categories are defined with respect to human homologs (or just the human genes, if you are working with human pseudogenes). Since I don't know the background for this particular comparison, I can't really provide better advice on this particular aspect.