Are P-Values Obtained From Two Separate Analyses On The Same Population Comparable?
4
1
10.6 years ago

Are p-values obtained from two separate enrichment analyses on the same population of genes comparable?

For example, let's say I have two differentially expressed gene lists from the same population of genes. ListA is enriched for cell cycle with a p-value of 0.01, and listB is enriched for cell cycle with a p-value of 0.001.

Would it be correct to say cell cycle is more significantly enriched in listB than listA? Are the p-values comparable?

enrichment gene-ontology • 3.1k views
ADD COMMENT
0

I would say it is correct if you generate listA and listB following independent hypotheses and use the same statistic to evaluate enrichment.

ADD REPLY
2
10.6 years ago
tiagoantao ▴ 690

I would apply some sort of multiple-testing correction before comparing multiple lists (especially if there are more than a handful). The usual caveats for gene enrichment apply: Bonferroni is too conservative, and most FDR methods probably are as well. Check the DAVID EASE score for an alternative (though it is not really a multiple-testing correction).

See, for example: Huang DW, Sherman BT, Lempicki RA (2009). Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37(1):1-13.
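
As a rough illustration of why the choice of correction matters, here is a minimal pure-Python sketch of Bonferroni and Benjamini-Hochberg (FDR) adjustment. The function names and the raw p-values are made up for the example; they do not come from any enrichment tool, and this is not the DAVID EASE score.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five GO terms tested on one gene list.
raw = [0.001, 0.01, 0.02, 0.04, 0.30]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
for p, pb, pf in zip(raw, bonf, bh):
    print(f"raw={p:.3f}  bonferroni={pb:.3f}  bh={pf:.3f}")
```

With only five tests the two methods already disagree on which terms pass a 0.05 cutoff, and the gap grows with the number of terms tested.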

ADD COMMENT
0

Thanks for the useful reference. I agree with you that multi-test corrections can be too conservative. Even if it does not seem to be the case with Dk lists, I suggest data-splitting techniques and a critical approach (e.g. 'Improving Validation Practices in “Omics” Research', http://www.sciencemag.org/content/334/6060/1230.full).

ADD REPLY
2
10.6 years ago

One important thing to keep in mind is that the p-value is not a quality measure. It is simply the probability of observing a result at least this extreme by chance, given the data at hand. The properties of the underlying data therefore determine the p-value (in your case, the number of GO terms that could be used factors in as well); it is not a characteristic of the final observation alone.
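
To make this concrete, here is a small sketch using a one-sided hypergeometric test, a common choice for enrichment but not necessarily what any particular tool uses. The function name and all counts are hypothetical. Both lists show the same 4-fold over-representation of cell cycle genes, yet the p-values differ by many orders of magnitude simply because the lists differ in size.

```python
from math import comb


def hypergeom_enrichment_p(universe, category, list_size, hits):
    """P(X >= hits) when list_size genes are drawn without replacement
    from a universe that contains `category` annotated genes."""
    total = comb(universe, list_size)
    upper = min(category, list_size)
    tail = sum(
        comb(category, k) * comb(universe - category, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical universe: 20,000 genes, 500 of them annotated as cell cycle
# (2.5% background). Each list below has 10% cell cycle genes.
p_small = hypergeom_enrichment_p(20000, 500, 50, 5)
p_large = hypergeom_enrichment_p(20000, 500, 500, 50)
print(f"50-gene list, 5 hits:   p = {p_small:.3g}")
print(f"500-gene list, 50 hits: p = {p_large:.3g}")
```

The enrichment itself (observed versus expected fraction) is identical in both cases, which is one reason to report an effect size next to the p-value rather than ranking on the p-value alone.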

IMO the purpose of the p-value is to accept or reject the selection. In general I don't think it should be used to rank anything (though in reality just about everyone does it all the time). We (myself included) tend to rank by p-value when we run out of options.

I would try to find a different measure/attribute to rank my genes and avoid comparing the p-values.

ADD COMMENT
0

Thanks. This is exactly my problem. I was trying to see whether I can say one is "more significant" than another. I think I'll take a different approach now and try to add another dimension to my data by looking at fold change.

ADD REPLY
1
10.6 years ago
seidel 9.8k

I think the answer is yes, as long as these two lists represent an analysis of the same experiment. When you say "the same population of genes", this seems to imply a single data set (from a single experiment) representing some "universe" of genes, e.g. all the mouse genes represented on an array, some of which can be classified as cell cycle genes. Given that a p-value represents a fractional area under a curve, and listB takes up a tenth of the area of listA, I would call listB more significant, even though the curves (or the analysis process that generated them, which you haven't explicitly stated) may be different shapes.

ADD COMMENT
1
10.6 years ago
Bill Pearson ★ 1.0k

I agree with Istvan. You can say that one p-value is more significant than the other, but you CANNOT say that they are significantly different. That requires a different test on the hypothesis that the fold-change for the two genes is different.
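
One possible way to ask "are the two enrichments significantly different?" directly is sketched below. Note that this swaps in a different test from the fold-change comparison mentioned above: it compares the proportion of cell cycle genes in the two lists with a two-sided Fisher exact test. It assumes the lists do not overlap, and the function name and counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of the table whose top-left cell is x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables that are no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts: listA has 12 cell cycle genes out of 200,
# listB has 18 out of 200.
p_diff = fisher_exact_two_sided(12, 200 - 12, 18, 200 - 18)
print(f"listA vs listB cell cycle proportion: p = {p_diff:.3f}")
```

With counts like these, each list could look enriched against a 2.5% background on its own, with different p-values, while the direct comparison gives no evidence that the two lists differ from each other.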

ADD COMMENT
