Question

Correlation test for multiple variables and adjusted p values

4.8 years ago

ASid ▴ 40

I am running a Pearson correlation test on data for 500 genes. I want to check the association between every pair of genes, which gives an r value and a p-value for each pair. In total that is 500 * 499 / 2 = 124750 tests (gene pairs).

I know the next step is to correct for multiple comparisons using an FDR or Bonferroni procedure. Let's say we choose FDR to get adjusted p-values.

My question is about filtering before calculating the adjusted p-values. Suppose we first filter the comparisons on a specific r value such as 0.4, and assume for the moment that 1000 comparisons remain because their absolute r value was greater than 0.4. If we now run FDR, it will use only those 1000 p-values. Are we introducing bias here? Is this actually allowed? We really performed 124750 tests, so I am not sure this is the right way to go.

correlation gene to gene associations • 2.7k views

ADD COMMENT • link updated 4.8 years ago by Nicolas Rosewick 11k • written 4.8 years ago by ASid ▴ 40

score 1 · Answer 1 · 2019-07-22

4.8 years ago

Nicolas Rosewick 11k

You should use all p-values for the multiple testing correction.

One good idea is to plot the nominal p-values and check the shape of the distribution. See here for an explanation: http://varianceexplained.org/statistics/interpreting-pvalue-histogram/
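For anyone who wants to try this, here is a minimal sketch in Python, assuming NumPy and SciPy are installed. The data are simulated (30 samples by 500 genes, an arbitrary choice), and the helper names `pairwise_pearson` and `bh_adjust` are made up for illustration. It computes the p-value for every gene pair, bins them for a histogram, and applies Benjamini-Hochberg FDR to the full set.

```python
import numpy as np
from scipy import stats

def pairwise_pearson(data):
    """Return (r, p) for every unordered pair of columns; data is samples x genes."""
    n_samples, n_genes = data.shape
    corr = np.corrcoef(data, rowvar=False)
    r = corr[np.triu_indices(n_genes, k=1)]        # upper triangle = unique pairs
    with np.errstate(divide="ignore"):
        t = r * np.sqrt((n_samples - 2) / (1.0 - r**2))
    p = 2.0 * stats.t.sf(np.abs(t), df=n_samples - 2)  # two-sided p-value
    return r, p

def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.empty(m)
    adj[order] = np.minimum(adj_sorted, 1.0)
    return adj

rng = np.random.default_rng(0)
data = rng.normal(size=(30, 500))                  # simulated stand-in for real data

r, p = pairwise_pearson(data)
counts, edges = np.histogram(p, bins=20, range=(0.0, 1.0))  # histogram of nominal p-values
p_adj = bh_adjust(p)                               # correction uses ALL p-values

print("number of tests:", p.size)
print("histogram counts:", counts)
print("pairs with adjusted p < 0.05:", int((p_adj < 0.05).sum()))
```

The `counts` array can be passed to any plotting tool. A roughly flat histogram, possibly with a peak near zero, is the well-behaved shape described in the linked post.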

OK, thank you. But I gave the 500 genes figure just as an example. My actual data is much larger: it covers 20k genes. What do you suggest in that case?

Reply • 4.8 years ago by ASid ▴ 40

The suggestion is to make a histogram of all your p-values and, if the shape of that histogram doesn't indicate any issue, apply a correction using all p-values.

Reply • 4.8 years ago by Jean-Karim Heriche 27k
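To give a sense of scale for the 20k-gene case, here is a small sketch of the arithmetic in Python. It assumes exactly 20000 genes and that each p-value is stored as an 8-byte float, which are illustrative assumptions rather than details from the thread.

```python
import math

n_pairs_500 = math.comb(500, 2)        # the example from the question
n_pairs_20k = math.comb(20_000, 2)     # assumed "20k" = exactly 20000 genes
gb_for_pvalues = n_pairs_20k * 8 / 1e9 # 8 bytes per float64 p-value

print(n_pairs_500)                     # 124750
print(n_pairs_20k)                     # 199990000
print(round(gb_for_pvalues, 2), "GB")  # about 1.6 GB
```

The advice does not change with size: the correction still runs over all of the roughly 200 million p-values, so the main practical concern is holding them in memory.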

score 0 · Answer 2 · 2019-07-22

Filtering data before statistical testing as a means of increasing sensitivity is often done, but it is tricky if one still wants to adequately control the false positive rate. See for example this paper. I would hesitate to do it and would only consider it based on independent information, never on a variable that is not clearly independent of the test statistic. So in your case, definitely don't filter on r.
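A small simulation can make the problem concrete. The sketch below, in Python and assuming NumPy and SciPy, uses made-up settings (30 samples, 200 independent genes, the 0.4 cutoff from the question) and a hand-written Benjamini-Hochberg helper named `bh_adjust`. Because the genes are generated independently, every "significant" pair is a false positive by construction.

```python
import numpy as np
from scipy import stats

def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.empty(m)
    adj[order] = np.minimum(adj_sorted, 1.0)
    return adj

rng = np.random.default_rng(1)
n_samples, n_genes = 30, 200
data = rng.normal(size=(n_samples, n_genes))   # independent genes: no true correlations

corr = np.corrcoef(data, rowvar=False)
r = corr[np.triu_indices(n_genes, k=1)]        # one r per unordered gene pair
t = r * np.sqrt((n_samples - 2) / (1.0 - r**2))
p = 2.0 * stats.t.sf(np.abs(t), df=n_samples - 2)

keep = np.abs(r) > 0.4                         # the filter proposed in the question
adj_all = bh_adjust(p)                         # correction over every test performed
adj_filtered = bh_adjust(p[keep])              # correction over the filtered subset only

print("pairs tested:", p.size)
print("pairs with |r| > 0.4:", int(keep.sum()))
print("significant, BH on all p-values:", int((adj_all < 0.05).sum()))
print("significant, BH on filtered p-values:", int((adj_filtered < 0.05).sum()))
```

With 30 samples, |r| > 0.4 already implies a nominal p-value below roughly 0.03. The largest adjusted p-value in the filtered set equals the largest nominal one, so every filtered pair passes the 0.05 threshold even though none of the correlations are real.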