Question: Correlation test for multiple variables and adjusted p values
16 months ago by
ASid10
USA
ASid10 wrote:

I am running Pearson correlation tests on data for 500 genes. I want to check the association between every pair of genes, obtaining an r value and a p value for each pair. In total that gives 500 * 499 / 2 = 124750 tests (gene pairs).

I know the next step is to correct for multiple comparisons using an FDR or Bonferroni procedure. Let's say we choose FDR to get adjusted p values.

My question is about filtering before calculating adjusted p values. Suppose we first filter the comparisons on a specific r value such as 0.4, and suppose 1000 comparisons pass the filter because their absolute r value is greater than 0.4. If we then run the FDR correction, it will of course use only those 1000 p values. Are we introducing bias here? Is this actually allowed? We really performed 124750 tests, so I am not sure this is the right way to go.

modified 16 months ago by Nicolas Rosewick9.2k • written 16 months ago by ASid10
16 months ago by
Belgium, Brussels
Nicolas Rosewick9.2k wrote:

You should use all p-values for the multiple testing correction.

It is also a good idea to plot the nominal p-values and check the shape of their distribution. See here for an explanation: http://varianceexplained.org/statistics/interpreting-pvalue-histogram/
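To illustrate why the correction should see every p-value, here is a minimal sketch of the Benjamini-Hochberg (FDR) adjustment in plain Python. The function name `bh_adjust` and the toy p-values are made up for illustration; in practice you would more likely use an existing implementation such as `p.adjust(p, method = "BH")` in R.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (hypothetical), standing in for all tests that were run.
all_p = [0.0001, 0.001, 0.01, 0.2, 0.4, 0.6, 0.8, 0.9]
adj_all = bh_adjust(all_p)

# Adjusting only the three smallest, as if the rest had been filtered out.
kept = all_p[:3]
adj_subset = bh_adjust(kept)

for p, a_full, a_sub in zip(kept, adj_all, adj_subset):
    print(f"p={p:g}  adjusted on all tests={a_full:.4g}  adjusted on subset={a_sub:.4g}")
```

The adjusted values computed on the subset come out smaller than those computed on all tests, because the correction thinks far fewer tests were performed than really were.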

OK, thank you. But I gave the 500 genes figure just as an example. My actual data is much larger: about 20k genes. What do you suggest in that case?

The suggestion is to make a histogram of all your p-values and, if the shape of that histogram doesn't indicate any issue, apply a correction using all p-values.
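A rough sketch of that check, assuming the p-values are already collected in a list. The helper name `pvalue_histogram` is hypothetical, and the p-values here are simulated uniform numbers standing in for real results; it only counts p-values per bin and prints a text histogram.

```python
import random


def pvalue_histogram(pvalues, bins=20):
    """Count p-values in equal-width bins covering [0, 1]."""
    counts = [0] * bins
    for p in pvalues:
        # p == 1.0 would land one past the end, so clamp to the last bin.
        idx = min(int(p * bins), bins - 1)
        counts[idx] += 1
    return counts


# Simulated p-values (assumption): uniform on [0, 1], as expected when
# every null hypothesis is true.
random.seed(1)
null_p = [random.random() for _ in range(2000)]

counts = pvalue_histogram(null_p, bins=10)
for i, c in enumerate(counts):
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f}  {'#' * (c // 20)} {c}")
```

Roughly flat bars are what you expect when there is no signal, and a peak near 0 on top of a flat background suggests some true associations. Other shapes are worth investigating before applying any correction. With matplotlib installed, `plt.hist(pvalues, bins=20)` draws the same picture graphically.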

16 months ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche24k wrote:

Filtering data before statistical testing to increase sensitivity is common, but it is tricky if one still wants to control the false positive rate adequately. See for example this paper. I would hesitate to do it, and would only consider filtering on information that is clearly independent of the test statistics. So in your case, definitely don't filter on r, since the p value is computed directly from r.
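A small simulation can make the problem concrete. This is only a sketch under stated assumptions: the data are pure noise (40 unrelated simulated genes, 30 samples), so every "significant" pair is a false positive, and the p-value uses the Fisher z normal approximation rather than the exact t-test so that the example needs only the standard library. All function names are made up for illustration.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p(r, n):
    """Two-sided p-value via the Fisher z approximation (not the exact t-test)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


random.seed(0)
n_genes, n_samples = 40, 30
# Independent noise: no pair of genes is truly correlated.
genes = [[random.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_genes)]

results = []  # one (r, p) tuple per gene pair
for i in range(n_genes):
    for j in range(i + 1, n_genes):
        r = pearson_r(genes[i], genes[j])
        results.append((r, approx_p(r, n_samples)))

# Correct approach: adjust using every test that was performed.
all_p = [p for _, p in results]
n_all_hits = sum(a < 0.05 for a in bh_adjust(all_p))

# Biased approach: keep |r| > 0.4 first, then adjust only the survivors.
kept = [p for r, p in results if abs(r) > 0.4]
n_filtered_hits = sum(a < 0.05 for a in bh_adjust(kept))

print(f"{len(results)} pairs tested, {len(kept)} pass the |r| > 0.4 filter")
print(f"FDR < 0.05 using all p-values:    {n_all_hits}")
print(f"FDR < 0.05 after filtering on r:  {n_filtered_hits}")
```

With 30 samples, |r| > 0.4 already implies a nominal p-value below 0.05 under this approximation, so every pair that passes the filter also passes the "corrected" threshold. The filter has selected exactly the small p-values, and the correction no longer knows how many tests produced them.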