I am running Pearson correlation tests on data for 500 genes. I want to check the association between every pair of genes, which gives an r value and a p-value for each pair. In total that is 500 × 499 / 2 = 124,750 tests (gene pairs).
I know the next step is to correct for multiple comparisons using an FDR or Bonferroni procedure. Let's say we have chosen FDR to get adjusted p-values.
My question is about the order of filtering and calculating adjusted p-values. Suppose we first filter the comparisons on a specific r value such as 0.4 (let's assume 1,000 comparisons pass the filter because their absolute r value is greater than 0.4) and then run FDR for multiple comparisons. The correction will then use only those 1,000 p-values. Are we introducing bias here? Is this actually allowed? We really performed 124,750 tests, so I am not sure this is the right approach.
OK, thank you. But I only gave 500 genes as an example. My actual data is much larger: it covers 20k genes. What do you suggest in that case?
The suggestion is to make a histogram of all your p-values and, if the shape of that histogram doesn't indicate any issue, to apply the correction using all p-values, not only the ones that pass the r filter.
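To make that concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the p-values are simulated (9,500 uniform "null" values plus 500 small "signal" values) rather than taken from real gene data, and `benjamini_hochberg` and `histogram` are hypothetical helper names, not functions from any library. It bins the p-values, applies the Benjamini-Hochberg FDR adjustment to all of them, and then repeats the adjustment on a pre-filtered subset to show how filtering first changes the result.

```python
import random

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def histogram(pvals, bins=20):
    """Count p-values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in pvals:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts

random.seed(0)
# Simulated data (assumption): uniform nulls plus a block of small p-values.
pvals = [random.random() for _ in range(9500)]
pvals += [random.random() * 0.001 for _ in range(500)]

# Step 1: look at the shape. A flat histogram with a peak near 0 is the
# usual well-behaved pattern.
counts = histogram(pvals)
print("histogram counts:", counts)

# Step 2: adjust using ALL p-values.
adj_all = benjamini_hochberg(pvals)
print("significant at FDR 0.05 (all tests):", sum(a < 0.05 for a in adj_all))

# For comparison only: filter first, then adjust. Filtering on small p
# stands in for filtering on large |r| here, since for a fixed sample size
# the Pearson test p-value decreases as |r| increases.
subset = [p for p in pvals if p < 0.01]
adj_subset = benjamini_hochberg(subset)
print("significant at FDR 0.05 (filtered first):", sum(a < 0.05 for a in adj_subset))
```

In the filtered run the correction only "sees" the tests that already looked strong, so each adjusted p-value is no larger than the one the same test gets from the full run, and the reported FDR is too optimistic. For 20k genes the number of pairs is 20,000 × 19,999 / 2 = 199,990,000, so in practice you would want a vectorised implementation rather than these pure-Python loops, but the logic is the same.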