Question: Multiple test correction questions
oibioinf wrote, 10 months ago:

Hi,

I am not new to the field of bioinformatics, but I always find statistics hard to understand. Could you please help me clarify several things about multiple testing correction?

Of course, the analysis plan should be created in advance, but real life works differently. People make a plan and then something happens: assumptions turn out to be wrong, or a test returns strange results. So, in practice, I see people running several statistical tests on the same dataset, or running the same test with different data preprocessing procedures. Hence my questions:

  • Let's say I run several statistical tests (e.g. Fisher's exact test and the Wald test) on the same dataset (assume it's mRNA data with 2000 genes). Of course, we need to apply multiple testing correction for each test, and let's say I decide to use the Bonferroni correction. I am always confused about what to correct for in this case. My logic is that if I applied N different statistical tests, I now need to correct for 2000*N tests, but I don't see people doing that (see the sketch after this list). If the logic is not correct, can you please point out exactly what is wrong? (Please don't just send me back to statistics textbooks.)

  • Imagine a researcher did an observational study, mRNA again, and applied multiple testing correction over all genes tested (again, Bonferroni for simplicity). They did not find what they expected, so they picked a specific pathway, selected the values for that pathway from the same data, and repeated the statistical test, now using only the number of genes in that pathway for the Bonferroni correction. They then found that many genes are significantly up- or down-regulated. To me this sounds like massaging the data, and it shouldn't be published. What would you say? Would the correct approach have been to select this pathway from the start and run the test only on it from the beginning?

  • Take the previous case and imagine many separate groups working on the same dataset. They don't communicate well, and between them they run many tests. One group tested the full dataset and published that they found no significant upregulation in the data. Another group tested pre-selected pathways and published that they found significant upregulation in these. In terms of statistics everything looks fine, doesn't it? If it does, then why is the second case wrong and this one right? What would be the correct approach here in terms of multiple testing correction?
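To make the first bullet concrete, here is a minimal sketch in Python of the arithmetic involved. It uses the multipletests function from statsmodels; the p-values are simulated placeholders standing in for real Fisher/Wald output, and pooling across methods is shown only to illustrate how the threshold changes, not as a recommendation:

    import numpy as np
    from statsmodels.stats.multitest import multipletests

    rng = np.random.default_rng(0)

    n_genes = 2000    # features tested, as in the question
    n_methods = 2     # e.g. Fisher's exact test and the Wald test
    alpha = 0.05

    # Placeholder p-values standing in for one method's real output.
    pvals = rng.uniform(size=n_genes)

    # Correcting within one method: each p-value is effectively
    # compared against alpha / n_genes.
    reject, _, _, _ = multipletests(pvals, alpha=alpha, method="bonferroni")
    print(reject.sum())  # genes passing; almost always 0 for null p-values

    print(alpha / n_genes)                # within-method threshold: 2.5e-05
    print(alpha / (n_genes * n_methods))  # pooled across methods:   1.25e-05

Pooling all 2000*N p-values into one Bonferroni family simply divides the threshold by a further factor of N; whether that pooled family is the right one is exactly what the answer below addresses.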

Thanks in advance for your replies.

statistics rna-seq

ATpoint wrote, 10 months ago:

You may have a look at the StatQuest series on YouTube.

oibioinf replied:

Thank you, that is a good video, but it doesn't answer my questions.
Devon Ryan (Freiburg, Germany) wrote, 10 months ago:
  1. One would typically argue that the progression through N tests reflects moving from less correct to more correct statistical models for the data, so you wouldn't bother correcting for the N-1 earlier tests. Having said that, the "try tools until you get one that produces what you want to see" approach is basically fishing for p-values and inappropriate for that reason. I have never tried more than two tools for a given analysis, and even then I had a sound reason for switching from one to the other. If you really are going through large numbers of tools until you find something that gives the results you expect, then you need to re-evaluate everything you're doing.
  2. There are a number of meta-analyses done by quacks that use this method to argue for nonsense (e.g., homeopathy). The term for this is p-hacking, and it is grounds for rejecting a paper (the simulation after this list shows how badly it inflates false positives). If you can't select the pathways beforehand, then find interesting pathways in a given dataset and use that as a pilot for another study focusing on those pathways (using independent samples). Otherwise you're just fooling yourself about the p-values.
  3. If you torture the data enough, it'll confess to almost anything. If the groups were working independently then there's no p-hacking, so no one is committing an ethics violation. As in the first case, one typically uses datasets like this as pilots for further experiments, or as facets of larger projects where there will be other confirmatory experiments.
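To put numbers on point 2, here is a minimal simulation sketch in Python. It uses purely synthetic null data (no real pathways); the post-hoc "pathway" is modeled, as an assumption, as simply keeping the 50 smallest p-values, i.e. a gene set chosen because it looked interesting:

    import numpy as np

    rng = np.random.default_rng(0)
    n_genes, pathway_size, alpha, n_sims = 2000, 50, 0.05, 1000

    hits_full = hits_posthoc = 0
    for _ in range(n_sims):
        # Null data: no gene is truly differentially expressed.
        p = rng.uniform(size=n_genes)

        # Honest analysis: Bonferroni over all 2000 tests.
        hits_full += p.min() < alpha / n_genes

        # Post-hoc "pathway": keep the 50 smallest p-values, then
        # correct as if only 50 tests had ever been run.
        subset = np.sort(p)[:pathway_size]
        hits_posthoc += subset.min() < alpha / pathway_size

    print("false-positive rate, full correction:  ", hits_full / n_sims)
    print("false-positive rate, post-hoc pathway: ", hits_posthoc / n_sims)

With every gene null, the full correction keeps the chance of any false hit near the nominal 5%, while the post-hoc subset compares the very same smallest p-value against the much looser threshold 0.05/50 and flags "significant" genes in the large majority of runs. Choosing the pathway before seeing the data, or confirming in independent samples, is what removes this inflation.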