Question

For multiple testing correction in GO analysis, is it ok to remove GO terms with only 1 gene hit?

5.5 years ago

chnyale

I am doing a GO analysis for my gene sets and plan to use the Benjamini-Hochberg (BH) method to adjust the resulting p-values for multiple testing. Since the BH adjustment depends on the total number of tests (p-values) performed, I wonder whether it is OK to remove all GO terms with only 1 gene hit (or with only 1 or 2 gene hits) before calculating the p-values. That way, the total number of p-values would be reduced, which may produce more significant adjusted p-values. The logic is that GO terms with just 1 or 2 gene hits are unlikely to be significant anyway.
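The dependence of BH on the number of tests m can be seen in a minimal pure-Python sketch of the standard step-up adjustment. The function name benjamini_hochberg and the eight p-values are hypothetical, chosen only for illustration:

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)  # total number of tests
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [1.0] * m
    running_min = 1.0
    # Walk down from the largest p-value, scaling each p by m / rank and
    # keeping the adjusted values monotone (and capped at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from eight GO terms (already sorted ascending).
pvals = [0.001, 0.01, 0.02, 0.04, 0.30, 0.50, 0.80, 0.90]

print([round(q, 4) for q in benjamini_hochberg(pvals)])      # all 8 tests
print([round(q, 4) for q in benjamini_hochberg(pvals[:4])])  # only the 4 smallest
```

For example, the smallest p-value (0.001) is adjusted to 0.008 when all eight tests are counted but to 0.004 when only the four smallest are kept, which is the effect the question is after.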

So my plan is:

1. Find out how many GO terms are represented in my gene sets.
2. Remove the GO terms with just 1 or 2 gene hits.
3. Calculate enrichment p-values for the remaining GO terms, so that the total number of tests equals the number of GO terms with more than 2 gene hits.
4. Use the BH method to adjust the p-values.
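Steps 1 to 3 of this plan might look roughly like the sketch below. It assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), a common choice for GO enrichment but not something the question specifies; the background size, list size and the terms term_A to term_D are hypothetical:

```python
from math import comb

def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    k: genes from the list annotated to the term (the hits)
    n: number of genes in the list
    K: background genes annotated to the term
    N: number of genes in the background
    """
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1))
    return tail / comb(N, n)

# Hypothetical inputs: background size, list size, and per-term (hits, term size).
N, n = 10000, 200
terms = {
    "term_A": (1, 40),
    "term_B": (2, 15),
    "term_C": (8, 60),
    "term_D": (12, 100),
}

# Step 1: GO terms represented in the gene set.
print("terms hit:", len(terms))

# Step 2: drop terms with only 1 or 2 hits.
kept = {t: v for t, v in terms.items() if v[0] > 2}

# Step 3: enrichment p-values for the remaining terms; step 4 would then
# pass len(kept) p-values to BH instead of len(terms).
pvalues = {t: enrichment_p(k, n, K, N) for t, (k, K) in kept.items()}
print("tests after filtering:", len(pvalues))
for term, p in pvalues.items():
    print(term, f"{p:.3g}")
```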

Is this procedure OK or not? Are there any published papers that use a similar procedure? Any comments or references would be appreciated. Thank you!

gene multiple testing GO analysis

written 5.5 years ago by chnyale • updated 5.5 years ago by EagleEye

Answer 1 · 2018-10-20

5.5 years ago

EagleEye

Hi,

It is not ideal to remove those terms before performing the enrichment analysis. Later, when you filter the GO terms, you can use the FDR together with the number of genes (hits) in each term as filtering criteria. But when you calculate the FDR, it must include every tested term and its p-value; otherwise you introduce a bias into your analysis. The hit count is the very quantity the enrichment p-value is computed from, so discarding low-hit terms beforehand is not independent of the test, and the BH adjustment is then no longer guaranteed to control the FDR. You can have a look at these articles, where I used a p-value cutoff together with a minimum number of genes per term to filter the terms.

Articles:

https://www.nature.com/articles/s41467-018-03265-1#Sec15

https://academic.oup.com/nar/article/46/18/9384/5053167#122402618

https://clinicalepigeneticsjournal.biomedcentral.com/articles/10.1186/s13148-016-0274-6#Sec2

5.5 years ago by EagleEye
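The bias described in the answer above can be made concrete with an exact calculation for a single hypothetical GO term under the null hypothesis (the gene list is a random draw from the background). The sketch computes how often the term's one-sided hypergeometric p-value falls at or below 0.05, first over all outcomes and then only among outcomes where the term has at least 3 hits, which is what pre-filtering selects. This is a simplified illustration, not the author's analysis; the sizes N, n and K are made up:

```python
from math import comb

def hypergeom_pmf(x, n, K, N):
    """P(X = x) when n genes are drawn at random from a background of N genes,
    K of which are annotated to the term."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def upper_tail_p(k, n, K, N):
    """One-sided enrichment p-value P(X >= k)."""
    return sum(hypergeom_pmf(x, n, K, N) for x in range(k, min(n, K) + 1))

# Hypothetical sizes: background N, gene list n, term size K (about 1 hit expected).
N, n, K = 2000, 100, 20
alpha = 0.05
support = range(0, min(n, K) + 1)

# Null probability of each possible hit count, and the p-value it would give.
pmf = {k: hypergeom_pmf(k, n, K, N) for k in support}
pval = {k: upper_tail_p(k, n, K, N) for k in support}

# False-positive rate of the term when it is always tested.
fpr_all = sum(pmf[k] for k in support if pval[k] <= alpha)

# False-positive rate among outcomes that survive a ">= 3 hits" pre-filter.
p_kept = sum(pmf[k] for k in support if k >= 3)
fpr_kept = sum(pmf[k] for k in support if k >= 3 and pval[k] <= alpha) / p_kept

print(f"P(p <= {alpha}) over all outcomes:      {fpr_all:.3f}")
print(f"P(p <= {alpha} | term has >= 3 hits):  {fpr_kept:.3f}")
```

Without the filter, the rate stays at or below 0.05, as a valid p-value should. Among terms that survive the filter it is well above 0.05, so the p-values handed to BH are no longer valid and the adjusted values can understate the false discovery rate.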

It looks like you used a threshold of at least 5 genes or 5% of a pathway. How did you decide on those thresholds? Do you have a reference for that minimum size?

Thanks!

4.0 years ago by amandastahlke

Hi amandastahlke,

I used the p-value cutoff; just to make the filtering more stringent, I added one more layer of cutoff on top of it. It was not based on any established rule.

4.0 years ago by EagleEye
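The two-layer filtering discussed in this thread (BH-adjusted p-values computed over all tested terms first, then a minimum gene count as an extra cutoff) can be sketched as below. The terms, hit counts, raw p-values, the 0.05 FDR cutoff and MIN_HITS = 5 are all hypothetical; 5 only echoes the threshold mentioned in the thread and is not a recommended value:

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [1.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # largest p-value first
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical results for every tested term: (term, hits, raw p-value).
results = [
    ("term_A", 1, 0.004),
    ("term_B", 2, 0.03),
    ("term_C", 8, 0.0001),
    ("term_D", 12, 0.002),
    ("term_E", 6, 0.20),
    ("term_F", 3, 0.60),
]

# Layer 1: FDR computed over ALL tested terms, including 1-2 hit terms.
fdr = bh_adjust([p for _, _, p in results])

# Layer 2: afterwards, also require a minimum number of hits per term.
FDR_CUTOFF = 0.05  # hypothetical
MIN_HITS = 5       # hypothetical
significant = [
    (term, hits, q)
    for (term, hits, _), q in zip(results, fdr)
    if q <= FDR_CUTOFF and hits >= MIN_HITS
]
for term, hits, q in significant:
    print(term, hits, f"{q:.4f}")
```

With these inputs term_A passes the FDR cutoff but is dropped by the gene-count filter, while its p-value still counted toward the BH adjustment of the other terms.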