Question: For multiple testing correction in GO analysis, is it ok to remove GO terms with only 1 gene hit?
chnyale wrote (2.4 years ago):

I am running a GO analysis on my gene sets and plan to use the Benjamini-Hochberg (BH) method to adjust the resulting p-values for multiple testing. Since the BH adjustment depends on the total number of tests (i.e. p-values calculated), I wonder whether it is acceptable to remove all GO terms with only 1 gene hit (or those with 1 or 2 gene hits) before calculating the p-values. That would reduce the total number of p-values, which may produce more significant adjusted p-values. The logic is that GO terms with just 1 or 2 gene hits are unlikely to be significant anyway.

So my plan is like this:

  1. Find out how many GO terms are represented in my gene sets
  2. Remove the GO terms with just 1 or 2 gene hits
  3. Calculate enrichment p-values for the remaining GO terms, so that the total number of tests equals the number of GO terms with >2 gene hits
  4. Use the BH method to adjust the p-values

Is this procedure ok or not? Are there any published papers with similar procedures? Any comments or references will be appreciated. Thank you!
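
To make the concern concrete, here is a minimal sketch of the BH adjustment in plain Python, showing how the same raw p-value gets a different adjusted value depending on how many tests go into the correction. The function name `bh_adjust` and all p-values are hypothetical and chosen only for illustration; this shows the arithmetic and is not a recommendation of the pre-filtering plan.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values (made up for illustration only).
pvals_many_hits = [0.001, 0.012, 0.030]            # terms with >2 gene hits
pvals_few_hits = [0.20, 0.35, 0.50, 0.70, 0.90]    # terms with 1 or 2 gene hits

adj_all = bh_adjust(pvals_many_hits + pvals_few_hits)  # 8 tests
adj_filtered = bh_adjust(pvals_many_hits)              # 3 tests

print(adj_all[:3])
print(adj_filtered)
```

With these made-up numbers, the third term (raw p = 0.030) has an adjusted p-value of about 0.08 when all 8 terms are corrected together and about 0.03 when only the 3 larger terms are kept, so it crosses a 0.05 cutoff purely because the number of tests changed.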

EagleEye (Sweden) wrote (2.4 years ago):

Hi,

It is not ideal to remove those terms before performing the enrichment analysis. Later, when you filter the GO terms, you can use the FDR together with the number of genes (hits) in each term as filter criteria. But the FDR calculation itself must include all the terms that were hit and their p-values; otherwise you introduce a bias into your analysis. Have a look at these articles, where I used a p-value cutoff along with a minimum number of genes per term to filter the terms.

Articles:

https://www.nature.com/articles/s41467-018-03265-1#Sec15

https://academic.oup.com/nar/article/46/18/9384/5053167#122402618

https://clinicalepigeneticsjournal.biomedcentral.com/articles/10.1186/s13148-016-0274-6#Sec2
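
The order of operations described in this answer (test every term, adjust across all p-values, and only then filter on FDR plus number of hits) can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline used in the linked articles: the one-sided hypergeometric test, the function names, the term IDs, the gene counts, and the cutoffs `fdr_cutoff=0.05` and `min_hits=3` are all hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n are annotated."""
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(terms, background_size, gene_set_size, fdr_cutoff=0.05, min_hits=3):
    """terms maps a term ID to (hits in gene set, genes annotated to the term)."""
    ids = list(terms)
    # Step 1: a p-value for every term, including those with 1 or 2 hits.
    pvals = [
        hypergeom_sf(terms[t][0], background_size, terms[t][1], gene_set_size)
        for t in ids
    ]
    # Step 2: the BH adjustment sees all p-values.
    fdr_by_term = dict(zip(ids, bh_adjust(pvals)))
    # Step 3: only now filter on FDR and on the minimum number of hits.
    kept = [
        t for t in ids
        if fdr_by_term[t] <= fdr_cutoff and terms[t][0] >= min_hits
    ]
    return fdr_by_term, kept


# Hypothetical input: term ID -> (hits, term size); all values are made up.
example_terms = {"term_A": (10, 200), "term_B": (1, 5), "term_C": (2, 400)}
fdr_by_term, kept = enrich(example_terms, background_size=10000, gene_set_size=100)
print(fdr_by_term)
print(kept)
```

The point of the design is that the small terms still count toward the number of tests in step 2, and the minimum-hits rule acts only as an extra reporting filter in step 3.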


It looks like you used a threshold of at least 5 genes or 5% of a pathway. How did you decide on those thresholds? Do you have a reference for that minimum size?

Thanks!

(reply written 10 months ago by amandastahlke)

Hi amandastahlke,

I used the p-value cutoff as the main criterion. Just to make the filter more stringent, I added a second cutoff on the number of genes. It was not based on any established rule.

(reply written 10 months ago by EagleEye)