Question: Criteria for excluding genes from Fisher tests (presence of mutation vs sample group). Minimum number of samples mutated?
2.6 years ago by
correlationmatrix (20) wrote:

I would like to test, for a number of genes, whether a mutation in the gene is significantly associated with a particular group of samples. For each gene I will perform a Fisher test comparing the number of samples in group A with any mutation in that gene against the number of samples in group B with a mutation in the same gene, and repeat this for X genes. Each group consists of 11 samples.

However, some of the genes are mutated in very few samples in total, say 1 or 2. In those cases I could never get a significant p-value, regardless of how the mutations were distributed across the two groups. Is it then a good idea to discard these genes before testing, in order to reduce the burden of the false discovery rate correction I will need to perform? Or would that count as "fishing" for significance?

If filtering is acceptable, what is a sensible cutoff for the number of mutated samples? Using an online Fisher test, I note that one can only get a significant p-value when at least 5 samples are mutated (in the most optimistic scenario, where all mutations fall in one group). Would it then be wise to require a minimum of 5 mutated samples before considering a gene for testing? (I'm asking because it is very easy to find excuses when something looks borderline significant after FDR correction...)
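The 5-sample threshold can be checked without an online calculator. Below is a minimal sketch, assuming a two-sided Fisher exact test (sum of all table probabilities no larger than the observed one) and two groups of 11 samples. The function names `fisher_two_sided` and `min_achievable_p` are made up for illustration and use only the Python standard library.

```python
from math import comb


def fisher_two_sided(a, b, n_a, n_b):
    """Two-sided Fisher exact p-value.

    a, b     : number of mutated samples in group A and group B
    n_a, n_b : total number of samples in group A and group B
    """
    k = a + b                          # total mutated samples (fixed margin)
    total = comb(n_a + n_b, k)
    lo, hi = max(0, k - n_b), min(k, n_a)
    # Hypergeometric probability of every table sharing the same margins
    probs = {x: comb(n_a, x) * comb(n_b, k - x) / total
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum the tables that are as likely or less likely than the observed one
    # (small tolerance guards against floating-point ties)
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9)))


def min_achievable_p(k, n_a=11, n_b=11):
    """Smallest p-value any split of k mutated samples could produce."""
    lo, hi = max(0, k - n_b), min(k, n_a)
    return min(fisher_two_sided(x, k - x, n_a, n_b) for x in range(lo, hi + 1))


if __name__ == "__main__":
    for k in range(1, 9):
        print(k, round(min_achievable_p(k), 4))
```

For 11 vs 11 samples, the smallest achievable p-value first drops below 0.05 at 5 mutated samples (about 0.035), which matches the observation above. Note that this is the uncorrected p-value; an FDR correction can only raise the number of mutated samples needed.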

Tags: mutation, samples, fisher

The statistical principle you are looking for is "independent filtering". Searching for "independent filtering gwas" will give you some ideas.

The key requirement is that the filter is applied on a metric that is independent of the test statistic under the null hypothesis.

Comment written 2.6 years ago by WouterDeCoster (41k)
2.6 years ago by
theobroma22 (1.1k) wrote:

Never discard! You first have to establish whether you're using methods that control for outliers, and what influence those outliers have on the outcome. You hit the nail on the head regarding the statistics, so now you can biologically validate the first few true negatives in the group. I guess you should also validate the one and only true positive as well. This will tell you whether your data model shows any 'falseness' under that statistic. The most commonly used, though arbitrary, p-value cutoff is 5%. This also depends on the data dimensions though... what's the size of your data matrix? FYI: Fisher's test is quite robust!! You could also run an ANOVA on your data, right? Then you get stars. :)


"biologically validate the first few true negatives in the group. I guess you should also validate the one and only true positive as well."

Hm? I'm not sure if I follow you here.
OP is doing an association analysis. Direct validation is probably impossible.

Comment written 2.6 years ago by WouterDeCoster (41k)