Question

Can a mutation that is statistically associated by Fisher's exact test really not be associated?

1

4.2 years ago

david_s.l ▴ 10

I have a SNP (SNP A) that in theory should cause resistance to a drug. I surveyed bacterial strains to see whether the mutation was present in the strains with phenotypic resistance. The mutation was present in only 21 resistant strains, and also in 59 sensitive strains. Among the strains without the mutation, 184 were resistant and 4513 were sensitive. The resistant strains without SNP A carried other SNPs; however, I dichotomized the data to presence or absence of SNP A only.

To get an initial idea of whether this mutation was associated, I ran Fisher's exact test. It indicated an association (p < 0.05), with an odds ratio of 7. But the fact that this SNP was present in 59 sensitive strains makes me distrust the result, and I am not confident that it is really associated with the resistant phenotype. I do not like the idea of concluding: "yes, it is associated with resistance (21), but it can be present in a greater number of sensitive strains (59)".
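For reference, the 2x2 table described above can be written out and the sample odds ratio computed by hand. This is only a minimal sketch in plain Python: the variable names are illustrative, and the confidence interval uses the Woolf (log) approximation, which is an assumption of this sketch and not the conditional estimate that Fisher's test implementations usually report, so the value need not match the one quoted in the post.

```python
import math

# Counts from the question: rows = SNP A present / absent,
# columns = resistant / sensitive.
snp_res, snp_sens = 21, 59
no_snp_res, no_snp_sens = 184, 4513

# Sample (cross-product) odds ratio.
odds_ratio = (snp_res * no_snp_sens) / (snp_sens * no_snp_res)

# Approximate 95% CI on the log scale (Woolf method, an assumption of this sketch).
se_log_or = math.sqrt(1 / snp_res + 1 / snp_sens + 1 / no_snp_res + 1 / no_snp_sens)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"odds ratio = {odds_ratio:.2f}, approx. 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

With these counts the approximate interval lies above 1, which points the same way as the small p-value: carriers have higher odds of resistance, even though most carriers are sensitive.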

Do you think the exact test is not a good statistic for this case? What would be the correct way to analyze this data? Or how could I interpret the results?

Thank you in advance for your comments and valuable help.

SNP Fisher exact Test Resistance Mutation • 1.1k views

ADD COMMENT • link updated 4.1 years ago by Biostar 20 • written 4.2 years ago by david_s.l ▴ 10

1

Imagine: 100 subjects smoked and 100,000 did not. 99 of the smokers got lung cancer, and 200 of the 100,000 non-smokers also got the disease. Does the fact that 200 > 99 bother you when you say that smoking is associated with lung cancer in this situation? You just had many more non-smokers to test.

So, you just had more sensitive strains. This is a test of proportions; you should not use it to draw conclusions about absolute numbers.
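The smoking example can be put into numbers with a few lines of Python. This is just an illustrative sketch using the made-up counts from the example; the variable names are hypothetical.

```python
# Counts from the smoking example above.
smokers, smokers_with_cancer = 100, 99
non_smokers, non_smokers_with_cancer = 100_000, 200

# Proportion affected within each group, which is what the test compares.
rate_smokers = smokers_with_cancer / smokers
rate_non_smokers = non_smokers_with_cancer / non_smokers

print(f"smokers: {rate_smokers:.1%}, non-smokers: {rate_non_smokers:.1%}")
print(f"risk ratio: {rate_smokers / rate_non_smokers:.0f}")
```

The absolute count is larger among non-smokers, but the rate within each group is what carries the association.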

ADD REPLY • link 4.2 years ago by German.M.Demidov ★ 2.9k

0

German, thanks for your response.

So how should I direct my analysis on this particular SNP? Perhaps I should only compare the proportion of SNP occurrence between resistant strains and sensitive strains? Would this be statistically sufficient to conclude that there is no association?

ADD REPLY • link 4.2 years ago by david_s.l ▴ 10

0

Fisher's test is a test of independence between rows and columns. You can also use a proportion test (prop.test in R), but it will most probably give you the same result. I would say "the hypothesis that presence of the mutation and phenotypic sensitivity are independent is rejected at a significance level of 0.05" or something similar. You don't need to mention "but it can be present in a greater number of sensitive strains (59)".
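To make the independence test concrete, here is a minimal sketch of a two-sided Fisher's exact test written directly from the hypergeometric distribution, using only the Python standard library. The function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule used (sum every table that is no more probable than the observed one) is one common convention, assumed here. In practice a library routine such as `fisher.test` in R or `scipy.stats.fisher_exact` in Python would be the usual choice.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    # Sum all tables that are no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    p = sum(prob(k) for k in range(lowest, highest + 1)
            if prob(k) <= observed * (1 + 1e-7))
    return min(1.0, p)

# Rows: SNP A present / absent; columns: resistant / sensitive.
p_value = fisher_exact_two_sided(21, 59, 184, 4513)
print(f"p = {p_value:.3g}")
```

The test only asks whether the row and column classifications are independent given the margins; it says nothing about which absolute count is larger.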

ADD REPLY • link 4.2 years ago by German.M.Demidov ★ 2.9k

0

Also think about a correction such as Bonferroni or FDR for multiple testing to get a more "honest" p-value. Remember, if you run 20 tests where there is no real association, on average one of them will fall under the p<0.05 threshold just by chance.
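As a rough illustration of what such a correction does, the sketch below implements Bonferroni and Benjamini-Hochberg (FDR) adjusted p-values in plain Python. The function names and the list of p-values are hypothetical and made up for the example; they are not results from this dataset. R's `p.adjust` offers the same corrections out of the box.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Chance of at least one p < 0.05 among 20 independent tests with no real effect.
family_wise_error = 1 - 0.95 ** 20

hypothetical_p = [0.001, 0.008, 0.039, 0.041, 0.2]  # made-up values
bonf = bonferroni(hypothetical_p)
bh = benjamini_hochberg(hypothetical_p)
print(f"P(at least one false positive in 20 tests) = {family_wise_error:.2f}")
print("Bonferroni:", bonf)
print("Benjamini-Hochberg:", bh)
```

Bonferroni controls the chance of any false positive and is the stricter of the two; Benjamini-Hochberg controls the expected share of false positives among the hits.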

Also consider haplotypes and not just individual SNVs.

The biochemistry should likely also be considered here ...

ADD REPLY • link 4.2 years ago by colindaven 6.4k

0

I think in your data, the mutation "really is associated" with resistance. But since it's neither necessary nor sufficient to grant resistance, mere association might not be all that meaningful.
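The "neither necessary nor sufficient" point can be read straight off the counts in the question. A small sketch, with illustrative variable names:

```python
# Counts from the question.
snp_res, snp_sens = 21, 59
no_snp_res, no_snp_sens = 184, 4513

# Not necessary: share of resistant strains that lack SNP A.
resistant_without_snp = no_snp_res / (snp_res + no_snp_res)

# Not sufficient: share of SNP A carriers that are still sensitive.
carriers_sensitive = snp_sens / (snp_res + snp_sens)

print(f"resistant strains without SNP A: {resistant_without_snp:.1%}")
print(f"SNP A carriers that are sensitive: {carriers_sensitive:.1%}")
```

Both shares are large, so the SNP on its own is a poor predictor of the phenotype even though the association test is significant.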

ADD REPLY • link 4.2 years ago by swbarnes2 14k

0

Thank you for your response. In fact, I detected three other SNPs that also confer resistance, but I wanted to measure the association of each SNP separately.

ADD REPLY • link 4.2 years ago by david_s.l ▴ 10