Fisher Exact Test For Testing Ratios Of Two Columns
8.2 years ago

I am trying to check whether the ratio of Non_Syn to Total_Snps differs significantly across all 10 chromosomes in my SNP data set. Here is the data frame:

   Ch Non_Syn Total_Snps    ratios
1 A01    4658      23657 0.1968973
2 A02    3347      16685 0.2005993
3 A03    7292      36963 0.1972784
4 A04    2608      13161 0.1981612
5 A05    1883      10665 0.1765588
6 A06    4141      22033 0.1879454


When I tried to run a Fisher test on the fourth column:

fisher.test(data_total[,4])


I am getting this error

"Error in fisher.test(test_fish[, 4]) : if 'x' is not a matrix, 'y' must be given"

I even tried converting this to a matrix and doing the same thing, but I still get the same error.

Can somebody tell me what I am doing wrong here? Also, is it OK to use just the fourth column in a Fisher test for my purpose?

Thanks, Upendra


You're not going to use the ratios for a Fisher's test, but rather the number of synonymous and non-synonymous SNPs. Also, you need something to compare the numbers to (i.e., you want to test whether your observations are significantly different from something else), which seems to be missing here.

8.2 years ago
KCC ★ 4.0k

You need to feed in the values from the second and third columns and pick at least two rows.

The Fisher exact test works on a matrix of counts with at least two columns and at least two rows. The idea of the test is to check whether the distribution across two or more categories changes when you go from one condition to another.

An example is counting the number of girls and boys in a French class and comparing that with the number of boys and girls in an English class. You can test whether the proportion of boys to girls differs significantly between the two classes. If so, you might conclude that the type of class is significantly associated with the gender of the children taking it.
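
The class example can be sketched in R as follows. The counts are hypothetical, invented only to show the shape of the input: one row per class, one column per gender.

```r
# Hypothetical counts, made up purely for illustration
classes <- matrix(c(12, 18,
                    20, 10),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Class  = c("French", "English"),
                                  Gender = c("Boys", "Girls")))

res <- fisher.test(classes)
res$p.value   # two-sided p-value under the null that class and gender are independent
res$estimate  # conditional maximum-likelihood estimate of the odds ratio
```

A small p-value here would suggest that the boy/girl split is not the same in the two classes.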

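In your example, you can test whether the split between non-synonymous SNPs and the remaining SNPs differs between A01 and A02, for instance. Note that the two columns of a contingency table should hold mutually exclusive counts, and Total_Snps already includes Non_Syn, so strictly the second column should be Total_Snps minus Non_Syn. Using the second and third columns as they stand, this would be the matrix: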

4658      23657
3347      16685
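
As a minimal sketch of the A01 versus A02 comparison, the code below rebuilds those two rows from the question and derives a non-overlapping second column. The column name `Other` is introduced here for illustration, and it assumes that every SNP not counted in Non_Syn belongs to a single "everything else" category, which the question does not confirm.

```r
# Two rows copied from the data frame in the question
snps <- data.frame(
  Ch         = c("A01", "A02"),
  Non_Syn    = c(4658, 3347),
  Total_Snps = c(23657, 16685)
)

# Assumption: SNPs not counted as non-synonymous form the second category
snps$Other <- snps$Total_Snps - snps$Non_Syn

# fisher.test() expects a matrix of counts, so convert explicitly
tab <- as.matrix(snps[, c("Non_Syn", "Other")])
rownames(tab) <- snps$Ch

fit <- fisher.test(tab)
fit$p.value
```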


Fisher is easily extended to more rows and columns (at least conceptually). You can test whether the distribution of synonymous and non-synonymous SNPs is significantly affected by the Ch column. In this case the matrix you need is this:

4658      23657
3347      16685
7292      36963
2608      13161
1883      10665
4141      22033


So, I think the code you want is:

fisher.test(as.matrix(data_total[, c(2, 3)]))
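
For the all-chromosome test, a hedged sketch follows. It uses only the six rows shown in the question and again assumes that `Total_Snps - Non_Syn` is a meaningful second category (the name `Other` is made up here). With counts this large, the exact computation on a table bigger than 2 x 2 can be slow or exhaust the default workspace, so the sketch uses the Monte Carlo option of `fisher.test()` and, as an alternative, `chisq.test()`.

```r
# The six rows shown in the question
data_total <- data.frame(
  Ch         = c("A01", "A02", "A03", "A04", "A05", "A06"),
  Non_Syn    = c(4658, 3347, 7292, 2608, 1883, 4141),
  Total_Snps = c(23657, 16685, 36963, 13161, 10665, 22033)
)

# Assumption: the second category is every SNP not counted in Non_Syn
counts <- cbind(Non_Syn = data_total$Non_Syn,
                Other   = data_total$Total_Snps - data_total$Non_Syn)
rownames(counts) <- data_total$Ch

# Monte Carlo approximation of the Fisher p-value (B simulated tables)
set.seed(1)
mc <- fisher.test(counts, simulate.p.value = TRUE, B = 10000)
mc$p.value

# Chi-squared test of independence, a common choice for large counts
chi <- chisq.test(counts)
chi$p.value
```

Either p-value addresses the same question: whether the proportion of non-synonymous SNPs is the same on every chromosome.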