Question: Statistics Of Enrichment Of Indels
PoGibas4.8k wrote:

Usually I use Fisher's exact test or the Wilcoxon rank-sum test, but this time the data is different and I don't know what I should use.

Data: four DNA sequences, each with a different number of mutations.

I want to show that sequenceA is enriched for mutations compared to sequenceB (their lengths are equal).

I also have sequenceC (an extended sequenceA) and sequenceD (an extended sequenceB).

How should one do it?

My data looks roughly like this (positions of indels):

254, 1000, 1036, 5448, 7315 -> sum = 6 mut;

63, 75, 967, 3691 -> sum = 4 mut;

It's for a quick glance at the p-value, so I don't need anything fancy. I hope someone can help.

statistics • 1.4k views
written 7.6 years ago by PoGibas4.8k

Why doesn't Fisher's exact test work for you? I think it is a reasonable approach.

Michael Dondrup46k wrote:

I think that if the number of indels is small compared to the gene length, Fisher's exact test should be an acceptable approximation. Count the number of positions where an indel occurred versus the number of positions without one, yielding a contingency table like this example (assuming a gene length of 10000 for both sequences):

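``````
# Rows: positions with an indel / positions without one
# Columns: sequenceA, sequenceB (both assumed to be 10000 positions long)
cont.table = matrix(c(6, 4, 10000-6, 10000-4), ncol=2, byrow=TRUE)
cont.table
     [,1] [,2]
[1,]    6    4
[2,] 9994 9996
``````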

Then apply fisher.test to test the alternative hypothesis that column 1 is enriched relative to column 2:

``````
fisher.test(cont.table, alternative="greater")

        Fisher's Exact Test for Count Data

data:  cont.table
p-value = 0.3769
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 0.4357716       Inf
sample estimates:
odds ratio
  1.500262
``````
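With p = 0.3769, this example gives no evidence that sequenceA is enriched for indels. The same recipe can be wrapped in a small function so it can be reused for the extended sequences as well. The following is only a sketch: `indel_enrichment_test` and its argument names are hypothetical, and the lengths used for sequenceC and sequenceD are made-up placeholders, since the question does not state them.

```r
# Hypothetical helper, not part of the original answer: one-sided Fisher's
# exact test for indel enrichment in sequence A relative to sequence B.
# n_a, n_b     : number of positions with an indel in each sequence
# len_a, len_b : sequence lengths (the two may differ)
indel_enrichment_test <- function(n_a, n_b, len_a, len_b) {
  stopifnot(n_a <= len_a, n_b <= len_b)
  tab <- matrix(c(n_a,         n_b,
                  len_a - n_a, len_b - n_b),
                ncol = 2, byrow = TRUE,
                dimnames = list(c("indel", "no_indel"), c("A", "B")))
  # alternative = "greater" tests whether the odds ratio exceeds 1,
  # i.e. whether column A is enriched relative to column B
  fisher.test(tab, alternative = "greater")
}

# Same counts and lengths as the example above
res <- indel_enrichment_test(6, 4, 10000, 10000)
res$p.value

# Extended sequences; these lengths are placeholders, not real data
res_cd <- indel_enrichment_test(6, 4, 12000, 15000)
res_cd$p.value
```

Because the table uses each sequence's own length, the function does not require the two sequences to be equally long.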