Statistics Of Enrichment Of Indels
12.0 years ago
PoGibas 5.1k

I usually use Fisher's exact test or the Wilcoxon rank-sum test, but this time the data are different and I don't know what I should use.

Data: four DNA sequences, each with a different number of mutations.

I want to show that sequenceA is enriched for mutations compared to sequenceB (the two are of equal length).

I also have sequenceC (an extended sequenceA) and sequenceD (an extended sequenceB).

How should one do this?

My data look roughly like this (positions of indels):

254, 1000, 1036, 5448, 7315 -> sum = 6 mut;

63, 75, 967, 3691 -> sum = 4 mut;

Really looking forward to your answers. Thanks in advance.

It's only for a quick glance at a p-value, so I don't need anything fancy. I hope someone can help.


Why doesn't Fisher's exact test work for you? I think it is a reasonable approach.

12.0 years ago
Michael 54k

If the number of indels is small compared to the gene length, Fisher's exact test should be an acceptable approximation. Count the number of positions where an indel occurred versus the number of positions without one, which gives a contingency table like this example (assuming a gene length of 10000 for both sequences):

# rows: positions with / without an indel; columns: sequenceA, sequenceB
cont.table <- matrix(c(6, 4, 10000 - 6, 10000 - 4), ncol = 2, byrow = TRUE)
cont.table
     [,1] [,2]
[1,]    6    4
[2,] 9994 9996

Then apply fisher.test to test the alternative hypothesis that column 1 is enriched relative to column 2:

fisher.test(cont.table, alternative="greater")

    Fisher's Exact Test for Count Data

data:  cont.table 
p-value = 0.3769
alternative hypothesis: true odds ratio is greater than 1 
95 percent confidence interval:
 0.4357716       Inf 
sample estimates:
odds ratio 
  1.500262
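
For comparing the longer sequences (sequenceC vs. sequenceD), the same table can be built with each sequence's own length. The following is only a sketch: `indel_enrichment` is a hypothetical helper name, not an existing R function, and the counts and the length of 10000 are the example values assumed above. It also cross-checks the one-sided p-value against the hypergeometric tail that Fisher's test is based on.

```r
# Hypothetical helper (illustrative name): one-sided Fisher test asking
# whether sequence 1 carries more indels per position than sequence 2.
# n1, n2     = indel counts
# len1, len2 = sequence lengths (they may differ)
indel_enrichment <- function(n1, len1, n2, len2) {
  tab <- matrix(c(n1, n2, len1 - n1, len2 - n2), ncol = 2, byrow = TRUE,
                dimnames = list(c("indel", "no_indel"), c("seq1", "seq2")))
  fisher.test(tab, alternative = "greater")
}

# Example values assumed above: 6 vs. 4 indels, both sequences 10000 long.
res <- indel_enrichment(6, 10000, 4, 10000)

# Cross-check: P(X >= 6) when the 10 indel positions are drawn from
# 20000 positions, 10000 of which belong to sequence 1.
p_tail <- phyper(6 - 1, m = 10000, n = 10000, k = 10, lower.tail = FALSE)

print(res$p.value)
print(p_tail)
```

Both printed values should agree with the p-value reported above. For sequenceC and sequenceD, pass their own indel counts and lengths in place of the example numbers.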