A friend asked me what the probability is of observing a particular 7-mer in a 10 kb genomic stretch. She assumes the 7-mer is biologically functional, but she wants to know the probability that it occurred by chance.

I have two answers, but I don't know which is right. They might be the same, just stated differently. I was hoping that someone with a strong probability background could help.

- 4 bases {A, T, G, C}, assumed equally probable for now.
- 4^7 = 16,384 possible 7-mers.
- Probability of randomly observing any specific 7-mer at a given position: 1 / 4^7 = 6.103516e-05
- So the sequence would be seen about once every 4^7 = 16,384 bases on average?
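The arithmetic in the list above can be checked with a short Python sketch. It assumes equal base frequencies and treats each start position as an independent trial, which ignores the overlap between neighbouring windows; the variable names are only illustrative.

```python
# Per-position probability and mean spacing for a specific k-mer.
# Assumptions: uniform base frequencies, positions treated as independent.
k = 7
alphabet_size = 4

n_kmers = alphabet_size ** k      # number of distinct 7-mers
p_match = 1 / n_kmers             # chance a given position starts the 7-mer
mean_spacing = 1 / p_match        # average number of positions per match

print(n_kmers)        # 16384
print(p_match)        # 6.103515625e-05
print(mean_spacing)   # 16384.0
```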

Alternatively, I found this:

http://gasstationwithoutpumps.wordpress.com/2012/11/10/a-probability-question/

It says the expected number of occurrences of a given k-mer in a sequence of length N is N/4^k, which gives:

- 10,000 / 4^7 = 0.6103516
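A minimal sketch of this second calculation, for anyone who wants to reproduce it. Two things here are not in the linked post and are my own assumptions: using N - k + 1 start positions instead of N (a 7-mer cannot start in the last 6 bases), and a Poisson approximation to turn the expected count into a rough probability of seeing the 7-mer at least once. Only one strand is considered.

```python
import math

k = 7
N = 10_000                      # 10 kb stretch

n_positions = N - k + 1         # valid start positions for a 7-mer
expected_simple = N / 4**k      # the N / 4^k figure quoted above
expected_exact = n_positions / 4**k

# Poisson approximation (assumption): P(at least one) = 1 - exp(-lambda).
p_at_least_one = 1 - math.exp(-expected_exact)

print(f"expected (N / 4^k):         {expected_simple:.7f}")
print(f"expected ((N-k+1) / 4^k):   {expected_exact:.7f}")
print(f"P(at least one), approx.:   {p_at_least_one:.3f}")
```

Note that the expected count is just the number of positions multiplied by the per-position probability 1 / 4^7, so 0.61 expected occurrences in 10 kb and "about once every 16,384 bases" are the same rate expressed in two ways.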

These are very different answers. Can anyone provide some clarity?

Thanks

I have the same problem, with a slight variation. Suppose I have the motif sequence GATAAAG, which is a 7-mer, and the sequence length is 1000 bp. As you describe, I can get the expected number of occurrences of such a 7-mer within 1000 bp. But that seems to include all possible combinations. If I need the expected count for the exact sequence in the order given above, how do I modify this calculation?
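As a rough sketch for this case: under the same equal-frequency assumption, 1 / 4^k is already the probability of one specific ordered sequence such as GATAAAG, so the formula should not need modifying beyond plugging in the new length. The helper names `expected_count` and `count_occurrences` below are hypothetical, written only for illustration; the second one counts exact (possibly overlapping) matches in an observed sequence so the observed number can be compared with the expectation.

```python
def expected_count(seq_len: int, k: int, alphabet_size: int = 4) -> float:
    """Expected occurrences of one specific k-mer, assuming uniform bases."""
    n_positions = max(seq_len - k + 1, 0)
    return n_positions / alphabet_size**k


def count_occurrences(seq: str, motif: str) -> int:
    """Count exact matches of motif in seq, including overlapping ones."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)  # step by 1 to allow overlaps
    return count


motif = "GATAAAG"
print(expected_count(1000, len(motif)))                # 994 / 16384
print(count_occurrences("TTGATAAAGCCGATAAAGA", motif))  # toy sequence, not real data
```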

