Hi all,
I have a question about probability.
Let us say we have a DNA fragment of length 20 bp, and I have 2 genomes of approximately the same size (150 Mb). What is the probability of finding the same fragment in both genomes? If I decide to allow mismatches, how do I account for them in the probability (is it as simple as using 20 bp minus the number of mismatches, or do I have to account for all the '20 choose 5' combinations)?
My calculations so far: since there are 4^20 possible 20 bp fragments, the probability that a random 20 bp window equals a particular fragment is 1/(4^20). But I don't understand why the probability of finding the fragment in a genome of 15E7 bp would be (1/(4^20))^(15E7) (from this post: Probability Of Finding A Dna Sequence In A Window)?
Any help will be greatly appreciated!
Thanks,
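For reference, here is a minimal sketch of the exact-match case. It assumes bases are uniform and independent (each with probability 1/4) and treats the genome's start positions as independent trials, which is only an approximation because neighbouring windows overlap. Under those assumptions the chance of at least one hit is 1 - (1 - 1/4^20)^n, with n = genome length - 20 + 1 windows, rather than a power of 1/4^20. The function name and the choice to ignore the reverse strand are illustrative assumptions, not something taken from the linked post.

```python
import math

def prob_at_least_one_hit(k: int, genome_len: int) -> float:
    """P(a fixed k-mer occurs at least once in a random genome).

    Assumes uniform i.i.d. bases and independent start positions
    (an approximation); the reverse strand is ignored.
    """
    p_window = 0.25 ** k                 # chance one window matches exactly
    n_windows = genome_len - k + 1       # number of possible start positions
    # 1 - (1 - p)^n, computed with log1p/expm1 so tiny values are not lost
    return -math.expm1(n_windows * math.log1p(-p_window))

GENOME_LEN = 150_000_000  # 150 Mb, as in the question

p_one = prob_at_least_one_hit(20, GENOME_LEN)
# If the two genomes are independent, a hit in both is the product.
p_both = p_one ** 2

print(f"P(hit in one genome)   = {p_one:.3e}")
print(f"P(hit in both genomes) = {p_both:.3e}")
```

Because 1/4^20 is so small, the result is very close to the expected number of hits, n / 4^20. Real genomes have skewed base composition and repeats, so treat this as a null-model estimate only.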
Thanks for the explanation. Do you have any insights on finding the odds when mismatches are allowed?
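Look up the binomial theorem and the binomial distribution. Row 20 of Pascal's Triangle would be helpful too: its entries are the '20 choose k' counts, i.e. the number of ways to place k mismatches among 20 positions.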
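To make the binomial hint concrete, here is a rough sketch under the same null model (uniform, independent bases). At each position a random base mismatches the fragment with probability 3/4, so the chance that a 20 bp window has at most m mismatches is the sum over i = 0..m of C(20, i) * (3/4)^i * (1/4)^(20 - i). So yes, the '20 choose i' terms are needed, along with the three possible wrong bases per mismatch. The function names and the "at most m" reading of the question are assumptions for illustration.

```python
import math

def p_window_within_mismatches(k: int, max_mismatches: int) -> float:
    """P(a random k-bp window differs from a fixed k-mer at <= max_mismatches
    positions), assuming uniform i.i.d. bases (binomial with p = 3/4)."""
    return sum(
        math.comb(k, i) * 0.75 ** i * 0.25 ** (k - i)
        for i in range(max_mismatches + 1)
    )

def prob_hit_with_mismatches(k: int, max_mismatches: int, genome_len: int) -> float:
    """P(at least one window in the genome is within max_mismatches),
    treating start positions as independent (an approximation)."""
    p_window = p_window_within_mismatches(k, max_mismatches)
    n_windows = genome_len - k + 1
    if p_window >= 1.0:
        return 1.0
    return -math.expm1(n_windows * math.log1p(-p_window))

GENOME_LEN = 150_000_000  # 150 Mb, as in the question

for m in range(6):
    p_win = p_window_within_mismatches(20, m)
    p_hit = prob_hit_with_mismatches(20, m, GENOME_LEN)
    print(f"<= {m} mismatches: per-window {p_win:.3e}, per-genome {p_hit:.3e}")
```

With several mismatches allowed, the per-window probability grows quickly, so in a 150 Mb genome a chance hit stops being rare. For exactly m mismatches rather than at most m, keep only the i = m term of the sum.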