Calculating frequencies for first-order Markov model given G+C content and CpG fraction
2.7 years ago
valba • 0

Hi all,

I’m trying to generate DNA sequences using a first-order Markov model in GenRGenS. For this, I have to input the frequencies of all possible dinucleotides. I have a list of specific values of GC content and CpG fraction (number of CpG dinucleotides divided by the length of the DNA fragment) that I need to use when generating the DNA sequences. I’m having trouble figuring out how to calculate the frequencies of the dinucleotides from the given GC content and CpG fraction, especially for the dinucleotides CC, CA and CT.

This is what I have tried so far:

k = G+C content
l = CpG fraction

P(XY|X) = probability that the second base is Y given that the first base is X = ?

P({C, G}) = k
P(CG) = l

P(AA|A) = (1-k)/2
P(AC|A) = k/2
P(AG|A) = k/2
P(AT|A) = (1-k)/2

P(GA|G) = (1-k)/2
P(GC|G) = k/2
P(GG|G) = k/2
P(GT|G) = (1-k)/2

P(TA|T) = (1-k)/2
P(TC|T) = k/2
P(TG|T) = k/2
P(TT|T) = (1-k)/2

Consider:
P(CG) = P(CG|C)P(C) = l
P({CG, CC}|C) = k

Therefore, assuming P(C) = P(G) = k/2:
P(CG|C) = l / P(C) = l / (k/2) = 2l/k
P(CC|C) = k - 2l/k

P(CA|C) = (1-k)/2
P(CT|C) = (1-k)/2

However, this does not work: P(CC|C) = k - 2l/k is negative for the values I’m working with, which happens whenever l > k^2/2.
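
To make the problem easy to reproduce, here is a minimal Python sketch of the rows exactly as I defined them above. The function names `transition_rows` and `is_valid` are made up for illustration, the example values of k and l are hypothetical rather than taken from my list, and the code assumes P(C) = P(G) = k/2. It only checks my attempt; it is not a proposed solution.

```python
def transition_rows(k, l):
    """Rows P(next | current) as defined in the post.

    k: G+C content, l: CpG fraction. Assumes P(C) = P(G) = k/2.
    """
    at, gc = (1 - k) / 2, k / 2
    # Rows for A, G and T ignore the current base entirely.
    rows = {b: {"A": at, "C": gc, "G": gc, "T": at} for b in "AGT"}
    p_cg = l / gc  # P(CG|C) = 2l/k
    rows["C"] = {"A": at, "C": k - p_cg, "G": p_cg, "T": at}
    return rows


def is_valid(rows, tol=1e-9):
    """True if every entry is >= 0 and every row sums to 1."""
    return all(
        min(r.values()) >= -tol and abs(sum(r.values()) - 1) <= tol
        for r in rows.values()
    )


# Hypothetical example values, chosen on either side of l = k^2/2 = 0.125.
print(is_valid(transition_rows(0.5, 0.05)))
print(is_valid(transition_rows(0.5, 0.20)))
```

The C row always sums to 1, so the only way it fails is through a negative P(CC|C), which is what I see for my values.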

Thank you,
Veronica

Markov model • 418 views