Hi guys.
I am trying to learn and understand the ideas of markov models. I tried to develop a code to accomplish my understanding of this.
The idea at the moment is to build or create a random sequence with the model using the code i tried to develop with python. The final idea is to use it to compare the k-mer from a genome against the random sequence.
If you have time, patience to check if the code is correct in it ideas I really appreciate or if you have any other site or I could get help I would appreciate.
My code is this:
import random
random.seed(31)
def random_weighted_base(dist_dict):
'''This function use a specified frequency distribution and choose and returns
a random symbol according to the given distribution '''
# generate a random number in [0,1)
bases = list(dist_dict.keys())
freqs = get_count_frequencies(dist_dict)
p = list(freqs.values())
return ''.join(random.choices(bases, weights=p))
def transition_matrix(base_freq, mer_freq):
"""Returns a transition matrix from a base_count dictionary and a
k-mer frequncy dictionary as inputs."""
tm = defaultdict(dict)
for char1, cnt1 in mer_freq.items():
tm[char1] = tm.get(char1, {})
for char2, cnt2 in base_freq.items():
tm[char1][char2] = tm[char1].get(char2, cnt1 * cnt2)
return tm
def kmer_count(sequence, alphabet=set(['A', 'C', 'G', 'T']), k=1):
"""Return a dictionary with the k-mers as keys and it counts
as values."""
seq = sequence.upper()
seq_len = len(seq)
kmers = [seq[i:i + k] for i in range(0, seq_len - k + 1)]
filterd_kmers = [kmer for kmer in kmers if
all(base in set(alphabet) for base in kmer)]
return Counter(filterd_kmers)
def get_count_frequencies(counts):
"""Receive a dictionary as key:counts."""
total = sum(counts.values())
return {key: (cnt/total) for key, cnt in counts.items()}
def markov_higher_order(symbol_weights, mer_count, transition_matrix, length):
"""Receive a symbol_weights(dictionary wiht the count of the symbols),
a function, that converts counts in frequencies, a transition matrix and
the length of the random sequence that is the final output.
"""
seq = []
mer = get_random_weighted_mer(mer_count)
for _ in range(length - 1):
base, _, _ = get_random_weighted_base(symbol_weights)
nuc = transition_matrix[mer]
for char, cnt in nuc.items():
if char == base:
seq.append(char)
return ''.join(seq)
I make the modifications you suggest and I make something different making the sequence adding the mer+base, once the markov order is dependent in this case of base[i-2] and base[i-1] to chose the base[i] right?
def markov_second_order_(symbol_weights, mer_count, transition_matrix, length):
"""Receive a symbol_weights, that is a dictionary with symbol counts, a function
that converts counts in frequencies, a transition matrix and the length of the random
sequence that is the output final of the markov_first_order.
"""
seq = []
for _ in range(length - 1):
mer = get_random_weighted_mer(mer_count)
base, _, _ = get_random_weighted_base(symbol_weights)
nuc = transition_matrix[mer]
if base in nuc:
seq.append(mer+base)
return ''.join(seq)
If you have any nice reference about the subject, I would really appreciate.
Thank you for your time.
Paulo