Closed:Would you guys help to check if my idea if Markov Model is in the right direction
0
0
Entering edit mode
3.6 years ago
schlogl ▴ 160

Hi guys.

I am trying to learn and understand the ideas of markov models. I tried to develop a code to accomplish my understanding of this.

The idea at the moment is to build or create a random sequence with the model using the code i tried to develop with python. The final idea is to use it to compare the k-mer from a genome against the random sequence.

If you have time, patience to check if the code is correct in it ideas I really appreciate or if you have any other site or I could get help I would appreciate.

My code is this:

import random


random.seed(31)


def random_weighted_base(dist_dict):
    '''This function use a specified frequency distribution and choose and returns 
    a random symbol according to the given distribution '''
    # generate a random number in [0,1)
    bases = list(dist_dict.keys())
    freqs = get_count_frequencies(dist_dict)
    p = list(freqs.values())
    return ''.join(random.choices(bases, weights=p))


def transition_matrix(base_freq, mer_freq):
    """Returns a transition matrix from a base_count dictionary and a
    k-mer frequncy dictionary as inputs."""
    tm = defaultdict(dict)
    for char1, cnt1 in mer_freq.items():
        tm[char1] = tm.get(char1, {})
        for char2, cnt2 in base_freq.items():
            tm[char1][char2] = tm[char1].get(char2, cnt1 * cnt2)
    return tm


def kmer_count(sequence, alphabet=set(['A', 'C', 'G', 'T']), k=1):
    """Return a dictionary with the k-mers as keys and it counts
    as values."""                          
    seq = sequence.upper()
    seq_len = len(seq)                                            
    kmers = [seq[i:i + k] for i in range(0, seq_len - k + 1)]     
    filterd_kmers = [kmer for kmer in kmers if                    
                     all(base in set(alphabet) for base in kmer)] 
    return Counter(filterd_kmers)


def get_count_frequencies(counts):
    """Receive a dictionary as key:counts."""
    total = sum(counts.values())
    return {key: (cnt/total) for key, cnt in counts.items()}


def markov_higher_order(symbol_weights, mer_count, transition_matrix, length):
    """Receive a symbol_weights(dictionary wiht the count of the symbols), 
    a function, that converts counts in frequencies, a transition matrix and 
    the length of the random sequence that is the final output.
    """
    seq = []

    mer = get_random_weighted_mer(mer_count)
    for _ in range(length - 1):
        base, _, _ = get_random_weighted_base(symbol_weights)
        nuc = transition_matrix[mer]
        for char, cnt in nuc.items():
            if char == base:
                seq.append(char)
    return ''.join(seq)

I make the modifications you suggest and I make something different making the sequence adding the mer+base, once the markov order is dependent in this case of base[i-2] and base[i-1] to chose the base[i] right?

def markov_second_order_(symbol_weights, mer_count, transition_matrix, length):
    """Receive a symbol_weights, that is a dictionary with symbol counts, a function
    that converts counts in frequencies, a transition matrix and the length of the random
    sequence that is the output final of the markov_first_order.
    """
    seq = []
    for _ in range(length - 1):
        mer = get_random_weighted_mer(mer_count)
        base, _, _ = get_random_weighted_base(symbol_weights)
        nuc = transition_matrix[mer]
        if base in nuc:
            seq.append(mer+base)
    return ''.join(seq)

If you have any nice reference about the subject, I would really appreciate.

Thank you for your time.

Paulo

sequence genome • 124 views
ADD COMMENT
This thread is not open. No new answers may be added
Traffic: 2039 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6