Question: DNA statistical studies - Local Base Composition Code
andreammo97 wrote, 15 months ago:

Hi there! :)

I have a question for you all. I am trying to study the local base composition of a DNA sequence using Python. In case the term is unfamiliar, here is what I mean:

Imagine that you have this DNA sequence:

g a g t t t t a t c g c g c t t c c a t g

You want to know how many a, c, g and t there are in one part of it (a window), and you want to repeat this count across the whole sequence, moving the window by a fixed offset each time.

At the end, you have the base composition of each window taken from the original sequence.
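
As a small illustration of the windowing described above, here is a sketch in Python. The helper name `iter_windows`, the window length of 7 and the offset of 7 are choices made only for this example; they do not come from the original post.

```python
def iter_windows(seq, window_len, offset):
    """Yield (start, window) pairs for every full window of the sequence."""
    for start in range(0, len(seq) - window_len + 1, offset):
        yield start, seq[start:start + window_len]

seq = "gagttttatcgcgcttccatg"  # the example sequence, spaces removed

for start, window in iter_windows(seq, window_len=7, offset=7):
    # Count each base inside the current window.
    counts = {base: window.count(base) for base in "acgt"}
    print(start, window, counts)
```

With these settings the 21-base sequence splits into three non-overlapping windows (gagtttt, atcgcgc, ttccatg). An offset smaller than the window length would make consecutive windows overlap.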

This is what I am trying to do in Python. Here is my code:

def composicionBasesLocal(seq, window_len = 200, offset = 100, circular = False):
    lowest = 0
    highest = window_len
    res = []

    while highest<=len(seq)-1:
        window = seq[lowest:highest+1]

        if lowest<= len(seq):
            mm = ModeloMultinomial(window)
            res.append(mm)

        else:
            break

    lowest = lowest + offset
    highest = highest + offset

    return(res)

ModeloMultinomial(seq) code:

def ModeloMultinomial(seq):
    ModMul = []
    pa = seq.count('A')/len(seq)
    pc = seq.count('C')/len(seq)
    pg = seq.count('G')/len(seq)
    pt = seq.count('T')/len(seq)

    ModMul.append([pa,pc,pg,pt])

    return(['pa','pc','pg', 'pt'], ModMul)

This code (composicionBasesLocal) doesn't give me any error message, but when I run it, it loops forever and I have to stop it. I also wrote it with a for loop, and that version works without any problem.

What have I done wrong? Thank you!! :D


Check the indentation of these two lines:

    lowest = lowest + offset
    highest = highest + offset

They sit outside the while loop, so highest never changes and you are in an infinite loop.

Reply written 5 months ago by schlogl
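def composicionBasesLocal(seq, window_len = 200, offset = 100, circular = False):
    lowest = 0
    highest = window_len
    res = []

    while highest <= len(seq)-1:
        # Note: this slice holds window_len + 1 characters.
        window = seq[lowest:highest+1]
        print(window)  # debug output

        if lowest<= len(seq):
            mm = ModeloMultinomial(window)
            res.append(mm)

        else:
            break

        # These two updates must be indented inside the while loop,
        # otherwise highest never grows and the loop never ends.
        lowest = lowest + offset
        highest = highest + offset
        print(lowest)   # debug output
        print(highest)  # debug output

    return(res)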
Reply written 5 months ago by schlogl
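
Since the question mentions that a for-loop version works, here is one hedged sketch of what such a version could look like. It is not the asker's code: the name `local_base_composition` is made up for this example, the unused `circular` parameter is left out, each window is exactly `window_len` characters long, and the sequence is upper-cased on the assumption that 'a' and 'A' should count as the same base.

```python
def local_base_composition(seq, window_len=200, offset=100):
    """Return one {base: frequency} dict per full window of the sequence."""
    seq = seq.upper()  # assumption: treat 'a' and 'A' as the same base
    result = []
    # range() advances the window start, so there is no counter to update by hand.
    for start in range(0, len(seq) - window_len + 1, offset):
        window = seq[start:start + window_len]  # exactly window_len characters
        result.append({base: window.count(base) / window_len for base in "ACGT"})
    return result

# Small demo on the example sequence from the question.
for composition in local_base_composition("gagttttatcgcgcttccatg", window_len=10, offset=5):
    print(composition)
```

Because range() produces the start positions, the loop cannot get stuck, and a sequence shorter than the window simply gives an empty list.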
schlogl (Brazil-Florianopolis) wrote, 7 months ago:
from collections import Counter, defaultdict

def get_kmers_counts(sequence, k=1):
    """Returns the count of all the contiguous and overlapping
    substrings of length K from a genome."""
    return Counter(sequence[i:i+k] for i in range(len(sequence) - k + 1))

def get_kmers_frequencies(sequence, k=1):
    """Returns the frequencies of all the contiguous and overlapping
    substrings of length K from a genome."""
    kmers = get_kmers_counts(sequence, k)
    total = sum(kmers.values())  # computed once instead of once per k-mer
    freq = defaultdict(float)
    for mer, count in kmers.items():
        freq[mer] = round(count / total, 4)
    return freq

You can use it as two separate functions or use it to make your own function! Paulo
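
One possible way to "make your own function" from this idea is to apply the k-mer frequencies to each window in turn. The sketch below is only an illustration: `kmer_frequencies` is a compact, unrounded variant of the function above, and `windowed_kmer_frequencies` with its `window_len` and `offset` parameters is a hypothetical helper, not something from the answer.

```python
from collections import Counter

def kmer_frequencies(sequence, k=1):
    """Return {k-mer: relative frequency} for overlapping k-mers."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {kmer: count / total for kmer, count in counts.items()}

def windowed_kmer_frequencies(sequence, k=1, window_len=200, offset=100):
    """Apply kmer_frequencies to each full window of the sequence."""
    return [
        kmer_frequencies(sequence[start:start + window_len], k)
        for start in range(0, len(sequence) - window_len + 1, offset)
    ]

# Demo: single-base composition (k=1) of three 7-base windows.
for freqs in windowed_kmer_frequencies("GAGTTTTATCGCGCTTCCATG", k=1, window_len=7, offset=7):
    print(freqs)
```

With k=1 this gives the local base composition; a larger k (for example k=2) gives local dinucleotide frequencies with the same code.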
