DNA statistical studies - Local Base Composition Code
2.5 years ago

Hi there! :)

I have a question for you all. I am trying to study the local base composition of a DNA sequence using Python. In case you don't know what I mean, here is an explanation:

Imagine that you have this DNA sequence:

g a g t t t t a t c g c g c t t c c a t g

You want to know how many a, c, g and t there are in one part of it (a window), and then repeat the count along the whole sequence, moving the window by a fixed offset each time.

At the end, you have the base composition of every window taken from the original sequence.
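To make the window/offset idea concrete, here is a minimal sketch applied to the example sequence above. The helper name `window_counts` and the choice of a window of 6 bases with an offset of 3 are only illustrative assumptions, not part of the original code.

```python
from collections import Counter

def window_counts(seq, window_len, offset):
    """Count a, c, g and t in each full window of seq (hypothetical helper)."""
    counts = []
    # Start positions advance by `offset`; only full-length windows are kept.
    for start in range(0, len(seq) - window_len + 1, offset):
        window = seq[start:start + window_len]
        c = Counter(window)
        counts.append((window, {base: c[base] for base in "acgt"}))
    return counts

seq = "gagttttatcgcgcttccatg"  # the example sequence, spaces removed
for window, comp in window_counts(seq, window_len=6, offset=3):
    print(window, comp)
```

Each printed line shows one window and its base counts. Consecutive windows overlap whenever the offset is smaller than the window length.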

This is what I am trying to do in python. Here is my code:

def composicionBasesLocal(seq, window_len = 200, offset = 100, circular = False):
    lowest = 0
    highest = window_len
    res = []

    while highest<=len(seq)-1:
        window = seq[lowest:highest+1]

        if lowest<= len(seq):
            mm = ModeloMultinomial(window)
            res.append(mm)

        else:
            break

    lowest = lowest + offset
    highest = highest + offset

    return(res)

ModeloMultinomial(seq) code:

def ModeloMultinomial(seq):
    ModMul = []
    pa = seq.count('A')/len(seq)
    pc = seq.count('C')/len(seq)
    pg = seq.count('G')/len(seq)
    pt = seq.count('T')/len(seq)

    ModMul.append([pa,pc,pg,pt])

    return(['pa','pc','pg', 'pt'], ModMul)

This code (composicionBasesLocal) doesn't give me any error message, but when I run it, it loops forever and I have to stop it. I did it with a for loop and it works without any problem.

What have I done wrong? Thank you!! :D

dna local bases composition stadistics python • 612 views

Check the indentation of these two lines:

    lowest = lowest + offset
    highest = highest + offset

They sit outside the while loop, so highest never changes, the loop condition stays true, and you are in an infinite loop.

def composicionBasesLocal(seq, window_len = 200, offset = 100, circular = False):
    lowest = 0
    highest = window_len
    res = []

    while highest <= len(seq)-1:
        window = seq[lowest:highest+1]
        print(window)

        if lowest<= len(seq):
            mm = ModeloMultinomial(window)
            res.append(mm)

        else:
            break

        lowest = lowest + offset
        highest = highest + offset
        print(lowest)
        print(highest)

    return(res)
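For comparison, the same windowing can be written with a for loop over `range`, which removes the manual bookkeeping that caused the infinite loop. This is only a sketch under a few assumptions: the names `base_fractions` and `local_base_composition` are made up for illustration, the sequence is upper-cased before counting (the example sequence in the question is lower case while the counts look for 'A', 'C', 'G', 'T'), and the slice is `seq[lowest:lowest + window_len]`, so each window holds exactly `window_len` bases rather than the `window_len + 1` that `seq[lowest:highest+1]` returns.

```python
def base_fractions(window):
    """Fraction of A, C, G and T in a window (case-insensitive)."""
    window = window.upper()
    n = len(window)
    return {base: window.count(base) / n for base in "ACGT"}

def local_base_composition(seq, window_len=200, offset=100):
    """Base fractions for each full window; a partial window at the end is skipped."""
    if window_len < 1 or offset < 1:
        raise ValueError("window_len and offset must be positive")
    res = []
    # range() yields every start position, so the loop always terminates.
    for lowest in range(0, len(seq) - window_len + 1, offset):
        res.append(base_fractions(seq[lowest:lowest + window_len]))
    return res

print(local_base_composition("AACCGGTT", window_len=4, offset=2))
```

If the sequence is shorter than `window_len`, the range is empty and the function returns an empty list.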
22 months ago
schlogl ▴ 110
from collections import Counter, defaultdict

def get_kmers_counts(sequence, k=1):
    """Returns the count of all the contiguous and overlapping
    substrings of length k from a genome."""
    return Counter(sequence[i:i+k] for i in range(len(sequence) - k + 1))

def get_kmers_frequencies(sequence, k=1):
    """Returns the frequencies of all the contiguous and overlapping
    substrings of length k from a genome."""
    kmers = get_kmers_counts(sequence, k)
    total = sum(kmers.values())  # total number of k-mers, computed once
    freq = defaultdict(float)
    for mer, count in kmers.items():
        freq[mer] = round(count / total, 4)
    return freq

You can use it as two separate functions or use it to make your own function! Paulo
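As one possible way to "make your own function" for the local composition asked about in the question, the k-mer counting can be applied window by window. This is a rough sketch, not part of the answer above: the name `windowed_kmer_frequencies` and its `window_len` and `offset` parameters are assumptions borrowed from the question's code.

```python
from collections import Counter

def windowed_kmer_frequencies(sequence, k=1, window_len=200, offset=100):
    """Relative k-mer frequencies for each full window of sequence."""
    results = []
    for start in range(0, len(sequence) - window_len + 1, offset):
        window = sequence[start:start + window_len]
        # Overlapping k-mers inside this window only.
        counts = Counter(window[i:i + k] for i in range(len(window) - k + 1))
        total = sum(counts.values())
        results.append({mer: round(n / total, 4) for mer, n in counts.items()})
    return results

print(windowed_kmer_frequencies("AACCGGTT", k=1, window_len=4, offset=4))
```

With `k=1` this gives the per-window base composition. Larger values of `k` give dinucleotide or longer k-mer frequencies per window.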
