Question

Counting Rna Secondary Structures

1

Entering edit mode

11.4 years ago

Daniel J. Barnes ▴ 10

I have been trying to solve the Rosalind.info RNAS problem (http://rosalind.info/problems/rnas/) for a couple of weeks now. The problem asks you to return the total number of valid folds (bonding graphs) given an RNA seq, using standard Watson-Crick pairing (A-U, C-G) + G-U, no loops smaller than 4bp, and the stipulation that there cannot be two dashed edges {sj,sk} and {sj′,sk′} where j<j′<k<k′ to prevent pseudoknots.

At first I thought I had been accurately counting all structures by brute force, which takes quite some time. So, since this is an exponentially increasing problem, I decided to memoize.

I'm very new to dynamic programming so, I can't be certain if I am messing up the memoization or the counting itself.

Given the input sequence AUGCUAGUACGGAGCGAGUCUAGCGAGCGAUGUCGUGAGUACUAUAUAUGCGCAUAAGCCACGU The returned count should be 284850219977421 according to the sample data provided. However the following code returns 1240732147303.

def memoize(count_struc):
   cache = {}

   def memoizedFunction(*args):
      if args not in cache:
         cache[args] = count_struc(*args)
      return cache[args]

   memoizedFunction.cache = cache
   return memoizedFunction

@memoize
def count_struc(seq):
    bonds = {'A':['U'],'C':['G'],'G':['C','U'],'U':['A','G']}
    i=0
    total = 0
    for x in seq:
        j = 0
        while j < len(seq):
            if j > i + 4:
                if seq[j] in bonds[seq[i]]:
                    total += count_struc(seq[i+1:j])
            j += 1
        i += 1
    return total + 1

I have a suspicion that I am counting incorrectly, but I've been working this over and I can't reason out the issue. Can anyone offer a suggestion?

python rna algorithm • 2.3k views

ADD COMMENT • link updated 11.4 years ago by Asaf 10k • written 11.4 years ago by Daniel J. Barnes ▴ 10

score 1 · Answer 1 · 2012-11-21

1

Entering edit mode

11.4 years ago

Asaf 10k

I think that you don't count the situations in which you have two stem-loop structures. For instance, given a fold ((((....))))...((((....)))) you will count the first stem but you'll ignore the different structures of the rest of the sequence, when i and j capture the second stem you ignore the structures of the first stem. You should (I think) add a split option in which you call count_struct with each part of the sequence (split at each nucleotide) and multiply the returned values.

ADD COMMENT • link 11.4 years ago by Asaf 10k

0

Entering edit mode

I see what you mean. Right now it only appears to be counting individual structures and not entire folds. I tried a splitting type technique previously, but it was a horrible mess. I'll spend some more time on it and see if I can get it to work for me.

ADD REPLY • link 11.4 years ago by Daniel J. Barnes ▴ 10