Counting Rna Secondary Structures
1
1
Entering edit mode
11.4 years ago

I have been trying to solve the Rosalind.info RNAS problem (http://rosalind.info/problems/rnas/) for a couple of weeks now. The problem asks you to return the total number of valid folds (bonding graphs) given an RNA seq, using standard Watson-Crick pairing (A-U, C-G) + G-U, no loops smaller than 4bp, and the stipulation that there cannot be two dashed edges {sj,sk} and {sj′,sk′} where j<j′<k<k′ to prevent pseudoknots.

At first I thought I had been accurately counting all structures by brute force, which takes quite some time. So, since this is an exponentially increasing problem, I decided to memoize.

I'm very new to dynamic programming so, I can't be certain if I am messing up the memoization or the counting itself.

Given the input sequence AUGCUAGUACGGAGCGAGUCUAGCGAGCGAUGUCGUGAGUACUAUAUAUGCGCAUAAGCCACGU The returned count should be 284850219977421 according to the sample data provided. However the following code returns 1240732147303.

def memoize(count_struc):
   cache = {}

   def memoizedFunction(*args):
      if args not in cache:
         cache[args] = count_struc(*args)
      return cache[args]

   memoizedFunction.cache = cache
   return memoizedFunction

@memoize
def count_struc(seq):
    bonds = {'A':['U'],'C':['G'],'G':['C','U'],'U':['A','G']}
    i=0
    total = 0
    for x in seq:
        j = 0
        while j < len(seq):
            if j > i + 4:
                if seq[j] in bonds[seq[i]]:
                    total += count_struc(seq[i+1:j])
            j += 1
        i += 1
    return total + 1

I have a suspicion that I am counting incorrectly, but I've been working this over and I can't reason out the issue. Can anyone offer a suggestion?

python rna algorithm • 2.3k views
ADD COMMENT
1
Entering edit mode
11.4 years ago
Asaf 10k

I think that you don't count the situations in which you have two stem-loop structures. For instance, given a fold ((((....))))...((((....)))) you will count the first stem but you'll ignore the different structures of the rest of the sequence, when i and j capture the second stem you ignore the structures of the first stem. You should (I think) add a split option in which you call count_struct with each part of the sequence (split at each nucleotide) and multiply the returned values.

ADD COMMENT
0
Entering edit mode

I see what you mean. Right now it only appears to be counting individual structures and not entire folds. I tried a splitting type technique previously, but it was a horrible mess. I'll spend some more time on it and see if I can get it to work for me.

ADD REPLY

Login before adding your answer.

Traffic: 3222 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6