I have been trying to solve the Rosalind.info RNAS problem (http://rosalind.info/problems/rnas/) for a couple of weeks now. The problem asks you to return the total number of valid folds (bonding graphs) given an RNA seq, using standard Watson-Crick pairing (A-U, C-G) + G-U, no loops smaller than 4bp, and the stipulation that there cannot be two dashed edges {sj,sk}
and {sj′,sk′}
where j<j′<k<k′
to prevent pseudoknots.
At first I thought I had been accurately counting all structures by brute force, which takes quite some time. So, since this is an exponentially increasing problem, I decided to memoize.
I'm very new to dynamic programming so, I can't be certain if I am messing up the memoization or the counting itself.
Given the input sequence AUGCUAGUACGGAGCGAGUCUAGCGAGCGAUGUCGUGAGUACUAUAUAUGCGCAUAAGCCACGU
The returned count should be 284850219977421
according to the sample data provided. However the following code returns 1240732147303
.
def memoize(count_struc):
cache = {}
def memoizedFunction(*args):
if args not in cache:
cache[args] = count_struc(*args)
return cache[args]
memoizedFunction.cache = cache
return memoizedFunction
@memoize
def count_struc(seq):
bonds = {'A':['U'],'C':['G'],'G':['C','U'],'U':['A','G']}
i=0
total = 0
for x in seq:
j = 0
while j < len(seq):
if j > i + 4:
if seq[j] in bonds[seq[i]]:
total += count_struc(seq[i+1:j])
j += 1
i += 1
return total + 1
I have a suspicion that I am counting incorrectly, but I've been working this over and I can't reason out the issue. Can anyone offer a suggestion?
I see what you mean. Right now it only appears to be counting individual structures and not entire folds. I tried a splitting type technique previously, but it was a horrible mess. I'll spend some more time on it and see if I can get it to work for me.