Question: interpretation of RNA secondary structure
2
gravatar for fareehakanwal90
3.6 years ago by
Pakistan
fareehakanwal9020 wrote:

Hello everyone.

I need a code that can give me chunks of secondary structure elements from a given RNA structure and its sequence. e.g. if I have a sequence and its structure like this:

sequence: UGCCCUAGUACGAGAGGACCGGAGUG

Structure: ...(((........))).........

I should be able to get the output like:

loop1: UGC

Loop2: AGUACGAG

loop3:GGAGUG

stem1: CCUAGG

the position of every dot corresponds to the position of it respective nucleotide. I have written this code but it gives me all the nucleotides which form loop as one string and all those forming stems as one string. I want fragments.

 

rna-seq • 1.7k views
ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by fareehakanwal9020

So let me clarify:

UGC    CCU       AGUACGAG       AGG       ACCGGAGUG

...     (((        ........         )))       .........

Considering your example, you are saying that 

each . = nucleotide on loop

( and ) = nucleotide on sterm

Right?

ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by Sam2.2k

its the output of RNAfold.

The sequence is aligned with structure. for example, for the dot at position one you have to check the character at corresponding position in string. I am saving them both as strings and their length is same.

Here is my code:

#str1 contains sequence and str2 contains structure in dot bracket format.

for k in range(len(str1)):
            for character in str2:
                if character=='.':
                    loop+=str1[k]
            elif str2[k]=='(' or str2[k]==')':
             stem+=str1[k]
        print "loop", loop
        print "stem", stem

ADD REPLYlink written 3.6 years ago by fareehakanwal9020

yes this is exactly what i am saying

ADD REPLYlink written 3.6 years ago by fareehakanwal9020
2
gravatar for Sam
3.6 years ago by
Sam2.2k
London
Sam2.2k wrote:

So, just for my own perl coding practice, I wrote the following code that will do what you want (without explicit debugging though...). This programme will take in one file, where the first line is the sequence and the second line is the structure, and generate the required output

 

#!/usr/bin/perl
use warnings;
use strict;
open FILE, "<", $ARGV[0] or die $!;
my $count = 0;
my $seq;
my  $structString;
while (<FILE>) {
    if($count == 0){
        $seq = ($_);
        $count = 1;
    }elsif($count==1){
        $structString = ($_);
        last;
    }
}
close FILE;
my @seqCharacter = split //, $seq;
my @structChar = split //, $structString;
my $loopCount=1;
my $stemCount=1;
my $currentLoop="";
my @stemStore;
my @stemName;
my $prevChar="";

for(my $i = 0; $i < @structChar; $i=$i+1){
    if($prevChar eq ""){ #The first character
        if($structChar[$i] eq "."){
            $currentLoop = $seqCharacter[$i];
            $prevChar = ".";
        }elsif($structChar[$i] eq "("){
            $stemStore[0]=$seqCharacter[$i];
            $stemName[0] = "stem".$stemCount;
            $stemCount=$stemCount+1;
            $prevChar="(";
        }else{
            print "Undefined input, structure must start with either ( or .\n";
            die;
        }
    }elsif($prevChar eq "." && $structChar[$i] eq "."){ #Continuation
        $currentLoop=$currentLoop.$seqCharacter[$i];
    }elsif($prevChar eq "." && $structChar[$i] eq "("){ #Encounter stem
        print "loop".$loopCount.": ".$currentLoop."\n";
        $loopCount = $loopCount+1;
        $currentLoop = "";
        $prevChar = "(";
        my $size = @stemStore;
        $stemStore[$size]=$seqCharacter[$i]; #we have a new stem
        $stemName[$size] = "stem".$stemCount;
        $stemCount = $stemCount+1;
    }elsif($prevChar eq "." && $structChar[$i] eq ")"){
        if(@stemStore == 0){
            print "Unmatched ) please check your input\n";
            die;
        }else{
            print "loop".$loopCount.": ".$currentLoop."\n";
            $loopCount = $loopCount+1;
            $currentLoop = "";

            $stemStore[@stemStore-1]=$stemStore[@stemStore-1].$seqCharacter[$i];    #Add this to the last loop so we can close it
            $prevChar = ")";
        }
    }elsif($prevChar eq "(" && $structChar[$i] eq "("){
        $stemStore[@stemStore-1]=$stemStore[@stemStore-1].$seqCharacter[$i]; #Continue
    }elsif($prevChar eq "(" && $structChar[$i] eq "."){
        $prevChar = ".";
        $currentLoop = $seqCharacter[$i];
        
    }elsif($prevChar eq "(" && $structChar[$i] eq ")"){
        $prevChar = ")";
        $stemStore[@stemStore-1]=$stemStore[@stemStore-1].$seqCharacter[$i];
    }elsif($prevChar eq ")" && $structChar[$i] eq ")"){
        $stemStore[@stemStore-1]=$stemStore[@stemStore-1].$seqCharacter[$i]; #Add to the last loop
    }elsif($prevChar eq ")" && $structChar[$i] eq "."){
        $prevChar = ".";
        print $stemName[@stemName-1].": ".$stemStore[@stemStore-1]."\n";
        pop(@stemName);
        pop(@stemStore);
    }elsif($prevChar eq ")" && $structChar[$i] eq "("){
        $prevChar = ")";
        print $stemName[@stemName-1].": ".$stemStore[@stemStore-1]."\n";
        pop(@stemName);
        pop(@stemStore);
    }
}
#output remaining
if($currentLoop ne ""){
    print "loop".$loopCount.": ".$currentLoop."\n";
}elsif(@stemName != 0 && $prevChar eq ")"){
    print $stemName[@stemName-1].": ".$stemStore[@stemStore-1]."\n";
    pop(@stemName);
    pop(@stemStore);
}else{
    print "There are unmatched stem remaining\n";
}

 

ADD COMMENTlink written 3.6 years ago by Sam2.2k
2
gravatar for Ryan Dale
3.6 years ago by
Ryan Dale4.7k
Bethesda, MD
Ryan Dale4.7k wrote:

forgi (github page) is a Python package for working with RNA secondary structures. I've used it in the past for tasks like this.

Here's an example. By the way, might want to check your expected output against the example sequence you provided.

import forgi.graph.bulge_graph as fgb
bg = fgb.BulgeGraph()
bg.from_fasta(
"""\
>demo
UGCCCUAGUACGAGAGGACCGGAGUG
...(((........))).........
""")
for d in bg.defines:
    print d, bg.get_define_seq_str(d)

Here's the output. Notice it keeps the two sides of the stem separate, and the 5' and 3' "stems" are listed separately. 

f1 ['UGC']
h0 ['AGUACGAG']
s0 ['CCU', 'AGG']
t1 ['ACCGGAGUG']

 

Edit: forna by the same author is useful for visualizing structures and is helpful for debugging and troubleshooting this sort of analysis.

Edit 2: In response to the comment asking about providing a string variable to from_fasta:

It's not documented, but after playing around it looks like this method requires 1) the record must start with ">" -- not even a newline is allowed, and 2) a trailing newline at the end of the structure line. So if you have your strings sequence and structure, you can build a string that from_fasta will accept like this:

text = '\n'.join(['> ', sequence, structure, '\n'])
bg.from_fasta(text)

or if you want to include a sequence ID:

text = '\n'.join(['\n', '> ' + seqid, sequence, structure, '\n'])
bg.from_fasta(text)
ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by Ryan Dale4.7k

Thanx this worked for me :)

 

ADD REPLYlink written 3.6 years ago by fareehakanwal9020

how can I pass string variables to the from_fasta function?

ADD REPLYlink written 3.6 years ago by fareehakanwal9020

I edited the answer to show this.

ADD REPLYlink written 3.6 years ago by Ryan Dale4.7k

thankyou very much

ADD REPLYlink written 3.5 years ago by fareehakanwal9020
0
gravatar for fareehakanwal90
3.6 years ago by
Pakistan
fareehakanwal9020 wrote:

can someone please tell me how can I pass variables to the from_fasta() function?

ADD COMMENTlink written 3.6 years ago by fareehakanwal9020

I never used that programme. But you can take a look here

So assuming you have a file called INPUT with the following format

> id ACCGGGG ((...))

You will do something like from_fasta(INPUT)

With multiple input, I guess the file should look like

> id1 ACCGGGG ((...))

> id2 UGCCCUAGUACGAGAGGACCGGAGUG ...(((........))).........

So according to Ryan's example, maybe you can do

import forgi.graph.bulge_graph as fgb
bg = fgb.BulgeGraph()
bg.from_fasta(INPUT)
for d in bg.defines:
    print d, bg.get_define_seq_str(d)

But I have never used this programme, so I am not sure

ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by Sam2.2k

it doesnt work, gives me the error:

bg.from_fasta(text, dissolve_length_one_stems=False)
  File "/usr/local/lib/python2.7/dist-packages/forgi/graph/bulge_graph.py", line 1272, in from_fasta
    self.from_dotbracket(lines[2].strip(), dissolve_length_one_stems)
IndexError: list index out of range

ADD REPLYlink written 3.5 years ago by fareehakanwal9020
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1547 users visited in the last hour