interpretation of RNA secondary structure
3
2
Entering edit mode
6.7 years ago

Hello everyone.

I need a code that can give me chunks of secondary structure elements from a given RNA structure and its sequence. e.g. if I have a sequence and its structure like this:

sequence: UGCCCUAGUACGAGAGGACCGGAGUG

## Structure: ...(((........))).........

I should be able to get the output like:

loop1: UGC

Loop2: AGUACGAG

loop3:GGAGUG

stem1: CCUAGG

the position of every dot corresponds to the position of it respective nucleotide. I have written this code but it gives me all the nucleotides which form loop as one string and all those forming stems as one string. I want fragments.

RNA-Seq • 3.1k views
0
Entering edit mode

So let me clarify:

UGC    CCU       AGUACGAG       AGG       ACCGGAGUG

## ...     (((        ........         )))       .........

Considering your example, you are saying that

each . = nucleotide on loop

( and ) = nucleotide on sterm

Right?

0
Entering edit mode

its the output of RNAfold.

The sequence is aligned with structure. for example, for the dot at position one you have to check the character at corresponding position in string. I am saving them both as strings and their length is same.

Here is my code:

#str1 contains sequence and str2 contains structure in dot bracket format.

for k in range(len(str1)):
for character in str2:
if character=='.':
loop+=str1[k]
elif str2[k]=='(' or str2[k]==')':
stem+=str1[k]
print "loop", loop
print "stem", stem

0
Entering edit mode

yes this is exactly what i am saying

2
Entering edit mode
6.7 years ago
Sam ★ 4.0k

So, just for my own perl coding practice, I wrote the following code that will do what you want (without explicit debugging though...). This programme will take in one file, where the first line is the sequence and the second line is the structure, and generate the required output

#!/usr/bin/perl
use warnings;
use strict;
open FILE, "<", $ARGV[0] or die$!;
my $count = 0; my$seq;
my  $structString; while (<FILE>) { if($count == 0){
$seq = ($_);
$count = 1; }elsif($count==1){
$structString = ($_);
last;
}
}
close FILE;
my @seqCharacter = split //, $seq; my @structChar = split //,$structString;
my $loopCount=1; my$stemCount=1;
my $currentLoop=""; my @stemStore; my @stemName; my$prevChar="";

for(my $i = 0;$i < @structChar; $i=$i+1){
if($prevChar eq ""){ #The first character if($structChar[$i] eq "."){$currentLoop = $seqCharacter[$i];
$prevChar = "."; }elsif($structChar[$i] eq "("){$stemStore[0]=$seqCharacter[$i];
$stemName[0] = "stem".$stemCount;
$stemCount=$stemCount+1;
$prevChar="("; }else{ print "Undefined input, structure must start with either ( or .\n"; die; } }elsif($prevChar eq "." && $structChar[$i] eq "."){ #Continuation
$currentLoop=$currentLoop.$seqCharacter[$i];
}elsif($prevChar eq "." &&$structChar[$i] eq "("){ #Encounter stem print "loop".$loopCount.": ".$currentLoop."\n";$loopCount = $loopCount+1;$currentLoop = "";
$prevChar = "("; my$size = @stemStore;
$stemStore[$size]=$seqCharacter[$i]; #we have a new stem
$stemName[$size] = "stem".$stemCount;$stemCount = $stemCount+1; }elsif($prevChar eq "." && $structChar[$i] eq ")"){
if(@stemStore == 0){
die;
}else{
print "loop".$loopCount.": ".$currentLoop."\n";
$loopCount =$loopCount+1;
$currentLoop = "";$stemStore[@stemStore-1]=$stemStore[@stemStore-1].$seqCharacter[$i]; #Add this to the last loop so we can close it$prevChar = ")";
}
}elsif($prevChar eq "(" &&$structChar[$i] eq "("){$stemStore[@stemStore-1]=$stemStore[@stemStore-1].$seqCharacter[$i]; #Continue }elsif($prevChar eq "(" && $structChar[$i] eq "."){
$prevChar = ".";$currentLoop = $seqCharacter[$i];

}elsif($prevChar eq "(" &&$structChar[$i] eq ")"){$prevChar = ")";
$stemStore[@stemStore-1]=$stemStore[@stemStore-1].$seqCharacter[$i];
}elsif($prevChar eq ")" &&$structChar[$i] eq ")"){$stemStore[@stemStore-1]=$stemStore[@stemStore-1].$seqCharacter[$i]; #Add to the last loop }elsif($prevChar eq ")" && $structChar[$i] eq "."){
$prevChar = "."; print$stemName[@stemName-1].": ".$stemStore[@stemStore-1]."\n"; pop(@stemName); pop(@stemStore); }elsif($prevChar eq ")" && $structChar[$i] eq "("){
$prevChar = ")"; print$stemName[@stemName-1].": ".$stemStore[@stemStore-1]."\n"; pop(@stemName); pop(@stemStore); } } #output remaining if($currentLoop ne ""){
print "loop".$loopCount.": ".$currentLoop."\n";
}elsif(@stemName != 0 && $prevChar eq ")"){ print$stemName[@stemName-1].": ".\$stemStore[@stemStore-1]."\n";
pop(@stemName);
pop(@stemStore);
}else{
print "There are unmatched stem remaining\n";
}

2
Entering edit mode
6.7 years ago
Ryan Dale 5.0k

forgi (github page) is a Python package for working with RNA secondary structures. I've used it in the past for tasks like this.

Here's an example. By the way, might want to check your expected output against the example sequence you provided.

import forgi.graph.bulge_graph as fgb
bg = fgb.BulgeGraph()
bg.from_fasta(
"""\
>demo
UGCCCUAGUACGAGAGGACCGGAGUG
...(((........))).........
""")
for d in bg.defines:
print d, bg.get_define_seq_str(d)

Here's the output. Notice it keeps the two sides of the stem separate, and the 5' and 3' "stems" are listed separately.

f1 ['UGC']
h0 ['AGUACGAG']
s0 ['CCU', 'AGG']
t1 ['ACCGGAGUG']

Edit: forna by the same author is useful for visualizing structures and is helpful for debugging and troubleshooting this sort of analysis.

Edit 2: In response to the comment asking about providing a string variable to from_fasta:

It's not documented, but after playing around it looks like this method requires 1) the record must start with ">" -- not even a newline is allowed, and 2) a trailing newline at the end of the structure line. So if you have your strings sequence and structure, you can build a string that from_fasta will accept like this:

text = '\n'.join(['> ', sequence, structure, '\n'])
bg.from_fasta(text)

or if you want to include a sequence ID:

text = '\n'.join(['\n', '> ' + seqid, sequence, structure, '\n'])
bg.from_fasta(text)
0
Entering edit mode

Thanx this worked for me :)

0
Entering edit mode

how can I pass string variables to the from_fasta function?

0
Entering edit mode

I edited the answer to show this.

0
Entering edit mode

thankyou very much

0
Entering edit mode
6.7 years ago

can someone please tell me how can I pass variables to the from_fasta() function?

0
Entering edit mode

I never used that programme. But you can take a look here

So assuming you have a file called INPUT with the following format

> id ACCGGGG ((...))

You will do something like from_fasta(INPUT)

With multiple input, I guess the file should look like

> id1 ACCGGGG ((...))

> id2 UGCCCUAGUACGAGAGGACCGGAGUG ...(((........))).........

So according to Ryan's example, maybe you can do

import forgi.graph.bulge_graph as fgb
bg = fgb.BulgeGraph()
bg.from_fasta(INPUT)
for d in bg.defines:
print d, bg.get_define_seq_str(d)

But I have never used this programme, so I am not sure

0
Entering edit mode

it doesnt work, gives me the error:

bg.from_fasta(text, dissolve_length_one_stems=False)
File "/usr/local/lib/python2.7/dist-packages/forgi/graph/bulge_graph.py", line 1272, in from_fasta
self.from_dotbracket(lines[2].strip(), dissolve_length_one_stems)
IndexError: list index out of range