Question: How To Find All RNA Stems In Genomic Data?
3
10.2 years ago by
None30
None30 wrote:

I am searching for RNA stems of approx. 10 to 1000 bases. Is there a fast, BLAST-like tool for scanning DNA sequences of, for instance, 10 kb?

As I want to extend this method to whole-genome searches, exact algorithms are too time-consuming. I am only looking for sequence similarity; energy values such as MFE should not be taken into account. Unfortunately, BLAST cannot detect G-U wobble pairs, so very short stems are undetectable.

Is there any alignment tool I can use?

Thank you!

genome rna alignment search • 3.4k views
ADD COMMENTlink modified 9.4 years ago by Qdjm1.9k • written 10.2 years ago by None30
2
10.2 years ago by
Hanif Khalak1.2k
Doha, QA
Hanif Khalak1.2k wrote:

Something that might be relevant is a package which looks for bacterial transcriptional stop sites (short stem loops with certain characteristics) called TransTerm.

I co-wrote the original version and went through a lot of iterations, ranging from exact local matching of the DNA with its reverse complement to dynamic-programming-based alignment (which is what works best). It is tuned to predict sites with the specific bacterial terminator characteristics, but it may still be worth a whirl.
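
To illustrate the simplest of those strategies (exact matching of the DNA against its reverse complement), here is a minimal Python sketch. It is not TransTerm's code: the function names and the parameters `k`, `min_loop` and `max_span` are hypothetical choices made for illustration only.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def exact_stem_seeds(seq, k=6, min_loop=3, max_span=100):
    """Yield (left_start, right_start) for exact inverted repeats of length k.

    The k-mer at left_start can pair with the k-mer at right_start when one
    is the reverse complement of the other, the two are separated by at
    least min_loop bases, and the whole hairpin spans at most max_span bases.
    All thresholds are illustrative assumptions.
    """
    seq = seq.upper()
    # Index every k-mer by its start positions.
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    # For each k-mer, look up where its reverse complement occurs downstream.
    for i in range(len(seq) - k + 1):
        partner = reverse_complement(seq[i:i + k])
        for j in positions.get(partner, []):
            loop = j - (i + k)
            if loop >= min_loop and j + k - i <= max_span:
                yield i, j
```

For example, `list(exact_stem_seeds("AAGCATCGTTTTCGATGCAA"))` finds the arm at position 2 pairing with the arm at position 12. Exact seeds like these miss G-U wobble pairs and mismatches, which is one reason an alignment-based approach tends to do better.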

You might also try WU-BLAST using a custom substitution matrix. Links can be found in this unrelated answer.

ADD COMMENTlink modified 16 months ago by _r_am32k • written 10.2 years ago by Hanif Khalak1.2k
1
10.2 years ago by
Bergen, Norway
Michael Dondrup48k wrote:

Not totally sure here: Stem-loop = inverted repeats sequence, right?

So you would need a tool that finds inverted repeats, e.g. Inverted Repeats Finder (IRF): http://tandem.bu.edu/irf/irf.download.html

I haven't done this myself, nor have I used the program; I only refined the search terms. Good luck.

ADD COMMENTlink written 10.2 years ago by Michael Dondrup48k
1
10.2 years ago by
Qdjm1.9k
Toronto
Qdjm1.9k wrote:

Try FastA. There's an option to include your own scoring matrices described in this documentation page. I have not tried it myself but have heard of it being used for a similar task.

If you want the best performance, consider modeling base stacking interactions, not just single nucleotide pairings. You can do it by scoring dinucleotide pairings.
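
A rough sketch of what scoring dinucleotide pairings could look like in Python. The numbers are placeholders invented for illustration, not measured energies; a real table would hold one fitted value per stack (for example from published nearest-neighbor parameters). The function names are hypothetical.

```python
# Base pairs allowed in an RNA stem, including the G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_strength(pair):
    """Placeholder weight per base pair (illustrative, not an energy)."""
    if pair in {("G", "C"), ("C", "G")}:
        return 3
    if pair in {("A", "U"), ("U", "A")}:
        return 2
    return 1  # G-U wobble


def stack_score(p, q):
    """Placeholder score for base pair p stacked directly on base pair q.

    The product makes the score depend on the dinucleotide context rather
    than on each pair alone; a real model would use a lookup table here.
    """
    return pair_strength(p) * pair_strength(q)


def stacking_score(left_arm, right_arm):
    """Score a gap-free stem by summing over adjacent base-pair stacks.

    Both arms are given 5'->3', so left_arm[i] pairs with right_arm[-1 - i].
    Returns None if any position cannot form a pair.
    """
    left = left_arm.upper().replace("T", "U")
    right = right_arm.upper().replace("T", "U")
    if len(left) != len(right):
        raise ValueError("arms must have equal length")
    pairs = list(zip(left, reversed(right)))
    if any(p not in PAIRS for p in pairs):
        return None
    return sum(stack_score(p, q) for p, q in zip(pairs, pairs[1:]))
```

With these placeholder weights, a stem containing a wobble pair in the middle is penalized twice, once for each stack the wobble takes part in, which a single-nucleotide scoring matrix cannot express.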

ADD COMMENTlink modified 10.2 years ago • written 10.2 years ago by Qdjm1.9k

I don't think this is a case for an alignment tool, is it? AFAIK the OP doesn't have a database of known 'stem' sequences to search for in a genome. A stem-loop is an RNA secondary structure that requires an inverted repeat sequence within the genome, which AFAIK cannot be found by FastA.

ADD REPLYlink written 10.2 years ago by Michael Dondrup48k

@Michael Dondrup -- You can use alignment tools to find inverted repeats: BLAST short stretches of the genome against their reverse complements. If you set your scoring matrix (and your gap penalty) appropriately, then your BLAST score can be a good estimate of the free energy of the stem. There are better solutions, and maybe IRF is one of them, but this is one way to do it.
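
A minimal Python sketch of that idea, using a plain Smith-Waterman local alignment as a stand-in for BLAST/FastA. The score values and the gap penalty are assumed placeholders, not calibrated energies, and the function names are hypothetical. When a strand is aligned against its own reverse complement, an identity match corresponds to a Watson-Crick pair, G aligned to A corresponds to G facing U, and T aligned to C corresponds to U facing G.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def wobble_matrix(match=2, wobble=1, mismatch=-3):
    """Substitution scores for aligning a strand against its reverse complement.

    Keys are (query_base, subject_base). The two wobble cells make the
    matrix asymmetric on purpose. All score values are placeholders.
    """
    m = {(a, b): (match if a == b else mismatch) for a in "ACGT" for b in "ACGT"}
    m[("G", "A")] = wobble  # G in the query faces U in the stem
    m[("T", "C")] = wobble  # U in the query faces G in the stem
    return m


def best_local_score(query, subject, matrix, gap=-4):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(subject) + 1)
    best = 0
    for a in query:
        cur = [0]
        for j, b in enumerate(subject, start=1):
            score = max(0,
                        prev[j - 1] + matrix[(a, b)],  # align a with b
                        prev[j] + gap,                 # gap in subject
                        cur[j - 1] + gap)              # gap in query
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best
```

Usage: to test whether the arm `GGAGT` can pair with the downstream arm `GCTTC`, score `best_local_score("GGAGT", reverse_complement("GCTTC"), wobble_matrix())`. Two of the five positions are wobble pairs, so the score is lower than for a perfect Watson-Crick stem but higher than with a matrix that treats them as mismatches.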

ADD REPLYlink written 10.2 years ago by Qdjm1.9k

I didn't know that. Can you do this for a whole genome?

ADD REPLYlink written 10.2 years ago by Michael Dondrup48k

@Michael Dondrup -- I don't see why not. There's some messy scripting involved to break the genome up into overlapping chunks, run the separate BLAST/FastA processes and then compile the results. Plus some sanity checking to make sure that the two sides of the stem don't overlap. You'd also have to check both strands for stems because the G-U wobble makes the calculation non-symmetric. It's not pretty. The good news is that it's easy to parallelize.
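
The bookkeeping part might look roughly like the Python sketch below. The chunk and overlap sizes and all function names are assumptions for illustration; the overlap should be at least as long as the longest hairpin you want to find, so that every candidate lies entirely inside some chunk. Running the aligner on each job and merging the hits is left out.

```python
def overlapping_chunks(length, chunk=10_000, overlap=1_000):
    """Yield (start, end) windows covering [0, length) with the given overlap."""
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be non-negative and smaller than chunk")
    step = chunk - overlap
    start = 0
    while True:
        end = min(start + chunk, length)
        yield start, end
        if end >= length:
            break
        start += step


def arms_overlap(left, right):
    """True if two half-open (start, end) arm intervals share any base."""
    return left[0] < right[1] and right[0] < left[1]


def scan_jobs(genome_length, chunk=10_000, overlap=1_000):
    """List one job per chunk and strand.

    Both strands are listed because a G-U wobble on one strand reads as
    C-A on the other, which does not pair.
    """
    return [(start, end, strand)
            for start, end in overlapping_chunks(genome_length, chunk, overlap)
            for strand in "+-"]
```

Each `(start, end, strand)` job is independent of the others, which is what makes the search easy to parallelize; `arms_overlap` is the sanity check for discarding hits whose two sides share bases.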

ADD REPLYlink written 10.2 years ago by Qdjm1.9k
0
10.2 years ago by
Mary11k
Boston MA area
Mary11k wrote:

I don't know if this would help, I haven't used it--but I was aware that on a gene details page at UCSC they provide a section called "mRNA Secondary Structure of 3' and 5' UTRs". So it's only selected sections of the sequences, but it offers various outputs for that data. It must have been a genome-wide survey (but does include free energy, which you don't want).

They say it relies on the Vienna RNA Package. On that page there are a number of different programs and strategies. I don't know if any would suit your needs. There's also a web server and it looks like other programs might be available there. That RNAz one has a "Genomic screen modus".

Maybe you know about all these and they aren't right. But figured I'd mention them.

Here is the TP53 details page I got this from.

ADD COMMENTlink modified 16 months ago by _r_am32k • written 10.2 years ago by Mary11k

Generally Mfold or ViennaRNA would be the ones to use for RNA structure, but the OP does not want to use MFE (min. free energy)

ADD REPLYlink written 10.2 years ago by Hanif Khalak1.2k

Yeah, that would be why I wrote "which you don't want". But there are a number of other options over in those two links. I didn't know if all of them use that.

ADD REPLYlink written 10.2 years ago by Mary11k
0
10.2 years ago by
None0
None0 wrote:

Thanks for all the replies.

@Michael Dondrup: Well, you are right, I am looking for inverted repeats - but not only exact hits, and additionally, allowing the GU equivalents.

@Hanif Khalak: As far as I understand, BLAST provides some pre-compiled matrices, but only for amino acids. blastn does not offer such an option.

PS: I am sorry, but without registration it seems impossible to reply directly to someone's post.

ADD COMMENTlink written 10.2 years ago by None0

Did you try the IRF software then?

ADD REPLYlink written 10.2 years ago by Michael Dondrup48k