To add some detail:
The dinucleotide shuffling algorithm is referred to as the "Altschul-Erickson shuffle". It is important because the MFE model that RNAz and many other tools use is based on dinucleotide stacking of RNA bases. So you can't just pick some MFE cutoff. You have to control for the dinucleotide distribution of the specific RNA sequence that you are assessing.
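To make the idea concrete, here is a minimal Python sketch of a dinucleotide-preserving shuffle along the lines of Altschul-Erickson. It is not any published implementation; the names `dinucleotide_shuffle` and `dinucleotide_counts` are made up for illustration. The sequence is treated as a walk through a graph whose edges are its dinucleotides, and a random walk that uses every edge exactly once is drawn.

```python
import random
from collections import Counter, defaultdict


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides, e.g. 'ACGA' -> AC, CG, GA."""
    return Counter(zip(seq, seq[1:]))


def _leads_to(start, target, last_edge):
    """Follow the chosen 'last edges' from start; True if target is reached."""
    seen = set()
    node = start
    while node != target:
        if node in seen:  # ran into a cycle that never reaches target
            return False
        seen.add(node)
        node = last_edge[node]
    return True


def dinucleotide_shuffle(seq, rng=random):
    """Return a shuffled copy of seq with identical dinucleotide counts.

    The first and last symbols stay in place, which is a property of
    this kind of shuffle. rng is anything with choice() and shuffle().
    """
    if len(seq) < 3:
        return seq  # nothing to shuffle

    # successors[x] lists every symbol that directly follows x in seq.
    successors = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        successors[a].append(b)

    final = seq[-1]
    others = [v for v in successors if v != final]

    # For each symbol other than the final one, pick which outgoing edge
    # is used last. Retry until every such edge chain leads to `final`.
    while True:
        last_edge = {v: rng.choice(successors[v]) for v in others}
        if all(_leads_to(v, final, last_edge) for v in others):
            break

    # Shuffle the remaining edges of each symbol; keep the chosen one last.
    order = {}
    for v, succ in successors.items():
        rest = list(succ)
        if v in last_edge:
            rest.remove(last_edge[v])
        rng.shuffle(rest)
        if v in last_edge:
            rest.append(last_edge[v])
        order[v] = iter(rest)

    # Walk the graph from the first symbol, consuming edges in order.
    out = [seq[0]]
    for _ in range(len(seq) - 1):
        out.append(next(order[out[-1]]))
    return "".join(out)
```

A quick sanity check is to compare `dinucleotide_counts(seq)` with `dinucleotide_counts(dinucleotide_shuffle(seq))`; the two should always be equal.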
So in short, you would be calculating a p-value or a Z score or some such with respect to each sequence's own control distribution, i.e. the MFEs of its shuffled versions. I don't know how many sequences you're trying to assess, but with many sequences you could have a huge false positive problem on your hands, so you want to be careful with that.
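As a rough illustration of that calculation, the sketch below takes the MFE of the native sequence and a list of MFEs from its shuffled controls, and returns a Z score and an empirical p-value. The function names are hypothetical, and the energies are assumed to have been computed elsewhere with whatever folding tool you use. Lower (more negative) MFE is taken to mean more stable.

```python
from statistics import mean, stdev


def z_score(native_mfe, control_mfes):
    """Z score of the native MFE against its shuffled controls.

    Strongly negative values mean the native sequence folds more
    stably than its dinucleotide-matched controls.
    """
    sd = stdev(control_mfes)  # sample standard deviation
    if sd == 0:
        raise ValueError("control MFEs are all identical")
    return (native_mfe - mean(control_mfes)) / sd


def empirical_p_value(native_mfe, control_mfes):
    """Fraction of controls at least as stable as the native sequence.

    The +1 in numerator and denominator keeps the estimate above zero,
    since a finite number of shuffles cannot support p = 0.
    """
    hits = sum(1 for e in control_mfes if e <= native_mfe)
    return (hits + 1) / (len(control_mfes) + 1)
```

Note that the smallest p-value you can get this way is 1 / (number of shuffles + 1), so the number of shuffles limits how significant any single sequence can look.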
Back in the day when I did this (and it's been a while, around the time that RNAz came out, see Uzilov 2005), I used Peter Clote's implementation of the AE Shuffle. Remarkably, the webserver is still up! And the Python scripts that I used are still available for download from that page.
You can see my paper, and the papers I cite, for the state of the field on this problem back in 2005, for both single-sequence and multi-sequence approaches. Unfortunately I've fallen out of the field, but last I checked, there are a lot of gotchas to finding whether something is "well-structured". Using multi-sequence data and looking for evidence of evolutionary conservation of structure is, in my experience, much more robust than single-sequence analysis. I don't know the state of the art in this field, but I would advise you to follow the work of Jakob Pedersen (of EvoFold), David Mathews (of Dynalign and other software), Sean Eddy, and other people they cite -- the list is too long. I also wouldn't get attached to just MFE-based methods.
This is a well-known shuffling (permutation-test style) procedure to assess the relevance of a calculated MFE. If Asaf can't find his code, I am pretty sure that the ViennaRNA package has programs/scripts that will do it. I seem to remember that Sean Eddy's Easel library had a shuffling program as well.
Thank you, gentlemen.
As I am just learning about RNA secondary structure prediction, I thought I would ask: should I use MEA (maximum expected accuracy) instead of MFE to assess the predicted structures, and to divide the RNA structures into 2 categories: a) more structured, and b) less structured?
Many thanks!
Sorry, I didn't realize in my previous post that you gave some info on your study:
"we have a set of 1000 RNA sequences that we compare and all of them have the same length (500 nt)"
Are these human sequences? Do they have homologs if you BLAT or BLAST them against nearest related species? Do you know them to be transcribed? Do they belong to known genes, or are they intergenic transcripts of unknown function, or somewhere in between?
If you really can't incorporate related homologs, and you are forced to do single-sequence analysis, I guess you could just compute a Z score for each sequence by shuffling it using the Altschul-Erickson shuffle, and rank them by that... but depending on your study, there is a lot more that would have to be done to assign significance to any findings. Without knowing more about the study design, I'm not sure how to advise.
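A bare-bones sketch of that ranking could look like the following. Everything here is an assumption for illustration: `rank_by_z` is a made-up name, `energy_fn` stands for whatever returns an MFE for a sequence (for example a wrapper around an MFE predictor such as RNAfold), and `shuffle_fn` stands for a dinucleotide-preserving shuffle. It only produces a ranking and does not assign significance.

```python
import math
from statistics import mean, stdev


def rank_by_z(seqs, energy_fn, shuffle_fn, n_shuffles=100):
    """Rank sequences by the Z score of their MFE against shuffled controls.

    seqs       -- dict mapping a sequence name to the sequence string
    energy_fn  -- callable: sequence -> MFE as a float (hypothetical)
    shuffle_fn -- callable: sequence -> shuffled sequence (hypothetical)
    Returns a list of (name, z) pairs, most negative Z first.
    """
    results = []
    for name, seq in seqs.items():
        native = energy_fn(seq)
        controls = [energy_fn(shuffle_fn(seq)) for _ in range(n_shuffles)]
        sd = stdev(controls)
        if sd > 0:
            z = (native - mean(controls)) / sd
        else:
            z = math.inf  # degenerate controls: push to the end of the ranking
        results.append((name, z))
    results.sort(key=lambda item: item[1])
    return results
```

With 1000 sequences and 100 shuffles each, this means on the order of 100,000 folding runs, so the number of shuffles is a trade-off between run time and how stable the Z scores are.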
I'm also curious how you wound up with exactly 1000 sequences, all of exactly the same length, unless what you're giving is an approximation?
To give you an idea of the difficulty of the problem, the FDR for screens for well-structured elements is 10% even in recent work by people who have been in this field for a LONG time, though what's published there is still an improvement over the FDR of one of my old studies. And those studies were using evolutionary modeling of structure conservation, not single-sequence analysis. So this isn't a clean problem.