Question

Calculating Z-Scores And P-Values Through MFE For An Original And Shuffled Set Of Sequences (Efficient Design Strategy)

12.1 years ago

Bioslayer • 0

Dear BioStarers, my question is about assessing the accuracy of ncRNA predictions. I have scanned bacterial genomes with a covariance model (CM) and got some interesting hits, and now I want to see how these hits differ from random sequences. Each hit is shuffled into another file such that its dinucleotide composition is preserved, using the shuffle -d option from Eddy's SQUID. I would like to know how far the original hit deviates, in terms of a z-score and a p-value, from the background of shuffled sequences when they are run against that model. For that I need to calculate the mean and standard deviation of the MFE of the shuffled sequences.
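
To make the target quantities concrete, here is a minimal sketch of the z-score and an empirical p-value, assuming the MFE values (in kcal/mol) for a hit and its shuffled copies have already been computed by some folding tool. The function names and the example numbers are made up for illustration, and the p-value uses a simple rank-based estimate with a +1 pseudocount rather than any particular published method.

```python
import statistics


def mfe_z_score(native_mfe, shuffled_mfes):
    """Z-score of the native MFE against the shuffled background.

    A more negative z-score means the native sequence folds into a more
    stable structure than its dinucleotide-shuffled copies.
    """
    mu = statistics.mean(shuffled_mfes)
    sigma = statistics.stdev(shuffled_mfes)  # sample standard deviation
    if sigma == 0:
        raise ValueError("shuffled MFEs have zero variance")
    return (native_mfe - mu) / sigma


def empirical_p_value(native_mfe, shuffled_mfes):
    """Fraction of shuffled sequences at least as stable as the native one.

    The +1 in numerator and denominator keeps the estimate above zero
    when no shuffled sequence reaches the native MFE.
    """
    n_as_stable = sum(1 for e in shuffled_mfes if e <= native_mfe)
    return (n_as_stable + 1) / (len(shuffled_mfes) + 1)


if __name__ == "__main__":
    # Hypothetical MFE values, for illustration only.
    native = -15.0
    background = [-10.0, -12.0, -8.0, -11.0, -9.0]
    print("z-score:", mfe_z_score(native, background))
    print("empirical p-value:", empirical_p_value(native, background))
```

With only a handful of shuffles the empirical p-value cannot drop below 1/(N+1), which is one reason the number of shuffles per hit drives the overall cost.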

In my search for a reliable way to do this, I identified RNAz, the Vienna package's alifoldz.pl, and a regression approach using LIBSVM as potential candidates. However, I have not reached a conclusion, as I am not exactly clear on how to proceed, bearing in mind that these scores have to be obtained by calculating the minimum free energy (MFE) of each hit sequence and its shuffled sequences (this will scale to millions and millions of sequences overall). Any suggestions for a computationally economical approach from someone who has been there and done that would be highly appreciated.

RNAz: https://github.com/wash/rnaz. Vienna and alifoldz.pl: http://www.tbi.univie.ac.at/RNA/, http://www.tbi.univie.ac.at/papers/SUPPLEMENTS/Alifoldz/alifoldz.html. LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/

• 2.7k views

updated 10.7 years ago by Biostar 20 • written 12.1 years ago by Bioslayer • 0