Calculating Z-Scores And P-Values Through MFE For An Original And Shuffled Set Of Sequences (Efficient Design Strategy)
12.1 years ago
Bioslayer • 0

Dear BioStarers, my question is about assessing the accuracy of ncRNA predictions. I have scanned bacterial genomes with a covariance model (CM) and got some interesting hits, and now I want to see how these hits differ from random sequences. Each hit is shuffled into another file so that its dinucleotide composition is preserved, using the shuffle -d option from Eddy's SQUID. I would like to know how far the original hit predicted by the CM deviates, in terms of a z-score and a p-value, from the background of shuffled sequences when they are run against that model. For that I need to calculate the mean and standard deviation of the MFE of the shuffled sequences.
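To make the scores concrete, here is a minimal sketch of what I mean, assuming the MFE values (in kcal/mol) have already been computed by some folding tool and are available as plain numbers. The function names `mfe_z_score`, `empirical_p_value` and `normal_p_value` are hypothetical and only for illustration; the normal-tail p-value assumes the shuffled MFEs are roughly normally distributed, which may not hold.

```python
import math
from statistics import mean, stdev


def mfe_z_score(native_mfe, shuffled_mfes):
    """z = (native MFE - mean of shuffled MFEs) / sd of shuffled MFEs.

    A negative z means the hit folds more stably than its shuffled background.
    """
    if len(shuffled_mfes) < 2:
        raise ValueError("need at least two shuffled MFE values")
    mu = mean(shuffled_mfes)
    sigma = stdev(shuffled_mfes)  # sample standard deviation (n - 1)
    if sigma == 0:
        raise ValueError("shuffled MFEs have zero variance")
    return (native_mfe - mu) / sigma


def empirical_p_value(native_mfe, shuffled_mfes):
    """Fraction of shuffles at least as stable as the hit.

    The +1 pseudocount keeps the estimate away from exactly zero.
    """
    hits = sum(1 for m in shuffled_mfes if m <= native_mfe)
    return (hits + 1) / (len(shuffled_mfes) + 1)


def normal_p_value(z):
    """One-sided lower-tail p-value, ASSUMING a normal background."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


if __name__ == "__main__":
    # Made-up numbers purely for illustration, not real results.
    shuffled = [-10.0, -12.0, -8.0, -11.0, -9.0]
    native = -15.0
    z = mfe_z_score(native, shuffled)
    print(z, empirical_p_value(native, shuffled), normal_p_value(z))
```

With only a handful of shuffles per hit the empirical p-value is coarse (it cannot go below 1/(N+1)), which is part of why the number of folded sequences grows so quickly.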

In my search for a reliable way to do this, I identified RNAz, the Vienna RNA package's alifoldz.pl, and a regression approach using LIBSVM as potential candidates. However, I have not settled on one, because I am not clear on how I should proceed, bearing in mind that these scores are obtained by calculating the minimum free energy (MFE) of each hit sequence and of its shuffled sequences (this will scale to millions and millions of sequences overall). Any suggestions for a computationally economical approach from someone who has been there and done that would be highly appreciated.

RNAz: https://github.com/wash/rnaz. Vienna and alifoldz.pl: http://www.tbi.univie.ac.at/RNA/, http://www.tbi.univie.ac.at/papers/SUPPLEMENTS/Alifoldz/alifoldz.html. SVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/

