Question: Need Script For: Randomization Test For Predicted miRNAs
7.1 years ago by
biolab1.2k wrote:

Hi everyone,

I have a file with many predicted miRNAs, and I need to run a randomization test to identify which of them are most likely to be real. The test shuffles each predicted miRNA 1000 times and calculates the MFE (minimum free energy) of each randomized sequence, which is easily done with RNAfold. My current problem is how to generate the 1000 randomized sequences for each miRNA. Here is an example.

Predicted miRNAs file

>miR1
augcgugaccguaugcuac
>miR2
uuuggugcguagucguacg
   ............
>miR100
auaugagucguacguacgu

Randomized sequences file

>miR1_1
ugcggaccguaugcuacau
>miR1_2
ugcggaccuugcuacauga
...........
>miR1_1000
ugcggaccguaugcuacua
.............
............
>miR100_1000
auaugagucgacguacu

Could anyone familiar with Perl help me solve this problem? I can do the next steps, including calculating the MFE, myself, but I believe someone who knows RNAfold and miRNA prediction well could produce a pipeline for this work. Here is a Nucleic Acids Research reference: http://nar.oxfordjournals.org/content/37/suppl_1/D111.full . Searching for RNAfold in its Methods section leads to the authors' method. Thank you in advance!


Instead of looking for a script, maybe you can use an off-the-shelf tool to generate random sequences from your input sequences. Take a look at the biosquid package, especially its shuffle program. (It is a Debian/Ubuntu package; I am not sure whether there is an equivalent for other distros or Windows, and I have not tried it myself.)

written 7.1 years ago by Sudeep 1.6k

Yes; I'd use EMBOSS shuffleseq, which can easily be run from a (Bio)Perl (or other) script if required.

written 7.1 years ago by Neilfws 49k
7.1 years ago by
Belgium, Brussels
Nicolas Rosewick9.3k wrote:

Just a quick piece of advice: you should fold the pre-miRNA sequence (the typical hairpin-like structure), not the mature miRNA, when predicting the secondary structure. You can use Randfold ( http://bioinformatics.oxfordjournals.org/content/20/17/2911 ) to do that.

Download Randfold: http://bioinformatics.psb.ugent.be/supplementary_data/erbon/nov2003/
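
For readers new to this kind of test, here is a rough sketch of how the shuffled MFE values are typically turned into a p-value: count the shuffled sequences that fold at least as stably as the native one, then divide by the number of shuffles plus one. The sub name `empirical_p` and the example energies are made up for illustration, and the R / (N + 1) convention is an assumption here; check the Randfold paper for the exact definition it uses.

```perl
use strict;
use warnings;

# Empirical p-value for a randomization test on folding energies.
# $native_mfe : MFE of the real sequence (kcal/mol, more negative = more stable)
# $random_mfe : array ref holding the MFEs of the shuffled sequences
# Returns R / (N + 1), where R counts shuffles that fold at least as stably.
sub empirical_p {
    my ($native_mfe, $random_mfe) = @_;
    my $n = scalar @$random_mfe;
    my $r = grep { $_ <= $native_mfe } @$random_mfe;   # count in scalar context
    return $r / ($n + 1);
}

# Made-up example values, not real RNAfold output.
my @shuffled = (-20.1, -35.0, -18.4, -22.7);
printf "p = %.2f\n", empirical_p(-30.5, \@shuffled);
```

A small p-value means that few shuffled sequences fold as stably as the original, which is the property the test uses to support a predicted hairpin.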


THANKS a lot for your information about the Randfold.

written 7.1 years ago by biolab 1.2k
7.1 years ago by
Mexico
JC12k wrote:
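#!/usr/bin/perl

use strict;
use warnings;
use List::Util 'shuffle';

my $rep       = 1000;         # permutations per miRNA
my $max_tries = 100 * $rep;   # give up if a sequence has too few distinct permutations

$/ = "\n>";                   # read one FASTA record at a time
while (<>) {
    s/>//g;                   # strip the record separators
    s/\r//g;                  # tolerate Windows line endings
    my ($id, @lines) = split (/\n/, $_);
    next unless defined $id && @lines;   # skip empty records
    my $seq  = join "", @lines;          # join multi-line sequences
    my @seq  = split (//, $seq);
    my %seen = ();
    my ($n, $tries) = (0, 0);
    # keep shuffling until $rep unique permutations have been printed
    while ($n < $rep && $tries++ < $max_tries) {
        my $new_seq = join "", shuffle(@seq);
        next if $seen{$new_seq}++;       # skip sequences already generated
        next if ($new_seq eq $seq);      # skip if identical to the original miRNA
        $n++;
        print ">$id\_$n\n$new_seq\n";
    }
    warn "$id: only $n unique permutations found\n" if $n < $rep;
}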

Save it as a Perl script (for example randMiRNA.pl) and run it as: perl randMiRNA.pl < mirna.fasta > random_mirnas.fasta


It might be a more realistic null model to take the random sequence from a random spot in the genome, though it's not clear what the OP intends.
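
A minimal sketch of that alternative, assuming the genome (or one chromosome) is already loaded as a plain string: draw a window of the same length as the miRNA from a uniformly random start position. The sub name `random_window` and the toy sequence are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Pick a random window of length $len from $genome (a plain sequence string).
# Returns the substring, or undef if the genome is shorter than $len.
sub random_window {
    my ($genome, $len) = @_;
    my $max_start = length($genome) - $len;
    return undef if $len <= 0 || $max_start < 0;
    my $start = int(rand($max_start + 1));   # 0 .. $max_start inclusive
    return substr($genome, $start, $len);
}

# Toy "genome" for illustration only; a real run would read a FASTA file.
my $genome = "acgugcaugcuagcuagcaugcuagcuaggcuaucgaugcuagc";
print ">bg_$_\n", random_window($genome, 19), "\n" for 1 .. 3;
```

Unlike shuffling, this keeps the local composition and dinucleotide structure of real genomic sequence, which is why it can serve as a stricter background.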

written 7.1 years ago by brentp 23k

I agree. I can also imagine using a dimer (dinucleotide) permutation of the miRNA sequences.
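
One simple reading of a dimer permutation is to shuffle the sequence in non-overlapping 2-nt blocks instead of single bases. The sketch below does only that; it is a simplification and does not preserve the exact overlapping dinucleotide counts the way an Altschul-Erickson shuffle would. The sub name `dimer_block_shuffle` is made up for illustration.

```perl
use strict;
use warnings;
use List::Util 'shuffle';

# Shuffle a sequence in non-overlapping 2-nt blocks. A trailing single base
# (odd-length input) is treated as its own block.
# NOTE: this keeps base composition but only approximates dinucleotide
# content; it is not an exact dinucleotide-preserving shuffle.
sub dimer_block_shuffle {
    my ($seq) = @_;
    my @blocks = $seq =~ /(..?)/g;       # e.g. "augcg" -> ("au", "gc", "g")
    return join "", shuffle(@blocks);
}

print dimer_block_shuffle("augcgugaccguaugcuac"), "\n";
```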

written 7.1 years ago by JC 12k

THANKS a lot for your script. brentp's suggestion is good, but I don't plan to test random sequences from intergenic and intronic regions, because I am testing the secondary structure of already predicted miRNAs, and previous publications did not test random genomic sequences either. Anyway, this discussion is very helpful. JC, I have one more question: in your code you use

List::Util 'shuffle';

Would you please briefly explain shuffle? Do I need to install biosquid? THANK YOU VERY MUCH!

written 7.1 years ago by biolab 1.2k

List::Util is a core Perl module, so you don't need to install it. The "shuffle" function returns the elements of a list in random order. I'm guessing it uses a Fisher-Yates shuffle, but you can check the algorithm in the source code: http://search.cpan.org/~pevans/Scalar-List-Utils-1.35/lib/List/Util.pm#shuffle_LIST
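
For intuition, a Fisher-Yates shuffle fits in a few lines of plain Perl. This is a sketch of the general technique, not List::Util's actual implementation, and the sub name `fisher_yates_shuffle` is just illustrative.

```perl
use strict;
use warnings;

# In-place Fisher-Yates shuffle of the array referenced by $array.
# Walks from the last element down, swapping each element with a randomly
# chosen element at or before it, so every ordering is equally likely
# (given a uniform rand()).
sub fisher_yates_shuffle {
    my ($array) = @_;
    for (my $i = $#$array; $i > 0; $i--) {
        my $j = int(rand($i + 1));             # random index in 0 .. $i
        @$array[$i, $j] = @$array[$j, $i];     # swap the two elements
    }
    return;
}

my @bases = split //, "augcgugaccguaugcuac";
fisher_yates_shuffle(\@bases);
print join("", @bases), "\n";
```

In practice `shuffle` from List::Util does the same job in one call, so there is no need to hand-roll this in the script.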

written 7.1 years ago by JC 12k

Hi JC, thank you very much for your explanation. Best regards!

written 7.1 years ago by biolab 1.2k