Question

Need Script For: Randomization Test For Predicted miRNAs

0

10.4 years ago

biolab ★ 1.4k

Hi everyone,

I have a file with many predicted miRNAs. I need to perform a randomization test to identify which of these miRNAs are highly probable candidates. The test shuffles each predicted miRNA 1000 times and calculates the MFE (minimum free energy) of each shuffled sequence (this can easily be done with RNAfold). My current problem is how to generate 1000 randomized sequences for each miRNA. I give an example below.

Predicted miRNAs file

>miR1
augcgugaccguaugcuac
>miR2
uuuggugcguagucguacg
   ............
>miR100
auaugagucguacguacgu

Randomized sequences file

>miR1_1
ugcggaccguaugcuacau
>miR1_2
ugcggaccuugcuacauga
...........
>miR1_1000
ugcggaccguaugcuacua
.............
............
>miR100_1000
auaugagucgacguacu

Could anyone familiar with Perl help me solve this problem? I can do the next steps, including calculating the MFE, myself. But I believe someone who knows RNAfold and miRNA prediction well could produce a pipeline for this work. Here is a link to a Nucleic Acids Research reference: http://nar.oxfordjournals.org/content/37/suppl_1/D111.full . In the methods section, searching for RNAfold will take you to the authors' method. Thank you in advance!
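For the step after shuffling, here is a minimal Perl sketch of how the MFE values could be turned into an empirical p-value. It assumes RNAfold's default text output, where the structure line ends with the energy in parentheses. The sub names `parse_mfe` and `empirical_p` are hypothetical, the MFE values are made up, and the (r + 1) / (n + 1) estimate is one common convention, not necessarily the method used in the linked paper.

```perl
use strict;
use warnings;

# Extract the MFE (kcal/mol) from the structure line of RNAfold's default
# text output, e.g. ".((((((....)))))). ( -5.40)". Returns undef if absent.
sub parse_mfe {
    my ($structure_line) = @_;
    return $structure_line =~ /\(\s*(-?\d+(?:\.\d+)?)\)\s*$/ ? $1 + 0 : undef;
}

# Empirical p-value: fraction of shuffled sequences that fold at least as
# stably as the native one (MFE <= native MFE). The +1 terms are an assumed
# convention that keeps the estimate above zero.
sub empirical_p {
    my ($native_mfe, @shuffled_mfe) = @_;
    my $as_stable = grep { $_ <= $native_mfe } @shuffled_mfe;
    return ($as_stable + 1) / (@shuffled_mfe + 1);
}

my $native   = parse_mfe('.((((((....)))))). ( -5.40)');
my @shuffled = (-1.2, -6.0, -0.3, -2.5);   # made-up values for illustration
printf "native MFE = %.2f, p = %.3f\n", $native, empirical_p($native, @shuffled);
```

A small p-value would mean that few shuffled versions fold as stably as the predicted sequence.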

script • 3.4k views

ADD COMMENT • link updated 10.4 years ago by JC 13k • written 10.4 years ago by biolab ★ 1.4k

3

Instead of looking for a script to generate random sequences, maybe you can use "off the shelf" tools to generate random sequences from your input sequence. Take a look at the biosquid package, especially shuffle. (These are Debian/Ubuntu packages; I am not sure whether there is an equivalent for other distros or Windows, and I have not tried it myself.)

ADD REPLY • link 10.4 years ago by Sudeep ★ 1.7k

2

Yes; I'd use EMBOSS shuffleseq, which can easily be run from a (Bio)Perl (or other) script if required.

ADD REPLY • link 10.4 years ago by Neilfws 49k

score 3 · Answer 1 · 2013-12-02

3

10.4 years ago

Nicolas Rosewick 10k

Just a quick piece of advice: you should fold the pre-miRNA sequence (the typical hairpin-like structure), not the mature miRNA, when predicting the secondary structure. You can use Randfold ( http://bioinformatics.oxfordjournals.org/content/20/17/2911 ) to do that.

Download Randfold: http://bioinformatics.psb.ugent.be/supplementary_data/erbon/nov2003/

ADD COMMENT • link 10.4 years ago by Nicolas Rosewick 10k

0

Thanks a lot for the information about Randfold.

ADD REPLY • link 10.4 years ago by biolab ★ 1.4k

score 2 · Answer 2 · 2013-12-02

2

10.4 years ago

JC 13k

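#!/usr/bin/perl

use strict;
use warnings;
use List::Util 'shuffle';

my $rep       = 1000;       # shuffled sequences wanted per miRNA
my $max_tries = 10 * $rep;  # give up on low-complexity sequences

$/ = "\n>"; # read one FASTA record at a time
while (<>) {
    s/>//g;
    my ($id, @lines) = split (/\n/, $_);
    my $seq  = join "", @lines; # allow sequences wrapped over several lines
    my @seq  = split (//, $seq);
    my %seen = ($seq => 1); # never emit the original miRNA itself
    my ($n, $tries) = (0, 0);
    while ($n < $rep and $tries++ < $max_tries) {
        my $new_seq = join "", shuffle(@seq);
        next if $seen{$new_seq}++; # skip sequences already generated
        $n++; # count only sequences that are actually printed
        print ">$id\_$n\n$new_seq\n";
    }
    warn "$id: only $n unique shuffles found\n" if $n < $rep;
}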

Save it as a Perl script (for example randMiRNA.pl) and run it as: perl randMiRNA.pl < mirna.fasta > random_mirnas.fasta

ADD COMMENT • link 10.4 years ago by JC 13k

1

It might be a more realistic null model to take the random sequence from a random spot in the genome, though it's not clear what the OP intends.
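As a rough sketch of that idea in Perl, assuming the genome is already loaded as one plain sequence string (the sub name `random_window` and the toy genome are hypothetical stand-ins):

```perl
use strict;
use warnings;

# Return a random window of $len bases from $genome (a plain sequence string).
sub random_window {
    my ($genome, $len) = @_;
    die "window longer than genome\n" if $len > length $genome;
    my $start = int rand(length($genome) - $len + 1);   # 0-based start
    return substr $genome, $start, $len;
}

my $genome = 'acgu' x 50;     # toy stand-in for a real genome sequence
print random_window($genome, 19), "\n";
```

A real run would read the genome from a FASTA file and draw windows of the same length as each candidate.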

ADD REPLY • link 10.4 years ago by brentp 24k

1

I agree. I can also imagine using a dimer (dinucleotide) permutation for miRNA sites.
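One simple reading of a dimer permutation, sketched in Perl under that assumption, is to shuffle the sequence in non-overlapping two-base units. The sub name `shuffle_dimers` is hypothetical. Note that this keeps only the in-phase dimers intact; it does not preserve the overall dinucleotide counts the way the Altschul-Erickson algorithm does.

```perl
use strict;
use warnings;
use List::Util 'shuffle';

# Shuffle a sequence in units of two bases (non-overlapping dimers).
# If the length is odd, the trailing single base is shuffled as its own unit.
sub shuffle_dimers {
    my ($seq) = @_;
    my @units = $seq =~ /(..?)/g;   # "augcg" -> ("au", "gc", "g")
    return join '', shuffle(@units);
}

print shuffle_dimers('augcgugaccguaugcuac'), "\n";
```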

ADD REPLY • link 10.4 years ago by JC 13k

0

Thanks a lot for your script. brentp's suggestion is good, but I don't plan to test random sequences from intergenic and intronic regions, because I am testing the secondary structure of already predicted miRNAs. Previous publications did not test random genomic sequences either. Anyway, your discussion is very helpful. JC, I have one more question: in your code you use

List::Util 'shuffle';

Would you please briefly explain shuffle? Do I need to install biosquid? Thank you very much!

ADD REPLY • link 10.4 years ago by biolab ★ 1.4k

1

List::Util is a core module in Perl, so you don't need to install it (and it has nothing to do with biosquid). The "shuffle" function returns the elements of a list (array) in random order. I'm guessing it uses a Fisher-Yates permutation, but you can check the main algorithm in the source code: http://search.cpan.org/~pevans/Scalar-List-Utils-1.35/lib/List/Util.pm#shuffle_LIST
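For illustration, a Fisher-Yates shuffle can be written in a few lines of plain Perl. This is only a sketch of the general technique, not a copy of what List::Util does internally, and the sub name `fisher_yates` is hypothetical.

```perl
use strict;
use warnings;

# Fisher-Yates shuffle: walk from the last index down, swapping each element
# with a randomly chosen element at or before it. Returns a shuffled copy.
sub fisher_yates {
    my @items = @_;
    for (my $i = $#items; $i > 0; $i--) {
        my $j = int rand($i + 1);          # random index in 0..$i
        @items[$i, $j] = @items[$j, $i];   # swap via array slices
    }
    return @items;
}

my $seq = 'augcgugaccguaugcuac';            # miR1 from the question
print join('', fisher_yates(split //, $seq)), "\n";
```

Every permutation of the input is equally likely (up to the quality of rand), and the base composition is always preserved.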