Rearranging DNA Sequences
13.0 years ago
Jess ▴ 10

I'm having trouble rearranging a DNA sequence. I need to randomly rearrange a given DNA sequence so that its G/C content and its A/T content stay the same, and therefore so does its length. I can generate random sequences, but I can't work out how to randomly rearrange an existing one.

Any help would be great, thanks.

python dna perl r

Homework?...


Sounds like . . .

13.0 years ago

shuffleseq from EMBOSS shuffles a set of sequences maintaining composition.
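To check that a shuffle really kept the composition, whichever tool produced it, you can compare base counts and GC fraction. A minimal Python sketch; composition and gc_fraction are hypothetical helper names, not part of EMBOSS, and random.sample is only a stand-in for the shuffler's output:

```python
import random
from collections import Counter

def composition(seq):
    """Count each base, ignoring case."""
    return Counter(seq.upper())

def gc_fraction(seq):
    """Fraction of characters in seq that are G or C."""
    counts = composition(seq)
    return (counts["G"] + counts["C"]) / len(seq) if seq else 0.0

original = "agcactacgactacgacagcata"
# Stand-in for any composition-preserving shuffle (e.g. shuffleseq output)
shuffled = "".join(random.sample(original, len(original)))

# Equal per-base counts imply equal GC content, AT content and length.
print(composition(original) == composition(shuffled))  # True
print(gc_fraction(original), gc_fraction(shuffled))
```

Comparing full counts is stricter than comparing GC fraction alone, since it also catches a G swapped for a C.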


One can even use a web-based version of shuffleseq.


I second the use of shuffleseq.

13.0 years ago
brentp 24k

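Here's a function in Python that "mutates" the original sequence while maintaining its GC and AT content.

import random
from itertools import islice

def seq_shuffler(original_seq="ACCAACXTGGGGTTTCCGGGGCCCCC"):
    # Work on a list copy; shuffling only reorders the characters,
    # so base composition (GC/AT content and length) is unchanged.
    bases = list(original_seq)
    while True:
        random.shuffle(bases)  # in-place shuffle of the list
        yield "".join(bases)

random_seq_gen = seq_shuffler()
print(next(random_seq_gen))
print(next(random_seq_gen))
print(next(random_seq_gen))
print(next(random_seq_gen))

# or loop. The generator never ends, so take a fixed number of items.
for k in islice(random_seq_gen, 10):
    print(k)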


Nice example. I haven't really done much with Python yet, but the various examples I've seen on this site have convinced me to take a look at it. For the things I don't do with R I tend to use Perl.


uShuffle can be built as a Perl module, and as a Python one too. I generally add it to my BioPerl/Biopython setup, and it is time- and memory-efficient.

13.0 years ago
Ian Simpson ▴ 960

OK, I got a bit obsessed with doing this in R because I thought you could do it in one line, which you can!! (not including the input, that is)

#input string of choice
a <- 'agcactacgactacgacagcata';

#shuffle it
paste(sample(unlist(strsplit(a,split=''))),collapse='');


And to do this 100 times and print out the answers:

for(i in 1:100){
  print(paste(sample(unlist(strsplit(a,split=''))),collapse=''),quote=FALSE);
}


Not bad! Though that's a large value of 1. ;) The Python version could also be "1" line.
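For comparison, a sketch of the Python "one-liner", assuming Python 3: random.sample with k equal to the full length draws every character exactly once in random order, which is a shuffle of the input.

```python
import random

a = "agcactacgactacgacagcata"  # input string of choice

# sample() with k == len(a) returns all characters, in random order
print("".join(random.sample(a, len(a))))

# and 100 times, as in the R loop
for _ in range(100):
    print("".join(random.sample(a, len(a))))
```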


Five nested functions, that's what R lives on!!! ;)

13.0 years ago
Rob Syme ▴ 540

While the EMBOSS solution is probably the best, if it needs to be incorporated into a script, the BioRuby library gives you the very convenient 'randomize' method:

require 'bio'
s = Bio::Sequence::NA.new("ACCAACXTGGGGTTTCCGGGGCCCCC")
s.randomize         # ==> "tagccggcctxgatcactgcgcgccg"

13.0 years ago

What language are you using? Here's something in Ruby:

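class Array
  # Fisher-Yates/Knuth shuffle.
  # Ruby 1.8.7 and later already define Array#shuffle and #shuffle!,
  # so this patch (which overrides them) is only needed on older Rubies.
  def shuffle!
    n = length
    for i in 0...n
      r = Kernel.rand(n-i)+i    # random index in i...n
      self[r], self[i] = self[i], self[r]
    end
    self
  end

  # Return a shuffled copy of the array
  def shuffle
    dup.shuffle!
  end
end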

string = "AAATTTGGGCCC"
string.split(//).shuffle.join("")

> "AACGCTTTCAGG"


As always, there may be a more concise way to do this, but this will get the job done.


13.0 years ago

Hi Jess,

You can use Sean Eddy's Squid library to do this. It builds a set of command-line applications that can shuffle your sequence in several ways. Alternatively, you can use uShuffle, which does a similar job. Both can shuffle while preserving the base counts, and can also preserve n-base (dibase, tribase, etc.) counts.
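To show what preserving dibase counts involves, here is a minimal Python sketch of the Altschul-Erickson idea: treat every adjacent pair of bases as a directed edge and rebuild the sequence as a random Eulerian path through those edges. It is not the code Squid or uShuffle actually use, and dinuc_shuffle and _reaches_end are hypothetical names. By construction it also keeps the first and last bases in place.

```python
import random
from collections import Counter, defaultdict

def _reaches_end(v, end, succ, last_idx):
    """Follow each vertex's chosen last-exit edge from v; True if it hits end."""
    seen = set()
    while v != end:
        if v in seen:  # a cycle that never reaches the end vertex
            return False
        seen.add(v)
        v = succ[v][last_idx[v]]
    return True

def dinuc_shuffle(seq, rng=random):
    """Shuffle seq while preserving every dinucleotide (2-mer) count."""
    if len(seq) < 3:
        return seq
    succ = defaultdict(list)  # succ[a] lists every b where "ab" occurs
    for a, b in zip(seq, seq[1:]):
        succ[a].append(b)
    start, end = seq[0], seq[-1]

    # Pick a random last-exit edge for every vertex except the end vertex,
    # retrying until all of them lead to the end (they form a tree).
    while True:
        last_idx = {v: rng.randrange(len(out))
                    for v, out in succ.items() if v != end}
        if all(_reaches_end(v, end, succ, last_idx) for v in last_idx):
            break

    # Shuffle each vertex's other exits and put the chosen exit last.
    order = {}
    for v, out in succ.items():
        out = list(out)
        tail = [out.pop(last_idx[v])] if v in last_idx else []
        rng.shuffle(out)
        order[v] = out + tail

    # Walk the path, always taking the next unused exit of the current base.
    result, pos, v = [start], Counter(), start
    for _ in range(len(seq) - 1):
        nxt = order[v][pos[v]]
        pos[v] += 1
        result.append(nxt)
        v = nxt
    return "".join(result)

print(dinuc_shuffle("ACCAACTGGGGTTTCCGGGGCCCCC"))
```

The retry loop always terminates with probability 1, because the original sequence's own last exits already form a valid tree.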

12.5 years ago

Also in Perl, without modules:

my $seq = "AAAAAGTATACAACATCA";                        # input seq
my @seqarray = split(//, $seq);                        # put seq in array
my @randarray = sort { rand() <=> rand() } @seqarray;  # shuffle the bases
my $outseq = join("", @randarray);                     # join shuffled sequence
print "$outseq\n";                                     # output


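Note that the Perl sort function compares two values with <=>, which returns -1, 0 or 1 depending on which one is larger. If you sort by rand() <=> rand(), the resulting order is random. Be aware, though, that perldoc -f sort warns that a comparison function returning inconsistent results gives results that are not well defined, so this trick is not guaranteed to make every ordering equally likely. The shuffle function from List::Util (a core module) is the safer choice.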
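A sort can still be used to shuffle if each base gets one random key, drawn once, and the sort runs on those keys instead of on a random comparator. Since the keys never change during the sort, the comparisons stay consistent, and ties have negligible probability. A minimal Python sketch (key_shuffle is a hypothetical name):

```python
import random

def key_shuffle(seq):
    """Shuffle by sorting on one random key per character."""
    # sorted() calls the key function once per element, so each base keeps
    # the same key for the whole sort and the comparisons stay consistent.
    return "".join(sorted(seq, key=lambda _: random.random()))

print(key_shuffle("AAAAAGTATACAACATCA"))
```

In practice random.shuffle is simpler; this just shows the sort-based idea done safely.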

13.0 years ago
Ian Simpson ▴ 960

If you fancy doing it in Perl, there are three different ways you can try listed here.

12.5 years ago

There is a whole set of web-based tools for this at http://www.bioinformatics.org/sms2/ that would be fine for a one-off job or a small set of sequences, or for anyone who does not run Perl or does not have access to the tools found at a large institution. Nonetheless, the code examples above are also a good way to learn...