Question: Rearranging DNA Sequences
1
9.6 years ago by
Jess10
Jess10 wrote:

I'm having trouble rearranging a DNA sequence. I need to randomly rearrange a given DNA sequence so that the G/C content, the A/T content, and therefore the length all stay the same. I can generate random sequences, but I cannot randomly rearrange a given sequence.

Any help would be great, thanks.

perl python dna R • 5.8k views
ADD COMMENTlink modified 9.6 years ago by Larry_Parnell16k • written 9.6 years ago by Jess10

homework ?.....

ADD REPLYlink written 9.6 years ago by Pierre Lindenbaum123k

Sounds like . . .

ADD REPLYlink written 9.6 years ago by Jarretinha3.3k

Duplicate of How To Scramble A Sequence Using An Existing Script Or A Python Method?

ADD REPLYlink modified 7 weeks ago by RamRS24k • written 8.4 years ago by Martin A Hansen3.0k
8
9.6 years ago by
Porto Alegre, RS, Brasil
Marcos De Carvalho310 wrote:

shuffleseq from EMBOSS shuffles a set of sequences maintaining composition.

ADD COMMENTlink modified 4 months ago by RamRS24k • written 9.6 years ago by Marcos De Carvalho310
1

One can even use web-based shuffleseq

ADD REPLYlink modified 4 months ago by RamRS24k • written 9.6 years ago by Darked894.2k
1

I second the use of shuffleseq.

ADD REPLYlink written 9.6 years ago by Neilfws48k
4
9.6 years ago by
brentp23k
Salt Lake City, UT
brentp23k wrote:

Here's a function in Python that shuffles the original sequence while maintaining its GC and AT content.

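import random
from itertools import islice

def seq_shuffler(original_seq="ACCAACXTGGGGTTTCCGGGGCCCCC"):
    """Yield an endless stream of shuffled copies of original_seq."""
    bases = list(original_seq)  # strings are immutable, so shuffle a list copy
    while True:
        random.shuffle(bases)  # in-place shuffle; base counts never change
        yield "".join(bases)

random_seq_gen = seq_shuffler()
print(next(random_seq_gen))
print(next(random_seq_gen))
print(next(random_seq_gen))
print(next(random_seq_gen))

# or loop; the generator never ends, so cap the loop with islice.
for k in islice(random_seq_gen, 5):
    print(k)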
ADD COMMENTlink modified 13 months ago by RamRS24k • written 9.6 years ago by brentp23k

Nice example. I haven't really done much with Python yet, but the various examples I've seen on this site have convinced me to take a look at it. For the things I don't do with R I tend to use Perl.

ADD REPLYlink written 9.6 years ago by Ian Simpson930

uShuffle can produce a Perl module, and a Python one too. I generally add it to my BioPerl/Biopython stuff. It is also time and memory efficient.

ADD REPLYlink written 9.6 years ago by Jarretinha3.3k
4
9.6 years ago by
Ian Simpson930
Edinburgh
Ian Simpson930 wrote:

OK I got a bit obsessed with doing this in R because I thought you could do it in one line, which you can !! (not including the input that is)

#input string of choice
a <- 'agcactacgactacgacagcata';

#shuffle it
paste(sample(unlist(strsplit(a,split=''))),collapse='');

and to do this 100 times and print out the answer:-

for (i in 1:100) {
    # quote = FALSE prints the sequence without surrounding quotes
    print(paste(sample(unlist(strsplit(a, split = ''))), collapse = ''), quote = FALSE)
}
ADD COMMENTlink modified 13 months ago by RamRS24k • written 9.6 years ago by Ian Simpson930

not bad! though that's a large value of 1. ;) the python version could also be "1" line.
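
For what it's worth, the one-line Python version mentioned above might look something like the sketch below. `shuffle_seq` is a hypothetical helper name, not something from this thread. It leans on `random.sample`, which returns a shuffled copy when asked for as many items as the sequence holds, so base counts and length cannot change.

```python
import random
from collections import Counter

def shuffle_seq(seq):
    """Return a randomly rearranged copy of seq (hypothetical helper)."""
    # Sampling every position without replacement amounts to a full shuffle.
    return "".join(random.sample(seq, len(seq)))

original = "ACCAACXTGGGGTTTCCGGGGCCCCC"
shuffled = shuffle_seq(original)
print(shuffled)
# Same multiset of characters, so G/C content, A/T content and length match.
print(Counter(shuffled) == Counter(original))
```

Unlike the generator version, this returns a single new string per call and leaves the input untouched.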

ADD REPLYlink written 9.6 years ago by brentp23k

five concatenated functions that's what R lives on !!! ;)

ADD REPLYlink written 9.6 years ago by Ian Simpson930
4
9.6 years ago by
Rob Syme540
Perth, Western Australia
Rob Syme540 wrote:

While the EMBOSS solution is probably the best, if it needs to be incorporated into a script, the BioRuby library gives you the very convenient 'randomize' method:

require 'bio'
s = Bio::Sequence::NA.new("ACCAACXTGGGGTTTCCGGGGCCCCC")
s.randomize         # ==> "tagccggcctxgatcactgcgcgccg"
ADD COMMENTlink modified 13 months ago by RamRS24k • written 9.6 years ago by Rob Syme540
3
9.6 years ago by
Chris Miller21k
Washington University in St. Louis, MO
Chris Miller21k wrote:

What language are you using? Here's something in Ruby:

class Array
  #fisher-yates/knuth shuffle
  def shuffle!
    n = length
    for i in 0...n
      r = Kernel.rand(n-i)+i
      self[r], self[i] = self[i], self[r]
    end
    self
  end

  # Return a shuffled copy of the array
  def shuffle
    dup.shuffle!
  end
end

string = "AAATTTGGGCCC"
string.split(//).shuffle.join("")

> "AACGCTTTCAGG"

As always, there may be a more concise way to do this, but this will get the job done.
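
For readers following along in Python, the same Fisher-Yates/Knuth shuffle could be sketched as below. This is only an illustration: `fisher_yates` and its `rng` parameter are hypothetical names, and in practice `random.shuffle` already does this job.

```python
import random

def fisher_yates(seq, rng=random):
    """Return a shuffled copy of seq using the Fisher-Yates/Knuth algorithm."""
    items = list(seq)
    n = len(items)
    for i in range(n):
        # Pick r uniformly from the not-yet-fixed positions i..n-1, then swap.
        r = rng.randrange(i, n)
        items[i], items[r] = items[r], items[i]
    return "".join(items)

print(fisher_yates("AAATTTGGGCCC"))
# Passing a seeded random.Random makes a run repeatable.
print(fisher_yates("AAATTTGGGCCC", rng=random.Random(42)))
```

The optional `rng` argument is there only so that a seeded generator can be supplied when a reproducible shuffle is wanted.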

ADD COMMENTlink modified 13 months ago by RamRS24k • written 9.6 years ago by Chris Miller21k

I think the EMBOSS shuffleseq is the better solution, but if you really want to use Ruby, the BioRuby library gives you a convenient 'randomize' method:

require 'bio'
s = Bio::Sequence::NA.new("ACCAACXTGGGGTTTCCGGGGCCCCC")
s.randomize    #=> "tagccggcctxgatcactgcgcgccg"
ADD REPLYlink modified 13 months ago by RamRS24k • written 9.6 years ago by Rob Syme540
2
9.6 years ago by
Jarretinha3.3k
São Paulo, Brazil
Jarretinha3.3k wrote:

Hi Jess,

You can use Sean Eddy's Squid library to do this. It will generate a set of command-line applications able to shuffle your sequence in several ways. Additionally, you can use uShuffle, which will do a similar job. Both can shuffle while preserving the base counts, and can preserve n-base (dibase, tribase, etc.) counts too.
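
To make "preserving dibase counts" concrete, here is a small Python sketch. It is plain Python, not Squid or uShuffle, and `kmer_counts` is a hypothetical helper. It counts overlapping k-mers before and after a plain shuffle: the single-base counts always survive, while the dinucleotide counts usually change, which is the gap those tools are meant to close.

```python
import random
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

seq = "agcactacgactacgacagcata"
shuffled = "".join(random.sample(seq, len(seq)))

# Single-base counts always survive a plain shuffle.
print(kmer_counts(seq, 1) == kmer_counts(shuffled, 1))
# Dinucleotide counts usually do not; compare the two to see the difference.
print(kmer_counts(seq, 2))
print(kmer_counts(shuffled, 2))
```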

ADD COMMENTlink written 9.6 years ago by Jarretinha3.3k
2
9.0 years ago by
Athens, Greece
Panagiotis Alexiou200 wrote:

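Also in Perl, without modules:

my $seq = "AAAAAGTATACAACATCA"; # input seq
my @seqarray = split(//, $seq); # put seq in array
# Fisher-Yates shuffle: swap each position with a randomly chosen one at or below it
for (my $i = $#seqarray; $i > 0; $i--) {
    my $j = int(rand($i + 1)); # random index in 0..$i
    @seqarray[$i, $j] = @seqarray[$j, $i];
}
my $outseq = join("", @seqarray); # join shuffled sequence
print "$outseq\n"; # output

Note that sorting with sort { rand() <=> rand() } also scrambles the array, because <=> compares two numbers and returns -1, 0 or 1 depending on which one is larger. However, a comparator that changes its answer between calls does not give every ordering the same probability, so the loop above swaps elements directly instead.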
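
For comparison, a sort-based shuffle can be made sound by giving each base a single random key that stays fixed for the whole sort. A minimal Python sketch of that idea follows; `shuffle_by_random_key` is a hypothetical name.

```python
import random

def shuffle_by_random_key(seq):
    """Shuffle seq by sorting its characters on per-character random keys."""
    # sorted() calls the key function once per element, so every base
    # keeps the same random key for the whole sort.
    return "".join(sorted(seq, key=lambda _base: random.random()))

print(shuffle_by_random_key("AAAAAGTATACAACATCA"))
```

This costs a sort rather than a single pass, so it is mainly of interest as an illustration.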

ADD COMMENTlink modified 13 months ago by RamRS24k • written 9.0 years ago by Panagiotis Alexiou200
1
9.6 years ago by
Ian Simpson930
Edinburgh
Ian Simpson930 wrote:

If you fancy doing it in Perl there are three different ways you can try listed here

ADD COMMENTlink modified 4 months ago by RamRS24k • written 9.6 years ago by Ian Simpson930
1
9.0 years ago by
Larry_Parnell16k
Boston, MA USA
Larry_Parnell16k wrote:

There is a whole set of web-based tools available for this at http://www.bioinformatics.org/sms2/. These would be fine for a one-off or a small set of sequences, or for someone who does not run Perl or have access to the tools found at a large institution. Nonetheless, the code examples above are also a way to learn...

ADD COMMENTlink written 9.0 years ago by Larry_Parnell16k