Question: Expand a DNA sequence with IUPAC codes into multiple sequence in R
0
gravatar for english.server
19 months ago by
Germany
english.server140 wrote:

Hi!

I could not come with a code that can expand a sequence like seq= 'ARCS'. What would be the algorithm or (pseudo)code for the expansion in R?

Thanks in advance.

p.s.

Many useful python codes have been generously supplied, but I dont know python!

sequence R • 667 views
ADD COMMENTlink modified 19 months ago • written 19 months ago by english.server140
2
gravatar for 5heikki
19 months ago by
5heikki8.1k
Finland
5heikki8.1k wrote:

This is super easy with Python, see e.g. here.

Copy pasted one option below:

from Bio import Seq
from itertools import product

def extend_ambiguous_dna(seq):
   """return list of all possible sequences given an ambiguous DNA input"""
   d = Seq.IUPAC.IUPACData.ambiguous_dna_values
   return [ list(map("".join, product(*map(d.get, seq)))) ]
ADD COMMENTlink written 19 months ago by 5heikki8.1k

It's really easy by using python, how do you explain the functions product and seq

ADD REPLYlink written 19 months ago by jmzeng131490

All this stuff is well documented, seq & itertools.

ADD REPLYlink modified 19 months ago • written 19 months ago by 5heikki8.1k

great, but both you and I made a mistake, he need the R solution

ADD REPLYlink written 19 months ago by jmzeng131490
1
gravatar for jmzeng1314
19 months ago by
jmzeng131490
jmzeng131490 wrote:

what a coincidence!

I've the code for this question,also there's a brief description for it if you can understand our Chinese.

http://www.biotrainee.com/thread-668-1-1.html

while(<DATA>){
chomp;
@F=split/:/;
$hash{$F[0]}=uc $F[1];
} ##这里记录简并碱基的对应关系
## %hash stored the tables;
sub primer2multiple{
$primer=$_[0];
$prod=1;
$primer_len=length $primer ;
foreach $i (0..$primer_len-1){
$char=substr($primer,$i,1);
#$prod*=length $hash{$char} if ($char !~/[ATCG]/) ;
if ($char !~/[ATCG]/) {
push @pos_list,$i;
push @char_list,$hash{$char};
##首先找出所有的不是ATCG的碱基位置以及它对应的碱基
## record all of the positions which are not ATCG;
}
}
@out_list=($primer);
##循环处理每个不是ATCG的碱基位置,让它们根据对应关系扩展
foreach my $i (0..scalar(@pos_list)-1){
@out_list=&new_out_list(\@out_list,$pos_list[$i],$char_list[$i]);
} ##&new_out_list 这个函数非常重要,会把数组不停的扩展,最终达到应该有的个数!
print join"\n",@out_list;
print "\n";
}
sub new_out_list{
my @array = @{$_[0]};
my $pos = $_[1];
my $char = $_[2];
my @new_array=();
foreach my $i (@array){
foreach my $j (0..length($char)-1){
substr($i,$pos,1,substr($char,$j,1));
push @new_array,$i;
}
}
return(@new_array);
}
primer2multiple('ATGCVCGCDCTNCCTGAB');
__DATA__
R:ag
Y:CT
M:AC
K:GT
S:gc
W:AT
H:atc
B:gtc
V:gac
D:GAT
N:ATgc
ADD COMMENTlink written 19 months ago by jmzeng131490
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1360 users visited in the last hour