Question: extract subset of 21 nt. from csv file
gravatar for archana.bioinfo87
13 months ago by
archana.bioinfo87180 wrote:

Hi, I am trying to get 1-21nt, 2-22nt, 3-23nt etc till the end of sequences. For example, if I have sequences of 51 nt in length then I will get the 30 fragments with 21 nt in each fragment. I tried this script whcih looks to work fine but only for first 2 lines only.

fasta <- read.csv("trial.csv", header = TRUE, sep =",") 
dd1 <- as.character(fasta$Seq)
IDS <- as.character(fasta$Ids)
b <- numeric() 
final <- NULL
for (i in 1:length(dd1))
  for ( j in 1:nchar(dd1[i])) 
  a <-dd1[i]
  b[j] <- substr(a, j, j+20)
  #print (nchar(b[j]))
  #print (nchar(dd1[i]))
  #print (length(dd1[i]))
  ifelse ((nchar(b[j]))==21 && (j<=(nchar(dd1[i])-21)),{
  next}, break())
final <- paste(IDS[i],b, sep = ",") 
write.table(final, paste(i,"final.csv")) 

  Example : trial.csv will look like this 

Any help is much appreiciated.


R • 294 views
ADD COMMENTlink modified 13 months ago by JC12k • written 13 months ago by archana.bioinfo87180

If you are doing an assignment then by all means continue with your code.

If you just need to print all 21-mers from a file use from BBMap suite like this. in=test.fa k=21 out=stdout.fa (to print to screen)
ADD REPLYlink written 13 months ago by genomax92k

Thanks. I am going to try this as well.

ADD REPLYlink written 13 months ago by archana.bioinfo87180
gravatar for JC
13 months ago by
JC12k wrote:

I am not sure about the output you want, but here is how I would do in perl:


use strict;
use warnings;

my $len = 21;
while (<>) {
    next if (/^Ids/); # skip header
    my ($id, $seq) = split(/,/, $_);
    print "$id";
    for (my $i = 0; $i<=(length $seq)-$len; $i++) {
        my $ss = substr($seq, $i, $len);
        print ",$ss";
    print "\n";


$ perl < trial.csv > final.csv
$ head final.csv 
ADD COMMENTlink written 13 months ago by JC12k

Thanks for your time and effort. I really appreciate it.

ADD REPLYlink written 13 months ago by archana.bioinfo87180


I am getting extra space when gereratig the outfile from the perl script. I tried to mofiy it but space included in between in my sequences.

As, you know previously I was trying from 1st position to last-21 nt but now I am looking for 1st to last+21nt (from 1-21 start position)

Any help is much appreciated.

I tried this code but getting error`


    use strict;
use warnings;
#while(my $row = <>)
#   $row =~ 's/\s//g';
my $len = 21;
while (<>) {
    next if (/^Ids/); # skip header
    my ($id, $seq) = split(/,/, $_);
 my $ss = substr($seq, 0, 21);   
#print "$ss";
my $merge= ($seq.$ss);
#print "$merge";
$merge=~ 's/\s+$//g';
    for (my $i = 0; $i<=(length $merge)-$len; $i++) {
        my $st = substr($merge, $i, $len);
# $st= ~ 's/\s+$//g';
#if(length $st < $len) {next};
       print "$id,$st";
print "\n";
ADD REPLYlink modified 12 months ago • written 12 months ago by archana.bioinfo87180
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1944 users visited in the last hour