Question: extract subset of 21 nt. from csv file
0
gravatar for archana.bioinfo87
3 months ago by
archana.bioinfo87180 wrote:

Hi, I am trying to get 1-21nt, 2-22nt, 3-23nt etc till the end of sequences. For example, if I have sequences of 51 nt in length then I will get the 30 fragments with 21 nt in each fragment. I tried this script whcih looks to work fine but only for first 2 lines only.

fasta <- read.csv("trial.csv", header = TRUE, sep =",") 
dd1 <- as.character(fasta$Seq)
IDS <- as.character(fasta$Ids)
b <- numeric() 
final <- NULL
for (i in 1:length(dd1))
{
  for ( j in 1:nchar(dd1[i])) 
  {
  a <-dd1[i]
  b[j] <- substr(a, j, j+20)
  #print (nchar(b[j]))
  #print (nchar(dd1[i]))
  #print (length(dd1[i]))
  ifelse ((nchar(b[j]))==21 && (j<=(nchar(dd1[i])-21)),{
  print(b[j]) 
  next}, break())
  } 
final <- paste(IDS[i],b, sep = ",") 
write.table(final, paste(i,"final.csv")) 
}



  Example : trial.csv will look like this 
    Ids,Seq
    hsa_circ_0000013,GGCTCCAGGGAGCTTGGCTTCTGTAGAAGTTCTAAGGAAGCGGTACGAACTCCACGGCGGTGGGGCGCTAACTAGCAGGGACCCCTGCAAGTGTTGGTCGGGGGCCTCGAGCTGCCTGAGCTGACACGAGGGGAGGGGTCTGTGTAGCCAACAG    hsa_circ_0000026,CAATCCCACAGAGTATTGATGAGGAAACTGAAGTTTGGAGCGATCACATCATTTTCCCAAGAATTCCTAGAGGACCTGTGCAACAACCTCTTGAGGATCGAATCTTCACTCCCGCTGTCTCAGCAGTCTACAGCACGGTAACACAAGTGGCAAGACAGCCGGGAACCCCTACCCCATCCCCTTATTCAGCACATGAAATAAACAAGGGGCATCCAAATCTTGCGGCAACGCCCCCGGGACATGCATCGTCCCCTGGACTCTCTCAA
hsa_circ_0000037,ATACATTTGGGCCTGTCTACCTGCCTTTGGGGCAATTTGCAGGTTTTGAGAAGTAGAAATGAGGGTCTGGAGAGGGCATCTGTGAGCCTCTTCTGGGAACCCCTCCCTTGTAGGT
hsa_circ_0000050,GATGAATTCAAAAGACTATTTGCTAAATATGGAGAACCAGGAGAAGTTTTTATCAACAAAGGCAAAGGATTCGGATTTATTAAGCTTGTGAGTGTAATTGTTTGATTTTACGTAGAATTAAAAAGGGTGGGGGATTTTTTTGTCACTACAAACGCTGAAGGCTTGGTTTTTAAACTGGGGAGGATAAATTGATCTTTTAGATTTTTCACCATTCTTACAGGAAAAATGCTTGCGGTATAATGCATAATTGTTGCTACCTAAGAGAAAAGAGGGGTGGGGTGGGTAAACTAAGTGGTGTTAGTGGGTGCTGCCTAAAGGTAATGGTCGAACTGAGCTGAAGGAAGAAAGGGA

Any help is much appreiciated.

Thanks

R • 179 views
ADD COMMENTlink modified 3 months ago by JC9.3k • written 3 months ago by archana.bioinfo87180
1

If you are doing an assignment then by all means continue with your code.

If you just need to print all 21-mers from a file use kmercountexact.sh from BBMap suite like this.

kmercountexact.sh in=test.fa k=21 out=stdout.fa (to print to screen)
ADD REPLYlink written 3 months ago by genomax76k

Thanks. I am going to try this as well.

ADD REPLYlink written 3 months ago by archana.bioinfo87180
1
gravatar for JC
3 months ago by
JC9.3k
Mexico
JC9.3k wrote:

I am not sure about the output you want, but here is how I would do in perl:

#!/usr/bin/perl

use strict;
use warnings;

my $len = 21;
while (<>) {
    next if (/^Ids/); # skip header
    chomp;
    my ($id, $seq) = split(/,/, $_);
    print "$id";
    for (my $i = 0; $i<=(length $seq)-$len; $i++) {
        my $ss = substr($seq, $i, $len);
        print ",$ss";
    }
    print "\n";
}

Example:

$ perl split21.pl < trial.csv > final.csv
$ head final.csv 
hsa_circ_0000013,GGCTCCAGGGAGCTTGGCTTC,GCTCCAGGGAGCTTGGCTTCT,CTCCAGGGAGCTTGGCTTCTG,
ADD COMMENTlink written 3 months ago by JC9.3k

Thanks for your time and effort. I really appreciate it.

ADD REPLYlink written 3 months ago by archana.bioinfo87180

Hi,

I am getting extra space when gereratig the outfile from the perl script. I tried to mofiy it but space included in between in my sequences.

As, you know previously I was trying from 1st position to last-21 nt but now I am looking for 1st to last+21nt (from 1-21 start position)

Any help is much appreciated.

I tried this code but getting error`

 #!/usr/bin/perl

    use strict;
use warnings;
#while(my $row = <>)
#{
#   $row =~ 's/\s//g';
#}
my $len = 21;
while (<>) {
    next if (/^Ids/); # skip header
    chomp;
    my ($id, $seq) = split(/,/, $_);
 my $ss = substr($seq, 0, 21);   
#print "$ss";
my $merge= ($seq.$ss);
#print "$merge";
$merge=~ 's/\s+$//g';
    for (my $i = 0; $i<=(length $merge)-$len; $i++) {
        my $st = substr($merge, $i, $len);
# $st= ~ 's/\s+$//g';
#if(length $st < $len) {next};
       print "$id,$st";
print "\n";
    }
  }
ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by archana.bioinfo87180
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1084 users visited in the last hour