Error Running Blast2Sam.Pl
5
3
Entering edit mode
12.5 years ago
Rm 8.3k

I am trting to convert blastn (from blast-2.2.25+) output to sam format using samtools "blast2sam.pl" script.

Any suggestions to rectify the error?

Error message:

Use of uninitialized value in subtraction (-) at ./blast2sam.pl line 58, <> line 422362.
Use of uninitialized value in subtraction (-) at ./blast2sam.pl line 58, <> line 422362.
@HWI-ST388-W7D:1:1101:1221:187650#0/1   0       gb|DQ079892.1|  0       255     0M      *       0       0       *       *       AS:i:147        EV:Z:5e-34

Line 422362 (pasted below) is the last line in the blastoutput file

Gap Penalties: Existence: 0, Extension: 2.5
blast blast sam perl • 6.0k views
ADD COMMENT
2
Entering edit mode
12.5 years ago
Nicolau ▴ 20

Hi RM,

 Try execution this script (http://www.lghm.ufpa.br/~nicolau/blast2sam.pl)! I'm modified some regular expression and I think it will work.

 Regards.

Nicolau

ADD COMMENT
0
Entering edit mode

@Nicolau thanks!!!

ADD REPLY
2
Entering edit mode
10.3 years ago
pruzanov ▴ 30

Turns out there were additional bugs, I am posting the script here, also sent a pull request to samtools group, the fix should gradually propagate into the package.


#!/usr/bin/perl -w

use strict;
use warnings;
use Getopt::Std;

my $dummy_score;

&blast2sam;

sub blast2sam {
  my %opts = ();
  getopts('sd', \%opts);
  die("Usage: blast2sam.pl <in.blastn>\n") if (-t STDIN && @ARGV == 0);
  my ($qlen, $slen, $q, $s, $qbeg, $qend, @sam, @cigar, @cmaux, $show_seq);
  $show_seq = defined($opts{s});
  $dummy_score = defined($opts{d});
  @sam = (); @sam[0,4,6..8,10] = ('', 255, '*', 0, 0, '*');
  while (<>) {
    if (@cigar && (/^Query=/ || /Score =.*bits.*Expect/ || /^>\S+/)) { # print
      &blast_print_sam(\@sam, \@cigar, \@cmaux, $qlen - $qend);
      @cigar = ();
    }
    if (/^Query=\s(\S+)/) {
      $sam[2] = undef; 
      $sam[0] = $1;
      #if ($sam[0] =~/\-$/ || $sam[0] =~/\/$/) { # Need to append next line
      #chop($sam[0]);
      my $next_line = <>;
      if ($next_line=~/^(\S+)$/) {
        $sam[0] .= $1;
      }

    } elsif (/(\S+)\s+total letters/) {
      $qlen = $1; $qlen =~ s/,//g;
    } elsif (/^>(\S+)/) {
      $sam[2] = $1;
    } elsif (/Length\s*=\s*(\d+)/) {
      $slen = $1;
    } elsif (/Score\s+=\s+(\S+) bits.+Expect(\(\d+\))?\s+=\s+(\S+)/) { # the start of an alignment block
      my ($as, $ev) = (int($1 + .499), $3);
      $ev = "1$ev" if ($ev =~ /^e/);
      @sam[1,3,9,11,12] = (0, 0, '', "AS:i:$as", "EV:Z:$ev");
      @cigar = (); $qbeg = 0;
      @cmaux = (0, 0, 0, '');
    } elsif (/Strand=(\S+)\/(\S+)/) {
      $sam[1] |= 0x10 if ($2 eq 'Minus');
    } elsif (/Query\s+(\d+)\s*(\S+)\s+(\d+)/) {
      $q = $2;
      unless ($qbeg) {
        $qbeg = $1;
        push(@cigar, ($1-1) . "H") if ($1 > 1);
      }
      $qend = $3;
      if ($show_seq) {
        my $x = $q;
        $x =~ s/-//g; $sam[9] .= $x;
      }
    } elsif (/Sbjct\:*\s+(\d+)\s*(\S+)\s+(\d+)/) {
     $s = $2;
      if ($sam[1] & 0x10) {
        $sam[3] = $3;
      } else {
        $sam[3] = $1 unless ($sam[3]);
      }
      &aln2cm(\@cigar, \$q, \$s, \@cmaux);
    }
  }
  if ($sam[2]) {
   &blast_print_sam(\@sam, \@cigar, \@cmaux, $qlen - $qend); # the last argument may be a problem
  }
}

sub blast_print_sam {
  my ($sam, $cigar, $cmaux, $qrest) = @_;
  push(@$cigar, $cmaux->[1] . substr("MDI", $cmaux->[0], 1));
  #push(@$cigar, $qrest . 'H') if ($qrest);
  if ($sam->[1] & 0x10) {
    @$cigar = reverse(@$cigar);
    $sam->[9] = reverse($sam->[9]);
    $sam->[9] =~ tr/atgcrymkswATGCRYMKSW/tacgyrkmswTACGYRKMSW/;
  }
  if ($sam->[9]) {
    if ($dummy_score) {
      $sam->[10] = "";
      map {$sam->[10].="I"} (1..length($sam->[9]));
    }
  } else { 
    $sam->[9] = '*';
  }
  $sam->[5] = join('', @$cigar);
  print join("\t", @$sam), "\n";
}

sub aln2cm {
  my ($cigar, $q, $s, $cmaux) = @_;
  my $l = length($$q);
  for (my $i = 0; $i < $l; ++$i) {
    my $op;
    # set $op
    if (substr($$q, $i, 1) eq '-') { $op = 1; }
    elsif (substr($$s, $i, 1) eq '-') { $op = 2; }
    else { $op = 0; }
    # for CIGAR
    if ($cmaux->[0] == $op) {
      ++$cmaux->[1];
    } else {
      push(@$cigar, $cmaux->[1] . substr("MDI", $cmaux->[0], 1));
      $cmaux->[0] = $op; $cmaux->[1] = 1;
    }
  }
}

=head2 SYNOPSIS

blast2sam.pl is a script for parsing output of NCBI's blastn output (default format) into sam format

=over

blast2sam.pl out.blast > out.blast.sam

=back

=head2 OPTIONS

The script has some (hopefully) useful options for tweaking the output sam

B<-s> Print out sequence of the query. 

Note that the current implementation prints out 
the sequence of aligned query which may be trimmed or otherwise 
different from the sequence of raw read in the input fastq. The CIGAR string
is also calculated for this query sequence, not the original read

B<-d> Dummy base quality score will be printed as field #11 in sam file. 

Blast output does not have base quality information for a read, so this option allows 
to have some fake value instead, may help when using sam file with some programs. Hardcoded
to be a string of 'I' that corresponds to Phred score 40 according to Sanger format

Using both options:

=over

blast2sam.pl -sd out.blast > out.blast.sam

=back

Note that there is no header generated, so you will need to run

=over

samtools -hT your_ref.fasta your_file.sam > your_file_with_header.sam

=back

=cut

ADD COMMENT
1
Entering edit mode
12.0 years ago

I would like to see your modified script, Nicolau, but I don't see it there- has it been moved?

ADD COMMENT
1
Entering edit mode
11.6 years ago
Rm 8.3k

I am adding "@Nicolau's" blast2sam.pl script, which I received then here:

#!/usr/bin/perl -w

use strict;
use warnings;
use Getopt::Std;

&blast2sam;

sub blast2sam {
  my %opts = ();
  getopts('s', \%opts);
  die("Usage: blast2sam.pl <in.blastn>\n") if (-t STDIN && @ARGV == 0);
  my ($qlen, $slen, $q, $s, $qbeg, $qend, @sam, @cigar, @cmaux, $show_seq);
  $show_seq = defined($opts{s});
  @sam = (); @sam[0,4,6..8,10] = ('', 255, '*', 0, 0, '*');
  while (<>) {
    if (@cigar && (/^Query=/ || /Score =.*bits.*Expect/)) { # print
      &blast_print_sam(\@sam, \@cigar, \@cmaux, $qlen - $qend);
      @cigar = ();
    }
    if (/^Query=\s(\S+)/) {
      $sam[0] = $1;
    } elsif (/(\S+)\s+total letters/) {
      $qlen = $1; $qlen =~ s/,//g;
    } elsif (/^>(\S+)/) {
      $sam[2] = $1;
    } elsif (/Length\s=\s(\d+)/) {
      $slen = $1;
    } elsif (/Score\s+=\s+(\S+) bits.+Expect(\(\d+\))?\s+=\s+(\S+)/) { # the start of an alignment block
      my ($as, $ev) = (int($1 + .499), $3);
      $ev = "1$ev" if ($ev =~ /^e/);
      @sam[1,3,9,11,12] = (0, 0, '', "AS:i:$as", "EV:Z:$ev");
      @cigar = (); $qbeg = 0;
      @cmaux = (0, 0, 0, '');
    } elsif (/Strand=(\S+)\/(\S+)/) {
      $sam[1] |= 0x10 if ($2 eq 'Minus');
    } elsif (/Query\s+(\d+)\s*(\S+)\s+(\d+)/) {
      $q = $2;
      unless ($qbeg) {
        $qbeg = $1;
        push(@cigar, ($1-1) . "H") if ($1 > 1);
      }
      $qend = $3;
      if ($show_seq) {
        my $x = $q;
        $x =~ s/-//g; $sam[9] .= $x;
      }
    } elsif (/Sbjct\:\s(\d+)\s*(\S+)\s(\d+)/) {
     $s = $2;
      if ($sam[1] & 0x10) {
        $sam[3] = $3;
      } else {
        $sam[3] = $1 unless ($sam[3]);
      }
      &aln2cm(\@cigar, \$q, \$s, \@cmaux);
    }
  }
  &blast_print_sam(\@sam, \@cigar, \@cmaux, $qlen - $qend);
}

sub blast_print_sam {
  my ($sam, $cigar, $cmaux, $qrest) = @_;
  push(@$cigar, $cmaux->[1] . substr("MDI", $cmaux->[0], 1));
  push(@$cigar, $qrest . 'H') if ($qrest);
  if ($sam->[1] & 0x10) {
    @$cigar = reverse(@$cigar);
    $sam->[9] = reverse($sam->[9]);
    $sam->[9] =~ tr/atgcrymkswATGCRYMKSW/tacgyrkmswTACGYRKMSW/;
  }
  $sam->[9] = '*' if (!$sam->[9]);
  $sam->[5] = join('', @$cigar);
  print join("\t", @$sam), "\n";
}

sub aln2cm {
  my ($cigar, $q, $s, $cmaux) = @_;
  my $l = length($$q);
  for (my $i = 0; $i < $l; ++$i) {
    my $op;
    # set $op
    if (substr($$q, $i, 1) eq '-') { $op = 2; }
    elsif (substr($$s, $i, 1) eq '-') { $op = 1; }
    else { $op = 0; }
    # for CIGAR
    if ($cmaux->[0] == $op) {
      ++$cmaux->[1];
    } else {
      push(@$cigar, $cmaux->[1] . substr("MDI", $cmaux->[0], 1));
      $cmaux->[0] = $op; $cmaux->[1] = 1;
    }
  }
}
ADD COMMENT
0
Entering edit mode

thanks for following up and posting the code.

ADD REPLY
1
Entering edit mode
10.3 years ago
pruzanov ▴ 30

I tried using the above script, but found that parsing blast output (blast-2.2.29+) created some inconsistencies in CIGAR string (read length was different according to CIGAR, so both samtools 0.1.19 and picard 1.72 complained. Turned out the script was reporting deletions instead insertions and vice versa. I am posting the fixed code:


sub aln2cm {
  my ($cigar, $q, $s, $cmaux) = @_;
  my $l = length($$q);
  for (my $i = 0; $i < $l; ++$i) {
    my $op;
    # set $op
    if (substr($$q, $i, 1) eq '-') { $op = 1; }
    elsif (substr($$s, $i, 1) eq '-') { $op = 2; }
    else { $op = 0; }
    # for CIGAR
    if ($cmaux->[0] == $op) {
      ++$cmaux->[1];
    } else {
      push(@$cigar, $cmaux->[1] . substr("MDI", $cmaux->[0], 1));
      $cmaux->[0] = $op; $cmaux->[1] = 1;
    }
  }
}

ADD COMMENT

Login before adding your answer.

Traffic: 2721 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6