Question: Error Running Blast2Sam.Pl
3
gravatar for Rm
7.6 years ago by
Rm7.8k
Danville, PA
Rm7.8k wrote:

I am trting to convert blastn (from blast-2.2.25+) output to sam format using samtools "blast2sam.pl" script.

Any suggestions to rectify the error?

Error message:

Use of uninitialized value in subtraction (-) at ./blast2sam.pl line 58, <> line 422362.
Use of uninitialized value in subtraction (-) at ./blast2sam.pl line 58, <> line 422362.
@HWI-ST388-W7D:1:1101:1221:187650#0/1   0       gb|DQ079892.1|  0       255     0M      *       0       0       *       *       AS:i:147        EV:Z:5e-34

Line 422362 (pasted below) is the last line in the blastoutput file

Gap Penalties: Existence: 0, Extension: 2.5
perl sam blast • 4.3k views
ADD COMMENTlink modified 5.3 years ago by pruzanov30 • written 7.6 years ago by Rm7.8k
2
gravatar for Nicolau
7.6 years ago by
Nicolau20
Belém - Pará
Nicolau20 wrote:

Hi RM,

 Try execution this script (http://www.lghm.ufpa.br/~nicolau/blast2sam.pl)! I'm modified some regular expression and I think it will work.

 Regards.

Nicolau

ADD COMMENTlink written 7.6 years ago by Nicolau20

@Nicolau thanks!!!

ADD REPLYlink written 7.6 years ago by Rm7.8k
2
gravatar for pruzanov
5.3 years ago by
pruzanov30
pruzanov30 wrote:

Turns out there were additional bugs, I am posting the script here, also sent a pull request to samtools group, the fix should gradually propagate into the package.


#!/usr/bin/perl -w

use strict;
use warnings;
use Getopt::Std;

my $dummy_score;

&blast2sam;

sub blast2sam {
  my %opts = ();
  getopts('sd', \%opts);
  die("Usage: blast2sam.pl <in.blastn>\n") if (-t STDIN && @ARGV == 0);
  my ($qlen, $slen, $q, $s, $qbeg, $qend, @sam, @cigar, @cmaux, $show_seq);
  $show_seq = defined($opts{s});
  $dummy_score = defined($opts{d});
  @sam = (); @sam[0,4,6..8,10] = ('', 255, '*', 0, 0, '*');
  while (<>) {
    if (@cigar && (/^Query=/ || /Score =.*bits.*Expect/ || /^>\S+/)) { # print
      &blast_print_sam(\@sam, \@cigar, \@cmaux, $qlen - $qend);
      @cigar = ();
    }
    if (/^Query=\s(\S+)/) {
      $sam[2] = undef; 
      $sam[0] = $1;
      #if ($sam[0] =~/\-$/ || $sam[0] =~/\/$/) { # Need to append next line
      #chop($sam[0]);
      my $next_line = <>;
      if ($next_line=~/^(\S+)$/) {
        $sam[0] .= $1;
      }

    } elsif (/(\S+)\s+total letters/) {
      $qlen = $1; $qlen =~ s/,//g;
    } elsif (/^>(\S+)/) {
      $sam[2] = $1;
    } elsif (/Length\s*=\s*(\d+)/) {
      $slen = $1;
    } elsif (/Score\s+=\s+(\S+) bits.+Expect(\(\d+\))?\s+=\s+(\S+)/) { # the start of an alignment block
      my ($as, $ev) = (int($1 + .499), $3);
      $ev = "1$ev" if ($ev =~ /^e/);
      @sam[1,3,9,11,12] = (0, 0, '', "AS:i:$as", "EV:Z:$ev");
      @cigar = (); $qbeg = 0;
      @cmaux = (0, 0, 0, '');
    } elsif (/Strand=(\S+)\/(\S+)/) {
      $sam[1] |= 0x10 if ($2 eq 'Minus');
    } elsif (/Query\s+(\d+)\s*(\S+)\s+(\d+)/) {
      $q = $2;
      unless ($qbeg) {
        $qbeg = $1;
        push(@cigar, ($1-1) . "H") if ($1 > 1);
      }
      $qend = $3;
      if ($show_seq) {
        my $x = $q;
        $x =~ s/-//g; $sam[9] .= $x;
      }
    } elsif (/Sbjct\:*\s+(\d+)\s*(\S+)\s+(\d+)/) {
     $s = $2;
      if ($sam[1] & 0x10) {
        $sam[3] = $3;
      } else {
        $sam[3] = $1 unless ($sam[3]);
      }
      &aln2cm(\@cigar, \$q, \$s, \@cmaux);
    }
  }
  if ($sam[2]) {
   &blast_print_sam(\@sam, \@cigar, \@cmaux, $qlen - $qend); # the last argument may be a problem
  }
}

sub blast_print_sam {
  my ($sam, $cigar, $cmaux, $qrest) = @_;
  push(@$cigar, $cmaux->[1] . substr("MDI", $cmaux->[0], 1));
  #push(@$cigar, $qrest . 'H') if ($qrest);
  if ($sam->[1] & 0x10) {
    @$cigar = reverse(@$cigar);
    $sam->[9] = reverse($sam->[9]);
    $sam->[9] =~ tr/atgcrymkswATGCRYMKSW/tacgyrkmswTACGYRKMSW/;
  }
  if ($sam->[9]) {
    if ($dummy_score) {
      $sam->[10] = "";
      map {$sam->[10].="I"} (1..length($sam->[9]));
    }
  } else { 
    $sam->[9] = '*';
  }
  $sam->[5] = join('', @$cigar);
  print join("\t", @$sam), "\n";
}

sub aln2cm {
  my ($cigar, $q, $s, $cmaux) = @_;
  my $l = length($$q);
  for (my $i = 0; $i < $l; ++$i) {
    my $op;
    # set $op
    if (substr($$q, $i, 1) eq '-') { $op = 1; }
    elsif (substr($$s, $i, 1) eq '-') { $op = 2; }
    else { $op = 0; }
    # for CIGAR
    if ($cmaux->[0] == $op) {
      ++$cmaux->[1];
    } else {
      push(@$cigar, $cmaux->[1] . substr("MDI", $cmaux->[0], 1));
      $cmaux->[0] = $op; $cmaux->[1] = 1;
    }
  }
}

=head2 SYNOPSIS

blast2sam.pl is a script for parsing output of NCBI's blastn output (default format) into sam format

=over

blast2sam.pl out.blast > out.blast.sam

=back

=head2 OPTIONS

The script has some (hopefully) useful options for tweaking the output sam

B<-s> Print out sequence of the query. 

Note that the current implementation prints out 
the sequence of aligned query which may be trimmed or otherwise 
different from the sequence of raw read in the input fastq. The CIGAR string
is also calculated for this query sequence, not the original read

B<-d> Dummy base quality score will be printed as field #11 in sam file. 

Blast output does not have base quality information for a read, so this option allows 
to have some fake value instead, may help when using sam file with some programs. Hardcoded
to be a string of 'I' that corresponds to Phred score 40 according to Sanger format

Using both options:

=over

blast2sam.pl -sd out.blast > out.blast.sam

=back

Note that there is no header generated, so you will need to run

=over

samtools -hT your_ref.fasta your_file.sam > your_file_with_header.sam

=back

=cut

ADD COMMENTlink written 5.3 years ago by pruzanov30
1
gravatar for marty.gollery
7.1 years ago by
marty.gollery10 wrote:

I would like to see your modified script, Nicolau, but I don't see it there- has it been moved?

ADD COMMENTlink written 7.1 years ago by marty.gollery10
1
gravatar for Rm
6.7 years ago by
Rm7.8k
Danville, PA
Rm7.8k wrote:

I am adding "@Nicolau's" blast2sam.pl script, which I received then here:

#!/usr/bin/perl -w

use strict;
use warnings;
use Getopt::Std;

&blast2sam;

sub blast2sam {
  my %opts = ();
  getopts('s', \%opts);
  die("Usage: blast2sam.pl <in.blastn>\n") if (-t STDIN && @ARGV == 0);
  my ($qlen, $slen, $q, $s, $qbeg, $qend, @sam, @cigar, @cmaux, $show_seq);
  $show_seq = defined($opts{s});
  @sam = (); @sam[0,4,6..8,10] = ('', 255, '*', 0, 0, '*');
  while (<>) {
    if (@cigar && (/^Query=/ || /Score =.*bits.*Expect/)) { # print
      &blast_print_sam(\@sam, \@cigar, \@cmaux, $qlen - $qend);
      @cigar = ();
    }
    if (/^Query=\s(\S+)/) {
      $sam[0] = $1;
    } elsif (/(\S+)\s+total letters/) {
      $qlen = $1; $qlen =~ s/,//g;
    } elsif (/^>(\S+)/) {
      $sam[2] = $1;
    } elsif (/Length\s=\s(\d+)/) {
      $slen = $1;
    } elsif (/Score\s+=\s+(\S+) bits.+Expect(\(\d+\))?\s+=\s+(\S+)/) { # the start of an alignment block
      my ($as, $ev) = (int($1 + .499), $3);
      $ev = "1$ev" if ($ev =~ /^e/);
      @sam[1,3,9,11,12] = (0, 0, '', "AS:i:$as", "EV:Z:$ev");
      @cigar = (); $qbeg = 0;
      @cmaux = (0, 0, 0, '');
    } elsif (/Strand=(\S+)\/(\S+)/) {
      $sam[1] |= 0x10 if ($2 eq 'Minus');
    } elsif (/Query\s+(\d+)\s*(\S+)\s+(\d+)/) {
      $q = $2;
      unless ($qbeg) {
        $qbeg = $1;
        push(@cigar, ($1-1) . "H") if ($1 > 1);
      }
      $qend = $3;
      if ($show_seq) {
        my $x = $q;
        $x =~ s/-//g; $sam[9] .= $x;
      }
    } elsif (/Sbjct\:\s(\d+)\s*(\S+)\s(\d+)/) {
     $s = $2;
      if ($sam[1] & 0x10) {
        $sam[3] = $3;
      } else {
        $sam[3] = $1 unless ($sam[3]);
      }
      &aln2cm(\@cigar, \$q, \$s, \@cmaux);
    }
  }
  &blast_print_sam(\@sam, \@cigar, \@cmaux, $qlen - $qend);
}

sub blast_print_sam {
  my ($sam, $cigar, $cmaux, $qrest) = @_;
  push(@$cigar, $cmaux->[1] . substr("MDI", $cmaux->[0], 1));
  push(@$cigar, $qrest . 'H') if ($qrest);
  if ($sam->[1] & 0x10) {
    @$cigar = reverse(@$cigar);
    $sam->[9] = reverse($sam->[9]);
    $sam->[9] =~ tr/atgcrymkswATGCRYMKSW/tacgyrkmswTACGYRKMSW/;
  }
  $sam->[9] = '*' if (!$sam->[9]);
  $sam->[5] = join('', @$cigar);
  print join("\t", @$sam), "\n";
}

sub aln2cm {
  my ($cigar, $q, $s, $cmaux) = @_;
  my $l = length($$q);
  for (my $i = 0; $i < $l; ++$i) {
    my $op;
    # set $op
    if (substr($$q, $i, 1) eq '-') { $op = 2; }
    elsif (substr($$s, $i, 1) eq '-') { $op = 1; }
    else { $op = 0; }
    # for CIGAR
    if ($cmaux->[0] == $op) {
      ++$cmaux->[1];
    } else {
      push(@$cigar, $cmaux->[1] . substr("MDI", $cmaux->[0], 1));
      $cmaux->[0] = $op; $cmaux->[1] = 1;
    }
  }
}
ADD COMMENTlink written 6.7 years ago by Rm7.8k

thanks for following up and posting the code.

ADD REPLYlink written 6.7 years ago by Istvan Albert ♦♦ 80k
1
gravatar for pruzanov
5.4 years ago by
pruzanov30
pruzanov30 wrote:

I tried using the above script, but found that parsing blast output (blast-2.2.29+) created some inconsistencies in CIGAR string (read length was different according to CIGAR, so both samtools 0.1.19 and picard 1.72 complained. Turned out the script was reporting deletions instead insertions and vice versa. I am posting the fixed code:


sub aln2cm {
  my ($cigar, $q, $s, $cmaux) = @_;
  my $l = length($$q);
  for (my $i = 0; $i < $l; ++$i) {
    my $op;
    # set $op
    if (substr($$q, $i, 1) eq '-') { $op = 1; }
    elsif (substr($$s, $i, 1) eq '-') { $op = 2; }
    else { $op = 0; }
    # for CIGAR
    if ($cmaux->[0] == $op) {
      ++$cmaux->[1];
    } else {
      push(@$cigar, $cmaux->[1] . substr("MDI", $cmaux->[0], 1));
      $cmaux->[0] = $op; $cmaux->[1] = 1;
    }
  }
}

ADD COMMENTlink written 5.4 years ago by pruzanov30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 818 users visited in the last hour