Retrieve Total Length Of Query Sequence From Text Blast Output Via Bioperl
1
0
Entering edit mode
12.1 years ago
Pukeko • 0

Hi,

first of all, I have to admit I am new to Bioperl and Perl in general, so please be patient if I am asking a stupid question.

I want to calculate how much of my query sequence (in %) are matched by the hits in my BLAST output. A formula would look like this: (length of the BLAST alignment) / (length of the query sequence)

The length of the query is given in the header of each BLAST result as "Length = .." .

Currently I am accessing the data in the BLAST output like this (the e-value and other attributes works fine):

#!/usr/bin/perl    
use strict;
use warnings;    
use Bio::SearchIO;

my $report_obj = new Bio::SearchIO(-format => 'blast',
                       -file   => 'test.blastx');
while (my $result = $report_obj -> next_result) {
  while (my $hit = $result -> next_hit) {
    # get the e-value
    my $evalue = $hit -> expect;
    print "\n" . 'e: ' . $evalue . "\n";
while ( my $hsp = $hit->next_hsp ) {
  # get the query length
  my $length_q = $hsp -> length('query');
  print 'query length: ' . $length_q . "\n";    
      }
   }
   }

Does $hsp also have an attribute like query length, if so: how do I access it?

Or is there another way of getting the information I want to extract?

I appreciate every help :)

bioperl blast • 6.4k views
ADD COMMENT
3
Entering edit mode
12.1 years ago
Neilfws 49k

All of the methods for Bio::SearchIO are summarised in the HOW-TO.

The length of the query sequence (as opposed to the length of the query in HSPs) is found by:

$result->query_length

What you are extracting using:

$hsp->length('query')

is the length of query sequence participating in the alignment, minus gaps.

"Length of the BLAST alignment" is a little imprecise. There may be > 1 HSP for each query/hit pair (in different frames, for example). You probably do not want to sum the length of all HSPs. This is why we typically deal with hits on a "per HSP" basis. So the closest thing to what you described: length of the BLAST alignment / length of the query sequence is probably:

$hsp->length('hit') / $result->query_length

assuming you do not want to count gaps as HSP length.

Anyway - have a good read of the HOW-TO and everything will be made clear.

ADD COMMENT
0
Entering edit mode

Thank you very much for this quick and thorough answer!

ADD REPLY

Login before adding your answer.

Traffic: 2245 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6