Question: Retrieve Total Length Of Query Sequence From Text Blast Output Via Bioperl
0
gravatar for Pukeko
6.2 years ago by
Pukeko0
Pukeko0 wrote:

Hi,

first of all, I have to admit I am new to Bioperl and Perl in general, so please be patient if I am asking a stupid question.

I want to calculate how much of my query sequence (in %) are matched by the hits in my BLAST output. A formula would look like this: (length of the BLAST alignment) / (length of the query sequence)

The length of the query is given in the header of each BLAST result as "Length = .." .

Currently I am accessing the data in the BLAST output like this (the e-value and other attributes works fine):

#!/usr/bin/perl    
use strict;
use warnings;    
use Bio::SearchIO;

my $report_obj = new Bio::SearchIO(-format => 'blast',
                       -file   => 'test.blastx');
while (my $result = $report_obj -> next_result) {
  while (my $hit = $result -> next_hit) {
    # get the e-value
    my $evalue = $hit -> expect;
    print "\n" . 'e: ' . $evalue . "\n";
while ( my $hsp = $hit->next_hsp ) {
  # get the query length
  my $length_q = $hsp -> length('query');
  print 'query length: ' . $length_q . "\n";    
      }
   }
   }

Does $hsp also have an attribute like query length, if so: how do I access it?

Or is there another way of getting the information I want to extract?

I appreciate every help :)

bioperl blast • 3.4k views
ADD COMMENTlink modified 6.2 years ago by Neilfws48k • written 6.2 years ago by Pukeko0
3
gravatar for Neilfws
6.2 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

All of the methods for Bio::SearchIO are summarised in the HOW-TO.

The length of the query sequence (as opposed to the length of the query in HSPs) is found by:

$result->query_length

What you are extracting using:

$hsp->length('query')

is the length of query sequence participating in the alignment, minus gaps.

"Length of the BLAST alignment" is a little imprecise. There may be > 1 HSP for each query/hit pair (in different frames, for example). You probably do not want to sum the length of all HSPs. This is why we typically deal with hits on a "per HSP" basis. So the closest thing to what you described: length of the BLAST alignment / length of the query sequence is probably:

$hsp->length('hit') / $result->query_length

assuming you do not want to count gaps as HSP length.

Anyway - have a good read of the HOW-TO and everything will be made clear.

ADD COMMENTlink written 6.2 years ago by Neilfws48k

Thank you very much for this quick and thorough answer!

ADD REPLYlink written 6.2 years ago by Pukeko0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1986 users visited in the last hour