Question: Standalone Blast Options
8.5 years ago by
Carol130 wrote:

Hi all, I'm using local BLAST to retrieve only the single top hit from a local sequence database, using the "-v" option of blastall/rpsblast. In most cases I get just the top hit, but in some cases more than one HSP is returned, and these are mostly repeats of the same query sequence. Can anybody suggest a blastall/rpsblast option to restrict this redundancy? Thanks in advance.

modified 8.2 years ago by Khader Shameer • written 8.5 years ago by Carol
8.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:

If you're using the XML output for BLAST, the following XSLT stylesheet only prints the first hit.

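<?xml version='1.0' encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'>
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>

<!-- copy every element and text node unchanged -->
<xsl:template match="*|text()">
  <xsl:copy>
    <xsl:apply-templates select="*|text()"/>
  </xsl:copy>
</xsl:template>

<!-- inside each list of hits, keep only the first Hit -->
<xsl:template match="Iteration_hits">
  <xsl:copy>
    <xsl:apply-templates select="Hit[1]"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>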

  xsltproc --novalid firsthit.xsl blast.xml

Of course, you can easily modify the rule select="Hit[1]" to match your needs.
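For anyone without xsltproc at hand, the same idea can be sketched in Python with the standard library. This is only an illustration, not part of the answer above: the function name `keep_first_hit` and the tiny sample document are made up, and it assumes the usual BLAST XML layout in which each `Iteration_hits` element lists its `Hit` children best-first.

```python
import xml.etree.ElementTree as ET


def keep_first_hit(xml_text: str) -> str:
    """Return BLAST XML with only the first <Hit> left in each <Iteration_hits>."""
    root = ET.fromstring(xml_text)
    # Materialise the list first so the tree is not modified while it is walked.
    for hits in list(root.iter("Iteration_hits")):
        # Drop every Hit after the first one.
        for extra in hits.findall("Hit")[1:]:
            hits.remove(extra)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical, heavily trimmed BLAST XML used only for demonstration.
    sample = (
        "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
        "<Hit><Hit_num>1</Hit_num></Hit>"
        "<Hit><Hit_num>2</Hit_num></Hit>"
        "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"
    )
    print(keep_first_hit(sample))
```

Note that this keeps every HSP of the first hit. To keep a single HSP as well, the same trimming could be applied to the `Hsp` children of each `Hit_hsps` element.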

8.5 years ago by
Sydney, Australia
Neilfws48k wrote:

I don't think there is an option to blastall/rpsblast which will fix this issue. This usage guide states, for the -b flag, that "This is not the number of alignment segments or HSPs, since a given domain may have more than one portion aligned to the query."

You could get a list with only the top hit, ignoring the composite HSPs, by parsing the BLAST output. Using the SearchIO library from BioPerl, something like this should work:

#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;

# Parse a plain-text BLAST report
my $searchio = Bio::SearchIO->new(-file => "myblastfile", -format => "blast");

# Print one line per hit, however many HSPs that hit contains
while(my $result = $searchio->next_result) {
  while(my $hit = $result->next_hit) {
    my @output = ($result->query_name, $hit->name, $hit->raw_score,
                  $hit->bits, $hit->significance);
    print join("\t", @output), "\n";
  }
}

This is just an example with some selected BLAST statistics (raw score, bit score etc.); see the documentation for how to access other parts of the BLAST report.
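If BioPerl is not available, a similar filter can be sketched in Python over tabular output (blastall `-m 8`). This is a rough sketch under assumptions, not something from the answer above: the function name `top_hits` is invented, and it assumes the standard 12-column tabular layout with the query ID in column 1 and the bit score in column 12.

```python
import csv


def top_hits(tabular_lines):
    """Return {query_id: row}, keeping the highest bit-score row for each query."""
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, bits = row[0], float(row[11])
        # Strict ">" means that of several identical HSPs only the first is kept.
        if query not in best or bits > float(best[query][11]):
            best[query] = row
    return best


if __name__ == "__main__":
    # Hypothetical rows: two identical-scoring HSPs for q1/s1, plus a weaker hit.
    rows = [
        "q1\ts1\t100.00\t50\t0\t0\t1\t50\t1\t50\t1e-20\t99.0",
        "q1\ts1\t100.00\t50\t0\t0\t1\t50\t101\t150\t1e-20\t99.0",
        "q1\ts2\t90.00\t50\t5\t0\t1\t50\t1\t50\t1e-10\t60.0",
    ]
    for row in top_hits(rows).values():
        print("\t".join(row))
```

Because ties are resolved in favour of the first row seen, repeated HSPs with the same score collapse to a single line per query.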


It's not really a bug. If there is no "best" HSP (since they're identical) then in effect, they are all the "top hit". If the raw output isn't what you want, the solution is to parse it.

written 8.5 years ago by Neilfws

Thanks for the advice. The same output can be retrieved using the blastall option -m 7, which generates output in XML format that can be viewed in MS Excel. However, the redundancy still remains. I think this is a bug in BLAST and should be fixed.

written 8.5 years ago by Carol

I agree with Pierre and Neilfws that parsing is the only option to remove composite HSPs. Thanks for all your suggestions.

written 8.4 years ago by Carol
8.4 years ago by
Manhattan, NY
Khader Shameer18k wrote:

If you want to reduce the redundancy of your hits, you can pre-process your target database with CD-HIT. For example, with a threshold of 40%, no two sequences in the resulting dataset will be more than 40% similar. Depending on your needs, you can use a stringent threshold (<=40%) or a more lenient one (>=40%). For the statistical details of the CD-HIT algorithm, refer to the following papers (1, 2 and 3).
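As a rough illustration of the 40% example, the helper below assembles a cd-hit command line for a protein FASTA database. The function name and file names are hypothetical. The flags `-i`, `-o`, `-c` and `-n` are cd-hit's input, output, identity threshold and word size options, and the word-size choice follows the ranges suggested in the CD-HIT user guide. The sketch only builds the command and does not run it.

```python
def cd_hit_command(fasta_in, fasta_out, identity=0.4):
    """Build a cd-hit command line (as an argument list) for a protein database."""
    if not 0.4 <= identity <= 1.0:
        raise ValueError("protein identity threshold must be between 0.4 and 1.0")
    # Word size (-n) suggested by the CD-HIT user guide for each threshold range.
    if identity >= 0.7:
        word = 5
    elif identity >= 0.6:
        word = 4
    elif identity >= 0.5:
        word = 3
    else:
        word = 2
    return ["cd-hit", "-i", fasta_in, "-o", fasta_out,
            "-c", str(identity), "-n", str(word)]


if __name__ == "__main__":
    # Hypothetical file names; prints the command instead of executing it.
    print(" ".join(cd_hit_command("db.fasta", "db_nr40.fasta", 0.4)))
```

The resulting list could be passed to `subprocess.run`, after which the reduced FASTA file would be formatted as a BLAST database in the usual way.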
