Question: Bioperl Has Different Behaviours In Parsing Blast And Blast+ Result
1
gravatar for Yixf
7.4 years ago by
Yixf20
Yixf20 wrote:

While using BioPerl to parse BLAST results, I found that it can parse the output of blast (2.2.21) but not the output of blast+ (2.2.25+). Both reports were in the default format.

The parser.pl code:

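#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

# Usage: parser.pl <report_file> <format>
# <format> is a Bio::SearchIO format name, e.g. "blast" for text reports
# or "blastxml" for XML reports.
my ( $fi, $format ) = @ARGV;
die "Usage: $0 <report_file> <format>\n"
  unless defined $fi && defined $format;

my $in = Bio::SearchIO->new(
    -format => $format,
    -file   => $fi,
);

# One line per HSP: query name, hit name, raw score, bit score, E-value.
print "#Query\tHit\tScore\tBits\tEvalue\n";
while ( my $result = $in->next_result ) {
    while ( my $hit = $result->next_hit ) {
        while ( my $hsp = $hit->next_hsp ) {
            print $result->query_name, "\t", $hit->name, "\t",
              $hsp->score, "\t", $hsp->bits, "\t", $hsp->evalue, "\n";
        }
    }
}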

The result of parsing the blast report (this is correct):

#Query    Hit    Score    Bits    Evalue
first    gi|195972856|ref|NM_001130955.1|    21    42.1    2e-04
first    gi|195972854|ref|NM_015318.3|    21    42.1    2e-04
second    gi|261878474|ref|NM_001166295.1|    21    42.1    2e-04
second    gi|261878472|ref|NM_001166294.1|    21    42.1    2e-04
second    gi|261878551|ref|NM_001166417.1|    21    42.1    2e-04
second    gi|261878470|ref|NM_001166293.1|    21    42.1    2e-04
second    gi|261878468|ref|NM_014021.3|    21    42.1    2e-04
third    gi|115392135|ref|NM_007249.4|    21    42.1    2e-04

The result of parsing the blast+ report (no output except the header):

#Query    Hit    Score    Bits    Evalue

================================================================

According to BioPerl's wiki (NCBI-BLAST parsing problems), the XML format is recommended. However, I found that BioPerl also has problems parsing both blast and blast+ results in XML format.

The parser.pl script is the same. The result of parsing the blast report in XML format (I have three queries, but it prints only the first one!):

#Query    Hit    Score    Bits    Evalue
first    gi|195972856|ref|NM_001130955.1|    21    42.1223    0.000184141
first    gi|195972854|ref|NM_015318.3|    21    42.1223    0.000184141
first    gi|261878474|ref|NM_001166295.1|    21    42.1223    0.000184141
first    gi|261878472|ref|NM_001166294.1|    21    42.1223    0.000184141
first    gi|261878551|ref|NM_001166417.1|    21    42.1223    0.000184141
first    gi|261878470|ref|NM_001166293.1|    21    42.1223    0.000184141
first    gi|261878468|ref|NM_014021.3|    21    42.1223    0.000184141
first    gi|115392135|ref|NM_007249.4|    21    42.1223    0.000184141

The result of parsing the blast+ report in XML format (besides the same problem, it cannot get the query ID properly!):

#Query    Hit    Score    Bits    Evalue
Query_1    gi|195972856|ref|NM_001130955.1|    42    39.1570490084919    0.000615407041092949
Query_1    gi|195972854|ref|NM_015318.3|    42    39.1570490084919    0.000615407041092949
Query_1    gi|261878474|ref|NM_001166295.1|    42    39.1570490084919    0.000615407041092949
Query_1    gi|261878472|ref|NM_001166294.1|    42    39.1570490084919    0.000615407041092949
Query_1    gi|261878551|ref|NM_001166417.1|    42    39.1570490084919    0.000615407041092949
Query_1    gi|261878470|ref|NM_001166293.1|    42    39.1570490084919    0.000615407041092949
Query_1    gi|261878468|ref|NM_014021.3|    42    39.1570490084919    0.000615407041092949
Query_1    gi|115392135|ref|NM_007249.4|    42    39.1570490084919    0.000615407041092949

Has anybody else run into the same problems? What is causing them, and how can they be solved?

Tags: bioperl, blast, error, parsing
Modified 7.3 years ago by Burnedthumb; written 7.4 years ago by Yixf

I noticed this too, and I agree it is annoying. I didn't find a way to reconcile the results.

Reply written 7.4 years ago by 2184687-1231-83-4.9k

Did you try using BLAST 2.2.25 (not plus) and formatting your database with -parse_seqids? I'm not entirely sure, as this was a long time ago, but I think the database format has changed slightly between then and now.

Reply written 7.4 years ago by Michael Schubert

I have tried BLAST 2.2.25 and formatted the database with -parse_seqids, but the problem is the same. I think it is a bug.

Reply written 7.4 years ago by Yixf
Chris Fields (University of Illinois Urbana-Champaign) wrote (7.4 years ago):

The best thing to do for this is to file it as a bug and attach test data. My guess is something changed in the latest BLAST+ text output.

Re: XML, it is supposed to be the most stable output; again, if something isn't working, then file it as a bug (again, with example data) so it can be addressed. IIRC, regarding the query ID: the tag BLAST uses to report it has changed, hence the odd return data.
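
To check the point about query tags against a raw report, a quick look at the XML itself can help. In BLAST XML output each query is reported in an `<Iteration>` element that carries both an `Iteration_query-ID` and an `Iteration_query-def`; the `Query_1` values in the question look like generated IDs from the former. The following is only a minimal sketch: the sub name `iteration_queries` and the sample fragment are hypothetical and written for this example, and it says nothing about how BioPerl maps these tags internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the query ID and query definition of every <Iteration> block.
# A regex is enough for a quick look; a real parser should use an XML library.
sub iteration_queries {
    my ($xml) = @_;
    my @queries;
    while ( $xml =~ m{<Iteration>(.*?)</Iteration>}gs ) {
        my $block = $1;
        my ($id)  = $block =~ m{<Iteration_query-ID>(.*?)</Iteration_query-ID>}s;
        my ($def) = $block =~ m{<Iteration_query-def>(.*?)</Iteration_query-def>}s;
        push @queries, { id => $id, def => $def };
    }
    return @queries;
}

# Hypothetical, heavily trimmed fragment written for this example only.
my $sample = <<'XML';
<BlastOutput_iterations>
<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_query-ID>Query_1</Iteration_query-ID>
  <Iteration_query-def>first</Iteration_query-def>
</Iteration>
<Iteration>
  <Iteration_iter-num>2</Iteration_iter-num>
  <Iteration_query-ID>Query_2</Iteration_query-ID>
  <Iteration_query-def>second</Iteration_query-def>
</Iteration>
</BlastOutput_iterations>
XML

# Print one line per query: generated ID, then the definition line.
for my $q ( iteration_queries($sample) ) {
    printf "%s\t%s\n", $q->{id} // 'NA', $q->{def} // 'NA';
}
```

Counting the entries returned for a real report also shows whether all queries are present in the file, which helps separate a problem in the report from a problem in the parser.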

Written 7.4 years ago by Chris Fields

Here: https://redmine.open-bio.org/projects/bioperl, as noted on the main website, http://bioperl.org (see Bugs).

Reply written 7.4 years ago by Peter

Hi, Chris Fields. Thanks for your reply. Do you know the email address or website where I can report this bug?

Reply written 7.4 years ago by Yixf

Yeah it looks like it is a Bioperl problem for the output of 2.2.25, not specifically Blast+. Odd that the XML is acting up though.

Reply written 7.4 years ago by Dan Gaston

At least one of the tags changed for query data, I believe.

Reply written 7.3 years ago by Chris Fields
Burnedthumb (Netherlands) wrote (7.4 years ago):

Instead of using BioPerl, you can also specify which fields you want with the "-outfmt" option, like this:

blastn -db your_database -query your_query.fasta -outfmt "6 qseqid sseqid score bitscore evalue" >> blastout.txt
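
If the tabular file still needs to be read from Perl, plain `split` is enough and no BLAST parser is involved. Below is a minimal sketch, assuming the exact column order from the `-outfmt` string above; the sub name `parse_tabular_line` is hypothetical, and the sample lines are illustrative values borrowed from the output in the question rather than real tabular output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column order must match the -outfmt string given to blastn.
my @FIELDS = qw(qseqid sseqid score bitscore evalue);

# Turn one tab-separated line into a hash reference keyed by field name.
# Returns undef for blank lines, comment lines, or a wrong column count.
sub parse_tabular_line {
    my ($line) = @_;
    chomp $line;
    return undef if $line =~ /^\s*$/ || $line =~ /^#/;
    my @values = split /\t/, $line;
    return undef unless @values == @FIELDS;
    my %row;
    @row{@FIELDS} = @values;
    return \%row;
}

# Illustrative input only; real data would be read from blastout.txt.
my @sample = (
    "first\tgi|195972856|ref|NM_001130955.1|\t21\t42.1\t2e-04\n",
    "second\tgi|261878474|ref|NM_001166295.1|\t21\t42.1\t2e-04\n",
);

# Print the same five columns as parser.pl in the question.
print join( "\t", '#Query', 'Hit', 'Score', 'Bits', 'Evalue' ), "\n";
for my $line (@sample) {
    my $row = parse_tabular_line($line) or next;
    print join( "\t", @{$row}{@FIELDS} ), "\n";
}
```

Keeping the field list in one array means that a change to the `-outfmt` string only needs a matching change in `@FIELDS`.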

Written 7.4 years ago by Burnedthumb

The bug has been reported: https://redmine.open-bio.org/issues/3265

Reply written 7.4 years ago by Yixf

I know this. But I want to know why BioPerl cannot parse the BLAST results properly.

Reply written 7.4 years ago by Yixf