Read Blast Output Directly From Stdout With Bioperl
Entering edit mode
12.3 years ago
Pasta ★ 1.3k

Hi there,

I finally decided myself to try Perl for a project and I got a question concerning BioPerl. I would like to read the result (blastxml) of a blast query directly from STDOUT. Is it possible ?

Here is my attempt:

   my $command = "echo '>$SeqName\n$_[0]' | blastall -a 4 -p blastn -d $db  $options";

   open $fh,"$command |" || die("cannot run fasta cmd of $command: $!\n");

   my $seqout = Bio::SeqIO->new(-fh => $fh,  -format => 'blastxml');

Also, I manage to catch STDOUT using IO::CaptureOutput, but I cant feed it into Bio::SeqIO. Maybe there is something than can be done with that.

Any help will be appreciated. Thanks

bioperl xml blast • 3.0k views
Entering edit mode

As Fwip says in their answer below: this will not work because you're using Bio::SeqIO. BLAST XML is not a sequence format. Suggest you read the Bioperl documentation and try some simple examples to get comfortable.

Entering edit mode
12.3 years ago
Fwip ▴ 500

First, I usually capture input like this:

my $command = "echo '>$SeqName\n$_[0]' | blastall -a 4 -p blastn -d $db  $options";
my $result = system($command);

And then $result has whatever came from STDOUT.

I am not aware of a SeqIO class for BlastXML input. Personally, I use the StandAloneBlastPlus module, but it looks like you are using the older interface to BLAST, in which case you may find more success using StandAloneBlast. I am not sure if these write to a temporary file or not.

The Bio::SearchIO module may be what you're looking for instead of SeqIO.

Entering edit mode

Perfect ! I simply had to use SearchIO instead. Thanks for your time

Entering edit mode
12.3 years ago

Most likely it is because you are feeding an incomplete XML file into the BioPerl parser.

XML format contain start and end tags that informs the parser of the data structure. At the very least, the stdout file will be missing the end tags for the topmost start tags (probably BlastOutput and BlastOutput_interations tags).

Here is an example of one iteration of the blast xml output:

  <Iteration_message>No hits found</Iteration_message>

If blastall is outputting xml lines one at a time, you will be feeding the parser one of the above lines at a time. Which will make no sense at all to the parser. I suggest you read up on the [?]XML format[?]

I would configure blastall to output a tab delimited format or text format for easier parsing.

If you really want to use XML, you can try to write a data buffer into your perl script. The data buffer will have to:

  1. save the initial blast settings/version information
  2. check if current multi-line xml iteration of blast is finished from stdout and save it to the buffer
  3. append the initial blast settings/version information to the current iteration
  4. append the appropriate end tags
  5. feed the formatted data to the BioPerl parser
Entering edit mode

No worries, the XML output is fine. I just want to feed the XML from STDIN to the BioPerl parser

Entering edit mode

I understand that it is fine. But the way XML is parsed means you need to give it a complete data structure. STDOUT from blastall will feed the parser partial data.


Login before adding your answer.

Traffic: 2099 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6