Question: Running A Standalone Batch Blast With Fasta File Using Perl, Can'T Get Results For All Sequences. How To Get Top Hits?
gravatar for James_Up_North
7.1 years ago by
James_Up_North10 wrote:

I'm running standalone blast locally. My blast query sequencess are in a fasta file called human_seqs.fasta, which comprise around 300 sequences. Being blasted against a much larger database called DB.fasta. I then take the Accession IDs of the blast results and save to another file called IDs.txt to later retrieve the full fasta sequences of the hits in the database. The way my code is at the moment, it does the BLAST ok, but it is only running the BLAST for the first sequence in the query file. So I get a list of IDs but only for the first human seq query, still another 300 or so un-blasted!. I can guess that there is something wrong with my loop/code structure, but I can't figure it out!

Also, I would like to just extract the Accession IDs of the top hits, so I guess the hit with the lowest 'e' value or highest score. I'm not sure how to do this. Thanks

#!/usr/bin/perl  -w

use strict;

use Bio::Tools::Run::StandAloneBlast;
use Bio::SearchIO;
use Bio::SeqIO;

my $in_file = human_seqs.fasta";
my $virus_file = DB.fasta";
my @params = (-p => 'blastp', -d => 'DB_proteins.fasta', -o => 'report.bls', -e     => '10' );

my $seqio_object = Bio::SeqIO->new(-file => $in_file);
my $virus_seqio_object = Bio::SeqIO->new(-file => $virus_file);  
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

my $blast_report = "";

open(my $out, '>', 'IDs.txt') or die "failed to open output for write: $!";  # open   output file for saving accession Ids

while (my $seq = $seqio_object->next_seq){
    print "Blasting: ", $seq->id, "...\n";
    $blast_report = $factory->blastall($seq);
    while (my $result = $blast_report->next_result){
        while (my $hit = $result->next_hit){
            $hit->description =~ m/\[(.*)\]/;
            my $virus = $1;    
            print $hit->name(), "\t",  $hit->significance(), "\t", $hit-> accession(),"\t", $virus, "\n";

         my $accession_ids= $hit-> accession();

print { $out }   $accession_ids,"\n";   # save accession IDs to file IDs.txt for passing to fastacmd


system("fastacmd -d DB.fasta -i IDs.txt -p T -o full_seqs_hits.fasta");    # use fastacmd via the cmd line to pullout full seqs

perl bioperl blast • 5.8k views
ADD COMMENTlink modified 7.0 years ago by brentp22k • written 7.1 years ago by James_Up_North10

and what happens when you manually remove the first sequence from the query file? the second one returns a result? looks like you just need more debugging statements and you'll get to the bottom of this

ADD REPLYlink written 7.1 years ago by Jeremy Leipzig17k

Hi James! Please don't forget to accept the answer that solved your problem.

ADD REPLYlink written 7.1 years ago by Michael Schubert6.8k
gravatar for brentp
7.1 years ago by
Salt Lake City, UT
brentp22k wrote:

You have an exit; inside your loop. It seems you want to move the final "}" up to the line before "close($out)".

EDIT: And it looks like you can get the best blast hit by adding "-best_hit_only => 1" to your params. From the commandline, it would be -K 1.

ADD COMMENTlink modified 7.1 years ago • written 7.1 years ago by brentp22k

Thank you, that worked :-) Uisng -K 1 still gives me more than one hit though. Eg for the first query sequence I seem to get the top 3 hits!

ADD REPLYlink written 7.1 years ago by James_Up_North10

I think I've found it -v1 -b1 looks like it does the job

ADD REPLYlink written 7.1 years ago by James_Up_North10
gravatar for Larry_Parnell
7.1 years ago by
Boston, MA USA
Larry_Parnell16k wrote:

I'd suggest looking at the sequence file. I have noticed that for some sequences the fasta definition line runs into the sequence data and this gives an "empty" sequence that returns no hits. Unlikely to be your program here, but worth a quick check.

ADD COMMENTlink written 7.1 years ago by Larry_Parnell16k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1472 users visited in the last hour