Bioperl Standaloneblastplus, Cleaning Up Thousands Of Temp Files
9.9 years ago


I'm using the StandAloneBlastPlus BioPerl module as a wrapper for NCBI BLAST+. I have to perform sequence alignments against a database of bacteria using multiple processes on a cluster. A temp file (.fas) is created for each call to BLAST+, so hundreds or thousands may be created. I think StandAloneBlastPlus uses the File::Temp module for that.

I would like to know the best way to keep the number of files under control to avoid problems with my file system; waiting until the program finishes to delete the files is not an option. I also don't know whether name collisions can occur when several processes run the same code on different data and create temp files at the same time. BioPerl hides how the files are created.

Some code:

use Bio::Tools::Run::StandAloneBlastPlus;

# The factory is created once and shared by every call to invoke_blast
my $blast_factory = Bio::Tools::Run::StandAloneBlastPlus->new(
    -prog_dir => $BLAST_PATH,
    -program  => 'blastn',
);

sub invoke_blast {
    my ($genome, $query) = @_;

    # Base name of the genome file, without directory or .fna extension
    $genome =~ /.*\/([^\/]*)\.fna/;
    my $name1 = $1;
    # First two '|'-separated fields of the query ID
    my @arr   = split /\|/, $query->display_id();
    my $name2 = "$arr[0].$arr[1]";
    my $blastout = "$name1.$name2.blast";

    return $blast_factory->bl2seq(
        -method          => 'blastn',
        -query           => $query,
        -subject         => $genome,
        -max_target_seqs => 1,
        -outfile         => $blastout,
    );
}

... invoke_blast(...) in a loop

bioperl blast-plus • 2.4k views
9.9 years ago
Michael 52k

According to the documentation, the call to $blast_factory->cleanup should be sufficient.

Indeed, temporary files are created with statements like this:

File::Temp->new(TEMPLATE => 'DBDXXXXX', 
                 UNLINK => 0, 
                 DIR => $self->db_dir,
                 SUFFIX => '.fas');
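To illustrate what those options imply, here is a minimal sketch using only File::Temp from the Perl core. The scratch directory stands in for the database directory and is an assumption made so the example can run anywhere. It shows that each call gets a distinct name, and that with UNLINK => 0 the files stay on disk after the objects go out of scope, so something else has to remove them.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory standing in for the BLAST database directory (assumption);
# CLEANUP => 1 removes the directory itself when the script exits.
my $dir = tempdir(CLEANUP => 1);

my @paths;
for (1 .. 3) {
    my $tmp = File::Temp->new(
        TEMPLATE => 'DBDXXXXX',
        UNLINK   => 0,
        DIR      => $dir,
        SUFFIX   => '.fas',
    );
    push @paths, $tmp->filename;
}   # each object is destroyed here, but UNLINK => 0 leaves its file behind

my %seen;
$seen{$_}++ for @paths;
my $unique   = keys %seen;                # number of distinct file names
my $existing = grep { -e $_ } @paths;     # number of files still on disk
print "unique names: $unique, still on disk: $existing\n";

# Explicit removal, which is the job a cleanup step has to do
my $removed = unlink @paths;
print "removed: $removed\n";
```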

Temporary BLAST output seems to be written to the database directory, and there is no further control over which temporary directory is used. You can call cleanup() after a fixed number of invocations, or even after each one; this slows things down a little but removes most of the files. I also tried calling it from an END block and from a signal handler (a $SIG{INT} handler would be a good place to clean up), but that didn't work. File::Temp can be considered a safe way to generate temporary files: according to its documentation, it is safe against race conditions.
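Calling cleanup() every N invocations could be sketched roughly as follows. The helper make_periodic_cleaner and the Local::StubFactory package are hypothetical names introduced only for this illustration; the stub replaces the real factory so the sketch runs without BioPerl or BLAST+ installed. In real code you would pass your StandAloneBlastPlus object instead, and the batch size of 100 is an arbitrary choice.

```perl
use strict;
use warnings;

# Returns a closure to call once after every BLAST invocation. It calls
# $factory->cleanup on every $every-th call. $factory can be any object
# with a cleanup() method, e.g. a Bio::Tools::Run::StandAloneBlastPlus object.
sub make_periodic_cleaner {
    my ($factory, $every) = @_;
    my $calls = 0;
    return sub {
        $calls++;
        return 0 if $calls % $every;   # not yet at a batch boundary
        $factory->cleanup;
        return 1;                      # true when a cleanup was triggered
    };
}

# Hypothetical stand-in for the real factory; it only counts cleanup calls.
{
    package Local::StubFactory;
    sub new     { return bless { cleanups => 0 }, shift }
    sub cleanup { $_[0]{cleanups}++; return }
}

my $factory = Local::StubFactory->new;
my $tick    = make_periodic_cleaner($factory, 100);

for my $i (1 .. 250) {
    # invoke_blast($genome, $query) would go here
    $tick->();
}
$factory->cleanup;    # final sweep for the last partial batch

print "cleanup called $factory->{cleanups} times\n";
```

A larger batch size means fewer cleanup calls but more files on disk at any one time, so the right value depends on what the file system tolerates.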

For very large jobs, it might be more efficient to run Blast+ from the command line, and then parse the output via Bio::SearchIO instead.
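A rough sketch of that approach, under a few assumptions: blastn is on the PATH, BioPerl is installed together with the XML parser modules that the blastxml format needs, and the file names come from the command line. The helper names blast_command and run_and_parse are hypothetical. The output is written as BLAST XML (-outfmt 5) and then read back with Bio::SearchIO.

```perl
use strict;
use warnings;

# Build the blastn argument list for one query/subject pair.
# -outfmt 5 asks BLAST+ for XML output.
sub blast_command {
    my ($query_fasta, $subject_fasta, $out_xml) = @_;
    return ('blastn',
            '-query',   $query_fasta,
            '-subject', $subject_fasta,
            '-outfmt',  5,
            '-out',     $out_xml);
}

# Run blastn, then walk the report with Bio::SearchIO.
sub run_and_parse {
    my ($query_fasta, $subject_fasta, $out_xml) = @_;
    my @cmd = blast_command($query_fasta, $subject_fasta, $out_xml);

    # List form of system() bypasses the shell, so no quoting issues
    system(@cmd) == 0 or die "blastn failed: $?";

    require Bio::SearchIO;    # loaded lazily; needs BioPerl installed
    my $in = Bio::SearchIO->new(-format => 'blastxml', -file => $out_xml);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                # query, hit, percent identity, e-value
                printf "%s\t%s\t%.1f\t%s\n",
                    $result->query_name, $hit->name,
                    $hsp->percent_identity, $hsp->evalue;
            }
        }
    }
    return;
}

# Hypothetical usage: perl script.pl query.fa genome.fna out.xml
run_and_parse(@ARGV) if @ARGV == 3;
```

With this layout the only files created are the ones you name yourself, so each process can write to its own output path and delete it as soon as it has been parsed.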


Thanks Michael. Since I don't use a blast database directory I will use cleanup() after several invocations. I'm going to try running blast+ from the command line also.

