I have to BLAST a zip file containing hundreds of protein FASTA files. Since it is impossible to blast them one by one, I plan to use Perl to BLAST them against a local database. I am new to Perl and have been looking for example scripts online. My question is how to combine an unzip Perl script with a BLAST Perl script.
- Is $name in unzip.pl the unzipped FASTA content? What should I do at "# Do something here"?
- $f in BlastList.pl is the query. How can I change it so that it takes the output of unzip.pl?
- Is there a better solution than this one? Thank you.
Here is unzip.pl:
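use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);

my $zipfile = "fasta.zip";
my $u = IO::Uncompress::Unzip->new($zipfile)
    or die "Cannot open $zipfile: $UnzipError";
my $status;
# Walk through the archive one member (file) at a time
for ($status = 1; $status > 0; $status = $u->nextStream())
{
    my $name = $u->getHeaderInfo()->{Name};   # file name of the member, not its content
    warn "Processing member $name\n";
    my $buff;
    # read() fills $buff with the next chunk of the member's data
    while (($status = $u->read($buff)) > 0) {
        # Do something here
    }
    last if $status < 0;
}
die "Error processing $zipfile: $!\n"
    if $status < 0;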
Here is the BLAST template that uses a local database, BlastList.pl:
use strict;
use warnings;

my $DBNAME   = "mydb";
my $dirtoget = "C:/Blast";
opendir(my $dh, $dirtoget) || die "Cannot open directory $dirtoget: $!";
# Delete the old "DosBlast.bat"
my $dosfile = "DosBlast.bat";
unlink($dosfile);
# Get the list of the new sequence files to blast
my @thefiles = readdir($dh);
closedir($dh);
# Files in the directory that are not query sequences
my %skip = map { $_ => 1 } (".", "..", "DosBlast.bat", "BlastList.pl", $DBNAME,
    "BlastOut", "blastall.exe", "formatdb.exe", "formatdb.log", "ReadMe.doc");
# Create a new file "DosBlast.bat"
open(my $out, ">", $dosfile) || die "cannot open file for writing: $!";
foreach my $f (@thefiles)
{
    next if $skip{$f};
    my @myarray   = split(/\./, $f);   # Old file name, split on dots
    my $extension = ".txt";            # This is the new file extension
    my $newname   = $myarray[0] . $extension;
    # One blastp command per line so the batch file runs them in turn
    print {$out} "blastp -db $DBNAME -query $f -out $newname -num_descriptions 1\n";
} # end of foreach
close($out);
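On the first question: $name in unzip.pl is only the file name of the current archive member. The sequence data itself arrives in $buff, one chunk per read() call. One way to fill in "# Do something here" is to write each chunk to a file on disk, so that the BLAST step can pick the files up afterwards. The following is a minimal sketch of that idea, not a definitive implementation. The sub name extract_members and the choice to drop any folder part of the member name are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;
use IO::Uncompress::Unzip qw($UnzipError);

# Hypothetical helper: extract every file member of $zipfile into $outdir
# and return the list of paths that were written.
sub extract_members {
    my ($zipfile, $outdir) = @_;
    my $u = IO::Uncompress::Unzip->new($zipfile)
        or die "Cannot open $zipfile: $UnzipError";
    my @written;
    my $status;
    for ($status = 1; $status > 0; $status = $u->nextStream()) {
        my $name = $u->getHeaderInfo()->{Name};
        next if $name =~ m{/$};    # skip directory entries
        # Assumption: keep only the file name, ignoring folders inside the zip
        my $path = File::Spec->catfile($outdir, basename($name));
        open(my $out, '>', $path) or die "Cannot write $path: $!";
        binmode $out;
        my $buff;
        # Copy the member to disk chunk by chunk
        while (($status = $u->read($buff)) > 0) {
            print {$out} $buff;
        }
        close($out) or die "Cannot close $path: $!";
        last if $status < 0;
        push @written, $path;
    }
    die "Error processing $zipfile: $UnzipError\n" if $status < 0;
    return @written;
}
```

Calling extract_members("fasta.zip", "C:/Blast") would then leave the FASTA files in the directory that BlastList.pl already scans, so $f needs no change. Note that members with the same file name in different folders of the archive would overwrite each other under this simplification.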
"Since it is impossible to blast them one by one": why?
From the looks of this code I think you are really lost. I don't mean that to be negative, but I'm being honest when I say you cannot improve this in any way except taking another approach. Try to simplify things. Take a step back and try to get the blast command working on a single file. When that is working, try to do a simple procedure on the zipped directory like counting the files. If you can accomplish those things, then it should be possible to put things together like Eric Normandeau suggested using a shell command. This way you can build up your command and not worry about also getting the Perl syntax right.
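If you would rather do the "count the files" exercise in Perl than in the shell, a small sketch could look like the one below. It uses the same IO::Uncompress::Unzip module as unzip.pl. The sub name count_members is made up for this example, and it assumes the archive holds at least one member.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);

# Hypothetical helper: count the file members in a zip archive.
sub count_members {
    my ($zipfile) = @_;
    my $u = IO::Uncompress::Unzip->new($zipfile)
        or die "Cannot open $zipfile: $UnzipError";
    my $count = 0;
    my $status;
    # nextStream() skips over any unread data of the current member
    for ($status = 1; $status > 0; $status = $u->nextStream()) {
        my $name = $u->getHeaderInfo()->{Name};
        $count++ unless $name =~ m{/$};    # do not count directory entries
    }
    die "Error reading $zipfile: $UnzipError\n" if $status < 0;
    return $count;
}
```

If count_members("fasta.zip") matches the number of sequences files you expect, you know the archive is being read correctly before BLAST enters the picture.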
Looks like they are making a BAT file because they are in a Windows environment. The Perl script is just a way of generating that file, which is what they will actually run. It's not a terrible idea, but it would probably be simpler to make a direct system call to blastp instead of writing out to a BAT file. To be honest, though, I haven't done this sort of thing in a Windows setting other than Cygwin.
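To make the direct system call concrete, here is a rough sketch. It reuses the blastp options from BlastList.pl (-db, -query, -out, -num_descriptions) and assumes blastp is on the PATH and that the database has already been formatted. The sub names blast_command and run_blast are invented for this example. Unlike the original script, the output name is built by replacing only the last extension of the query file.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: build the blastp argument list for one query file.
# The output file sits next to the query, with its extension changed to .txt.
sub blast_command {
    my ($db, $query) = @_;
    my ($base, $dir) = fileparse($query, qr/\.[^.]*/);
    my $out = $dir . $base . '.txt';
    return ('blastp', '-db', $db, '-query', $query,
            '-out', $out, '-num_descriptions', 1);
}

# Hypothetical helper: run blastp once per query file.
sub run_blast {
    my ($db, @queries) = @_;
    for my $query (@queries) {
        my @cmd = blast_command($db, $query);
        # List form of system() bypasses the shell, so spaces in paths are safe
        system(@cmd) == 0
            or warn "blastp failed for $query (exit status $?)\n";
    }
}
```

For example, run_blast("mydb", glob("C:/Blast/*.fasta")) would blast every .fasta file in that directory in turn, with no BAT file in between. The .fasta extension here is a guess at how the files are named.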
It's not a bad idea, but instead of dealing with two monsters at once, BLAST and Perl, I think it would be advisable to just work out the BLAST steps, since that is the goal. It looks like the Perl aspect of this task is really a stumbling block, and it may be better to remove that from the equation for now, at least until it's clear the sequences can be processed and blasted correctly. Another possibility on Windows would be to use BioPerl's run package to handle the BLAST and use BioPerl to parse the results.
Thank you all for the suggestions. Frankly, I was lost when I ran into this problem. All I could do was blast them one by one manually.