7.5 years ago
jackhiggins2017
Hi, I'm trying to install BLAST and use it on my local machine. I've installed BLAST, but I'm not quite sure why it isn't working. I've even tried throwing the blast executables (blastp, etc.) and it doesn't seem to work. Here's an example output:
python myblastscript.py
Running blastp, using min e-value, retrieving hits
Extracted 1 features
BLAST-ing gene_1
ID: gene_1
Name: <unknown name>
Description: <unknown description>
Number of features: 1
Seq('MDGRRSRHTDDTDVLLRIHHVIGELPTYGYRRVWALLRRQAELDGMPAINAKRV...LEI', ExtendedIUPACProtein())
sh: blastp: command not found
query cover: 0.151515151515 max_iden: 0.95
904667897 CSK81904.1 Uncharacterised protein [Shigella sonnei] 145 264
905276864 CSH89806.1 Uncharacterised protein [Shigella sonnei] >gi 145 264
Here are snippets of what's not working. Essentially, the blast_cline works, but blastp isn't being recognized as a command.
def Blast_Features_Local(seqname):
    # open contents of seq file, add features from file of orfs
    # blast each file and return a single output table
    # loop through ORFs and print results in outFile
    blast_type = "blastp"  # set type of blast: n, p or x
    blast_db = "nr"  # use nr for protein alignments
    blast_db_path = "/Users/username/Desktop/db/"
    min_evalue = 0.0001
    max_hits = 5
    # make a call to blast
    # create command line call to blast
    blast_cline = NcbiblastpCommandline(query='Temp.fasta', db=blast_db_path + blast_db, evalue=min_evalue, outfmt=5, out="temp_blast.xml", max_target_seqs=max_hits)
    os.system(str(blast_cline))
No idea why it's saying blastp command not found, I imported "from Bio.Blast.Applications import NcbiblastpCommandline" as well so I know that isn't the issue. Please help, thank you!!
It looks like you are not providing the type of blast (blast_type) you want to run in your command line (not sure if that is happening based on the snippet you have provided)? Also, add the directory where the blast executables are installed to your $PATH.
I'm not quite sure how to add the directory of my blast executables to my $PATH in the script. I know my path is this: /Volumes/Macintosh HD2/usr/local/ncbi/blast/bin
How do I incorporate that into the script?
Here is how you can modify $PATH (and make the changes permanent, if you wish).
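For the "in the script" part of the question, here is a minimal sketch, assuming the executables really are in the directory quoted in this thread. The helper name prepend_to_path is made up for illustration. It only changes PATH for the running Python process and anything it launches (such as the os.system call in the snippet), so it is not a permanent change.

```python
import os
import shutil

# Directory reported in this thread; adjust it to wherever blastp actually lives.
BLAST_BIN = "/Volumes/Macintosh HD2/usr/local/ncbi/blast/bin"


def prepend_to_path(directory):
    """Put `directory` at the front of PATH for this process and its children."""
    current = os.environ.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    if directory not in parts:
        os.environ["PATH"] = os.pathsep.join([directory] + parts)


prepend_to_path(BLAST_BIN)

# shutil.which returns the full path of the executable, or None if it is not found.
print("blastp found at:", shutil.which("blastp"))
```

If the last line prints None, the directory is probably wrong or the binaries are not executable, and the "command not found" message would be expected to come back.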
How and where did you install blast? This looks like the system is unable to find the blastp binary in your system's executable path.
I downloaded the .dmg and ran the .pkg file to install BLAST to my machine. It installed here /Volumes/Macintosh HD2/usr/local/ncbi/blast/bin
Follow genomax2's link and you'll be all set with the installation. :)
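One thing worth noting from the output in the question: the script kept going after "sh: blastp: command not found", because os.system does not raise when the command fails. A rough sketch of a stricter way to launch the program follows. The helper name run_tool is hypothetical, and this is only one possible approach, not what the original script does.

```python
import shutil
import subprocess


def run_tool(args):
    """Run a command given as a list of arguments; raise if it is missing or fails."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError(f"{args[0]} was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(args, check=True, capture_output=True, text=True)


# Example call, mirroring the settings in the question (not executed here):
# run_tool(["blastp", "-query", "Temp.fasta", "-db", "/Users/username/Desktop/db/nr",
#           "-evalue", "0.0001", "-outfmt", "5", "-out", "temp_blast.xml",
#           "-max_target_seqs", "5"])
```

Passing the arguments as a list also avoids shell quoting problems with the command line itself.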
Ok great, $PATH is now allowing the blastp function to work, but now I'm having trouble referencing the database.
And now here's the error....
Traceback (most recent call last):
I tried putting the db folder containing my databases into plasmidannotations and that isn't working either... Basically I'm not sure how to accurately reference the databases to be accessed. A commented line above the path I'm trying to work with was something that someone else had used previously but I can't quite get that to work either.... Any help appreciated.
If you want to blast against nr, then that base name needs to be used in the command (-db /path_to/nr). Are there several files there that start with nr*?
I downloaded all 50+ nr databases into a single folder. They start as nr.XX.tar.gz (XX = 00 through 55), and after unpacking each file I now have 56 folders called nr.XX (00-55), each of which contains an allotment of files. To be honest, I've no idea what any of the files do individually. I'm thoroughly confused as to how I'm supposed to access all of these from the command line. Do I need to take all of the inner files from each of the 50+ folders and place all of them in one folder? I really don't know what I'm doing at this point.
Bingo! There should also be at least one more file ending in the .pal extension. All these files need to be in the same folder. Once you have this set up, point -db to /path_to_new_folder/nr and that should do it.
Agh! Ok, so should all of the files be taken out of their folders and placed in db, or only the .pal files? I'm assuming there's a .pal file in every single one. I don't have access to the main machine I'm doing the work on, so I can't verify that and try this solution at the moment, but I want to have a good grasp when I'm able to work on it.
There should be only one .pal file (which describes the pieces that form the whole nr database). Take all other files out and put them in a single folder along with the .pal file. There should be 670+ files that start with nr* (don't count the gz files).
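Moving several hundred files by hand is tedious, so here is a minimal sketch of doing it from Python, assuming the layout described above (one parent folder holding the unpacked nr.XX folders). The function name flatten_nr_volumes and the example paths are made up for illustration. Files whose names already exist in the target are left where they are rather than overwritten.

```python
import shutil
from pathlib import Path


def flatten_nr_volumes(source_dir, target_dir):
    """Move every file found in source_dir/nr.*/ into target_dir; return how many moved."""
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    moved = 0
    for folder in sorted(source.glob("nr.*")):
        if not folder.is_dir():
            continue  # skips the nr.XX.tar.gz archives themselves
        for item in sorted(folder.iterdir()):
            if not item.is_file():
                continue
            destination = target / item.name
            if destination.exists():
                continue  # never overwrite a file that is already in place
            shutil.move(str(item), str(destination))
            moved += 1
    return moved


# Example call with hypothetical paths (not executed here):
# flatten_nr_volumes("/Users/username/Desktop/downloads", "/Users/username/Desktop/db")
```

After that, -db would point at the target folder plus the base name, as described above (for example /Users/username/Desktop/db/nr with these hypothetical paths).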