Question: Command not found (possibly trivial, but I'm stuck)
2.4 years ago, jackhiggins2017 wrote:

Hi, I'm trying to install BLAST and use it on my local machine. I've installed BLAST, but I'm not sure why it isn't working. I've even tried moving the BLAST executables (blastp, etc.) around, and it still doesn't seem to work. Here's an example output:

python myblastscript.py
Running blastp, using min e-value, retrieving hits
Extracted 1 features
BLAST-ing gene_1
ID: gene_1
Name: <unknown name>
Description: <unknown description>
Number of features: 1
Seq('MDGRRSRHTDDTDVLLRIHHVIGELPTYGYRRVWALLRRQAELDGMPAINAKRV...LEI', ExtendedIUPACProtein())

sh: blastp: command not found

query cover: 0.151515151515    max_iden: 0.95
904667897    CSK81904.1       Uncharacterised protein [Shigella sonnei]      145     264
905276864    CSH89806.1       Uncharacterised protein [Shigella sonnei] >gi  145     264

Here are snippets of what's not working. Essentially, creating blast_cline works, but the shell doesn't recognize blastp as a command.

from Bio.Blast.Applications import NcbiblastpCommandline
import os


def Blast_Features_Local(seqname):
    # open contents of seq file, add features from file of orfs
    # blast each file and return a single output table
    # loop through ORFs and print results in outFile

    blast_type = "blastp"  # set type of blast: n, p or x
    blast_db = "nr"  # use nr for protein alignments
    blast_db_path = "/Users/username/Desktop/db/"
    min_evalue = 0.0001
    max_hits = 5

    # make a call to blast
    # create command line call to blast
    blast_cline = NcbiblastpCommandline(query='Temp.fasta', db=blast_db_path + blast_db,
                                        evalue=min_evalue, outfmt=5, out="temp_blast.xml",
                                        max_target_seqs=max_hits)

    # os.system runs the command through sh, which must find blastp on $PATH
    os.system(str(blast_cline))

I have no idea why it says "blastp: command not found". I also imported NcbiblastpCommandline (from Bio.Blast.Applications import NcbiblastpCommandline), so I know that isn't the issue. Please help, thank you!
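A possible reason the script kept going (and even printed hits) after "sh: blastp: command not found" is that os.system only returns an exit status and never raises, so an older temp_blast.xml may still have been parsed. That is an inference from the output above, not something confirmed in the thread. Here is a minimal sketch, assuming Python 3.7 or later, that runs the command through subprocess so a missing executable or a failed run stops the script. run_blast is a hypothetical helper name; the blastp flags in the example comment mirror the NcbiblastpCommandline call above.

```python
import subprocess


def run_blast(args):
    """Run a BLAST command given as a list of arguments and return its stdout.

    Raises RuntimeError if the executable cannot be found or exits non-zero,
    instead of silently continuing the way os.system does.
    """
    try:
        result = subprocess.run(args, capture_output=True, text=True)
    except FileNotFoundError:
        # The Python equivalent of the shell's "command not found".
        raise RuntimeError(f"{args[0]!r} not found; is its directory on $PATH?")
    if result.returncode != 0:
        raise RuntimeError(f"{args[0]} failed ({result.returncode}): {result.stderr.strip()}")
    return result.stdout


# Example call matching the question's settings:
# run_blast(["blastp", "-query", "Temp.fasta", "-db", "/Users/username/Desktop/db/nr",
#            "-evalue", "0.0001", "-outfmt", "5", "-out", "temp_blast.xml",
#            "-max_target_seqs", "5"])
```

Passing a list instead of one string also avoids shell quoting problems with paths that contain spaces.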


It looks like you are not providing the type of BLAST (blast_type) you want to run in your command line, though I'm not sure of that from the snippet you have provided. Also, add the directory where the BLAST executables are installed to your $PATH.

written 2.4 years ago by genomax

I'm not quite sure how to add the directory containing my BLAST executables to my $PATH in the script. I know the directory is /Volumes/Macintosh HD2/usr/local/ncbi/blast/bin

How do I incorporate that into the script?

written 2.4 years ago by jackhiggins2017
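One way to do this from inside the script, assuming Python 3, is to edit os.environ before the BLAST call. Changes to os.environ are inherited by commands started afterwards with os.system or subprocess. This is a sketch only: prepend_to_path is a hypothetical helper, and the change lasts only while the script runs, not in your shell.

```python
import os

# Install location reported in this thread. The space in "Macintosh HD2" is fine:
# PATH entries are separated by os.pathsep (":" on macOS), not by spaces.
BLAST_BIN = "/Volumes/Macintosh HD2/usr/local/ncbi/blast/bin"


def prepend_to_path(directory, env=os.environ):
    """Move `directory` to the front of env["PATH"] and return the new value.

    Modifying os.environ (the default) also updates the environment inherited
    by processes started afterwards with os.system or subprocess.
    """
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p and p != directory]
    env["PATH"] = os.pathsep.join([directory] + parts)
    return env["PATH"]


prepend_to_path(BLAST_BIN)
# From here on, os.system(str(blast_cline)) can look up blastp by name in BLAST_BIN.
```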

Here is how you can modify $PATH (and make the changes permanent, if you wish).

written 2.4 years ago by genomax

How and where did you install blast? This looks like the system is unable to find the blastp binary in your system's executable path.

written 2.4 years ago by Eric Lim
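To check from the script whether blastp is actually on the executable path, a quick look-up with shutil.which (Python 3.3 or later) resolves the name the same way the shell does. find_blast_binary is a hypothetical wrapper added for illustration.

```python
import shutil


def find_blast_binary(name="blastp", path=None):
    """Return the full path that `name` resolves to, or None if it is not found.

    `path` defaults to the current PATH; pass a string to test another search path.
    """
    return shutil.which(name, path=path)


location = find_blast_binary()
if location is None:
    print("blastp is not on PATH; the shell will report 'command not found'")
else:
    print("blastp found at", location)
```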

I downloaded the .dmg and ran the .pkg file to install BLAST on my machine. It installed here: /Volumes/Macintosh HD2/usr/local/ncbi/blast/bin

written 2.4 years ago by jackhiggins2017

Follow genomax2's link and you'll be all set with the installation. :)

written 2.4 years ago by Eric Lim

OK, great: with $PATH updated, blastp now runs, but now I'm having trouble referencing the database.

    blast_db = "nr"  # use nr for protein alignments and blastx, use nt for blastn
    # blast_db_path = "/usr/local/ncbi-blast-2.2.29+/db/"
    blast_db_path = "/Users/username/Desktop/db/"

    blast_cline = NcbiblastpCommandline(query='Temp.fasta', db=blast_db_path + blast_db,
                                        evalue=min_evalue, outfmt=5, out="temp_blast.xml",
                                        max_target_seqs=max_hits)

And now here's the error:

BLAST Database error: No alias or index file found for protein database [/Users/username/Desktop/db/nr/] in search path [/Users/username/Desktop/plasmidannotations::]

Traceback (most recent call last):

  File "AnnotationTools.py", line 436, in <module>
    Blast_Features_Local(sqname)
  File "AnnotationTools.py", line 239, in Blast_Features_Local
    blast_record = NCBIXML.read(result_handle)
  File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 530, in read
    first = next(iterator)
  File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 575, in parse
    raise ValueError("Your XML file was empty")
ValueError: Your XML file was empty

I tried putting the db folder containing my databases into plasmidannotations, and that isn't working either. Basically, I'm not sure how to reference the databases correctly. The commented-out path above the one I'm trying to use is something someone else used previously, but I can't get that to work either. Any help appreciated.

written 2.4 years ago by jackhiggins2017
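The ValueError at the end is a downstream symptom: blastp stopped at the database error, so there was nothing for NCBIXML.read to parse. A small guard, sketched here assuming Python 3 and the temp_blast.xml output name used above, makes that failure explicit. blast_xml_is_usable is a hypothetical helper. It only checks that the file exists and is non-empty, so it is worth deleting any old temp_blast.xml before each run; otherwise a stale file from an earlier search would pass the check.

```python
import os


def blast_xml_is_usable(path):
    """True if an output file exists at `path` and is non-empty.

    If blastp aborts (for example on a database error) it may leave no file or
    an empty one, which NCBIXML.read rejects with "Your XML file was empty".
    """
    return os.path.isfile(path) and os.path.getsize(path) > 0


# Hypothetical use before parsing (NCBIXML comes from Bio.Blast):
# if not blast_xml_is_usable("temp_blast.xml"):
#     raise RuntimeError("BLAST produced no output; check the -db path")
# with open("temp_blast.xml") as result_handle:
#     blast_record = NCBIXML.read(result_handle)
```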

If you want to BLAST against nr, then that base name needs to be used in the command (-db /path_to/nr). Are there several files there that start with nr*?

written 2.4 years ago by genomax
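A rough way to see what is actually in the database folder, assuming Python 3, follows the wording of the error: for -db /path_to/nr, blastp expects an alias file (nr.pal) or an index file (nr.pin) directly in that folder. describe_blast_db is a hypothetical helper; its db_arg value is the string to pass as -db (or as db= in NcbiblastpCommandline).

```python
import os


def describe_blast_db(folder, base="nr"):
    """Summarise the files belonging to protein database `base` in `folder`.

    Counts files named <base>.* (ignoring downloaded .tar.gz archives) and
    reports whether the alias file <base>.pal or index file <base>.pin is present.
    """
    names = sorted(n for n in os.listdir(folder)
                   if n.startswith(base + ".") and not n.endswith(".tar.gz"))
    return {
        "files": len(names),
        "alias": base + ".pal" in names,
        "index": base + ".pin" in names,
        "db_arg": os.path.join(folder, base),
    }


# Example: print(describe_blast_db("/Users/username/Desktop/db/"))
```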

I downloaded all 56 parts of the nr database into a single folder. They are named nr.XX.tar.gz (XX = 00 through 55), and after unpacking each one I now have 56 folders called nr.XX (00-55), each containing a set of files. To be frank, I have no idea what any of the files do individually, and I'm thoroughly confused about how to access all of them from the command line. Do I need to take all of the inner files from each of the 50+ folders and place them all in one folder? I really don't know what I'm doing at this point.

written 2.4 years ago by jackhiggins2017

Do I need to take all of the inner files from each of the 50+ folders and place all of them in one folder?

Bingo! There should also be at least one more file with a .pal extension. All these files need to be in the same folder. Once you have this set up, point -db to /path_to_new_folder/nr and that should do it.

written 2.4 years ago by genomax

Agh! OK, so should all of the files be taken out of their folders and placed in db, or only the .pal files? I'm assuming there's a .pal file in every single one. I don't have access to the machine I'm doing the work on, so I can't verify that or try this solution at the moment, but I want to have a good grasp of it when I'm able to work on it.

written 2.4 years ago by jackhiggins2017

There should be only one .pal file (which describes the pieces that form the whole nr database). Take all other files out and put them in a single folder along with the .pal file. There should be 670+ files that start with nr* (don't count the gz files).

written 2.4 years ago by genomax
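The reshuffle genomax describes can be scripted. This is a minimal sketch, assuming Python 3 and that the unpacked folders are named nr.00 through nr.55 inside one parent directory (e.g. /Users/username/Desktop/db/). flatten_volume_folders is a hypothetical helper. It moves rather than copies, never overwrites an existing file, and leaves the emptied nr.XX folders behind for you to delete by hand.

```python
import os
import shutil


def flatten_volume_folders(parent, base="nr", dest=None):
    """Move every file from sub-folders named <base>.XX into a single folder.

    Files land in `dest` (default: `parent` itself). A file whose name already
    exists in `dest` is left where it is. Returns the number of files moved.
    """
    dest = dest or parent
    os.makedirs(dest, exist_ok=True)
    moved = 0
    for entry in sorted(os.listdir(parent)):  # snapshot taken before any moves
        sub = os.path.join(parent, entry)
        if not (entry.startswith(base + ".") and os.path.isdir(sub)):
            continue
        for name in os.listdir(sub):
            src = os.path.join(sub, name)
            target = os.path.join(dest, name)
            if os.path.isfile(src) and not os.path.exists(target):
                shutil.move(src, target)
                moved += 1
    return moved


# Example: flatten_volume_folders("/Users/username/Desktop/db/")
# then pass db="/Users/username/Desktop/db/nr" to NcbiblastpCommandline.
```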
Powered by Biostar version 2.3.0