Question: Problem with blastp when blasting against custom made database
thjnant (Germany) wrote 4.4 years ago:

Hello,

I have been trying for the last few days to get my BLAST database to work, but I can't figure out what the problem is.

Starting from the beginning: I want to run blastp against a protein database that I have created.

Here is an example of my sequence identifiers and how they are formatted:

>TRA_G345655|locus=scaffold45|99209|101155|-|translate_table|standard

I try to make the blast database with the following command:

makeblastdb -in database.fa -title database -dbtype prot -out protdb -parse_seqids

It does not work and it gives me the following error:

No volumes were created because no sequences were found.

BLAST Database creation error: Defline lacks a proper ID around line 1

When I try it without '-parse_seqids', it works fine and generates the three files with the extensions pin, psq and phr.

I put these 3 files together with my database.fa FASTA file in one folder called DB, and then I try to run blastp with the following command:

blastp -query query.fa -db DB -out proteins_blastp_1e-30_table.txt -evalue 1e-30 -outfmt 6

or with:

blastp -query query.fa -db DB/database.fa -out proteins_blastp_1e-30_table.txt -evalue 1e-30 -outfmt 6

or with:

blastp -query query.fa -db database -out proteins_blastp_1e-30_table.txt -evalue 1e-30 -outfmt 6

But it gives me the following error:

BLAST Database error: No alias or index file found for protein database [database] in search path

How can I solve this problem, and what is causing it?


The scenario you are describing can never happen: blastp would never ask for a nucleotide database, so you first need to get the association between commands and error messages right. Most likely you simply mixed up the BLAST programs (blastp/blastn/x).

Reply written 4.4 years ago (modified 4.4 years ago) by Michael Dondrup

No, I have not mixed up blastp/blastn/x. This is simply the situation I am describing: I have two FASTA files of proteins and I want to use blastp, and this is the error I get, copied from my Linux shell. I ran the command again and updated the error; the mention of a nucleotide database must have been a mistake.

Reply written 4.4 years ago (modified 4.4 years ago) by thjnant

Yes you have, but you have also updated your question with a new error message: the first version said "No alias or index file found for nucleotide database", and now it says protein.

Reply written 4.4 years ago by Michael Dondrup

Hey, can you give me some advice on a BLAST installation matter?

For the past two days I have been trying to install and run the standalone BLAST package "ncbi-blast-2.2.30+" on a CentOS system.

I managed to download the nr sequence file from the NCBI FTP site using the command

wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz

and extracted the file using the command

 tar -xvpf nr.gz

I got an nr file of 33 GB,

but when I try to format the file using the command

./makeblastdb -in /home/Desktop/ncbi-blast-2.2.30+/db/nr -dbtype 'nucl' -input_type 'fasta' -out /home/Desktop/ncbi-blast-2.2.30+/output     "highlighted folder path"

I get the error

BLAST Database creation error: FASTA-Reader: No residues given

Can you give any suggestions on the nature of the problem and how I can solve it?

Thanks

Reply written 4.2 years ago by vigneshprbh3720
Michael Dondrup (Bergen, Norway) wrote 4.4 years ago:

The documentation on how to build proper sequence IDs is here: http://www.ncbi.nlm.nih.gov/toolkit/doc/book/ch_demo/#ch_demo.T5

In short: your FASTA defline doesn't contain a valid identifier and needs to be changed so that it can be parsed. A valid identifier consists of two or three fields separated by "|".

 

E.g. for a RefSeq entry:

ref|NM_010450.1|

 

In your case the sequence most likely has a local identifier, so you should change your defline to start with:

lcl|TRA_G345655
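
To apply this to every header in a FASTA file, something like the following sed call might do. It is only a sketch, assuming that everything after the first "|" in each header is descriptive and can be moved into the description part of the defline. The file names example.fa and example.lcl.fa and the three-residue sequence are made up for illustration.

```shell
# Tiny example input that mirrors the defline from the question
# (the sequence "MKV" is a placeholder, not real data).
printf '>TRA_G345655|locus=scaffold45|99209|101155|-|translate_table|standard\nMKV\n' > example.fa

# 1st expression: turn the first "|" after the ID into a space,
#                 so the remaining fields become the description.
# 2nd expression: prefix the ID on every header line with "lcl|".
sed -E -e 's/^>([^| ]+)\|/>\1 /' -e 's/^>/>lcl|/' example.fa > example.lcl.fa

# Show the rewritten header.
head -n 1 example.lcl.fa
```

The rewritten file can then be given to makeblastdb with -parse_seqids in place of the original one.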

 

The second error is caused by the -out option in your makeblastdb call:

makeblastdb -in database.fa -title database -dbtype prot -out protdb -parse_seqids

The database is named protdb, not database, so you also need to change the blastp command to this:

blastp -query query.fa -db protdb -out proteins_blastp_1e-30_table.txt -evalue 1e-30 -outfmt 6
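
The value given to -db is the base name of the index files (whatever was passed to -out, plus any directory), not the folder and not the FASTA file. A quick way to see which names are usable is to list the .pin files and strip the extension. This is a minimal sketch, assuming the files sit in a folder called DB as in the question; the empty files created with touch are placeholders for illustration only, since real ones come from makeblastdb.

```shell
# Placeholder index files standing in for real makeblastdb output.
mkdir -p DB
touch DB/protdb.phr DB/protdb.pin DB/protdb.psq

# Each protein database has a .pin index file; its path without the
# extension is the name that blastp expects after -db.
for index in DB/*.pin; do
    echo "${index%.pin}"
done
```

With this layout the search would use -db DB/protdb, or -db protdb when run from inside the DB folder.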


Thank you so so so much, I changed the header and also applied your second comment in the blastp command, and now it is working, perfect. And well, then yes, I had mixed up those things :) Thank you so much again.

Reply written 4.4 years ago by thjnant