Help With Formatdb And Blastall
14.0 years ago
Craig ▴ 30

I have my own database of sequences stored in MySQL. I convert this database into a blastable format using formatdb, code below:

formatdb -p T -i db.fasta

I then do the same with the query sequence and then BLAST the query against my database using the command below:

blastall -d db.fasta -i query.fasta -p blastn -m 8 -e 0.01 -o output.blast

In the output.blast file I get something like this:

1_0         100.00 910 0 0 1 910 4 913 0.0 1438

I queried the database with a sequence that I knew was already in it (NCBI ID = AAK92094). Where the value "1_0" appears, I would like to see the NCBI ID, which I have stored in the database FASTA file.
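For reference, blastall -m 8 writes one tab-separated line per hit with twelve columns: query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value and bit score. The line above has only eleven fields, so one of the two identifier columns appears to be missing or was lost when pasting. This minimal Python sketch (stdlib only; the column names are my own labels, not names BLAST uses) splits a line into named fields so you can see which value is the subject ID.

```python
# Column order of blastall -m 8 (tabular) output.
# The short names are illustrative labels chosen for this sketch.
M8_COLUMNS = [
    "query_id", "subject_id", "percent_identity", "alignment_length",
    "mismatches", "gap_opens", "q_start", "q_end",
    "s_start", "s_end", "evalue", "bit_score",
]


def parse_m8_line(line):
    """Split one -m 8 line into a dict keyed by column name.

    BLAST separates columns with tabs; split() also accepts the spaces
    that often appear when output is copied into a forum post.
    """
    fields = line.split()
    if len(fields) != len(M8_COLUMNS):
        raise ValueError(
            f"expected {len(M8_COLUMNS)} columns, got {len(fields)}"
        )
    return dict(zip(M8_COLUMNS, fields))


if __name__ == "__main__":
    # Hypothetical line: "query1" is a placeholder query name.
    example = "query1\tAAK92094\t100.00\t910\t0\t0\t1\t910\t4\t913\t0.0\t1438"
    hit = parse_m8_line(example)
    print(hit["subject_id"], hit["evalue"])  # prints: AAK92094 0.0

    # The eleven-field line from the question is rejected.
    try:
        parse_m8_line("1_0 100.00 910 0 0 1 910 4 913 0.0 1438")
    except ValueError as err:
        print(err)  # prints: expected 12 columns, got 11
```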

The format of my db.fasta file looks like this before I convert it to a blastable format:

AAK92094 the end of the sequence
NP_882282 the end of the sequence

Any idea how I can do this?


I don't understand your question, can you rephrase it?


Are you sure you want to use the blastall executable? It was deprecated in 2009 in favor of the newer BLAST+ executables (see the NCBI web page).

The up-to-date equivalent would be:

<blast_executable> -query <fasta_file> -db <database> -out <output_file> -evalue 0.001 -outfmt <5 for XML>

where the BLAST executable can be blastn, blastp, blastx, tblastn, or tblastx. The XML output format is reasonably human-readable, and there are tools available to parse it (e.g. Biopython). See the -help option for details. The database itself is formatted with the makeblastdb command:

makeblastdb -in <fasta> -dbtype <nucl|prot> -title <db_title> -out <db_filename>
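If you go the -outfmt 5 route, the XML can also be read with Python's standard library instead of Biopython. This is a minimal sketch assuming the usual BLAST XML element names (Iteration, Iteration_query-def, Hit, Hit_id, Hit_def, Hsp, Hsp_evalue); the sample report is hypothetical and heavily trimmed. Which of Hit_id or Hit_def carries your FASTA identifier depends on how the database was built, so print both when checking.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed BLAST XML report used only for illustration.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_id>AAK92094</Hit_id>
          <Hit_def>example description</Hit_def>
          <Hit_hsps>
            <Hsp><Hsp_evalue>0</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def hits_from_blast_xml(source):
    """Yield (query_def, hit_id, hit_def, best_evalue) for each hit.

    `source` is a file name or a binary file object.
    """
    root = ET.parse(source).getroot()
    for iteration in root.iter("Iteration"):
        query_def = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            # A hit can have several HSPs; keep the smallest E-value.
            evalues = [float(hsp.findtext("Hsp_evalue")) for hsp in hit.iter("Hsp")]
            yield (
                query_def,
                hit.findtext("Hit_id"),
                hit.findtext("Hit_def"),
                min(evalues, default=None),
            )


if __name__ == "__main__":
    for row in hits_from_blast_xml(io.BytesIO(SAMPLE_XML.encode("utf-8"))):
        print(row)  # prints: ('query1', 'AAK92094', 'example description', 0.0)
```

For a real report, pass the file name given to -out, e.g. hits_from_blast_xml("output.xml").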

As for your db.fasta, are you sure that it is valid FASTA format? For it to be valid, every header (name) line must begin with a >.
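A quick way to test that is a small stdlib Python check along these lines. It only covers the basics (sequence lines must come after a > header and contain only letters, * or -); it is an illustrative helper with a hypothetical name, not a full FASTA validator.

```python
import string

# Characters accepted on sequence lines: letters plus '*' (stop) and '-' (gap).
ALLOWED = set(string.ascii_letters + "*-")


def check_fasta(lines):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    seen_header = False
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are skipped
        if text.startswith(">"):
            seen_header = True
        elif not seen_header:
            problems.append(f"line {number}: sequence data before any '>' header")
        elif not set(text) <= ALLOWED:
            problems.append(f"line {number}: unexpected characters in sequence")
    return problems


if __name__ == "__main__":
    # Hypothetical content in the style of the question's db.fasta.
    bad = ["AAK92094 MKVLAAG\n", "NP_882282 MSTNPK\n"]
    good = [">AAK92094\n", "MKVLAAG\n", ">NP_882282\n", "MSTNPK\n"]
    print(check_fasta(bad))
    print(check_fasta(good))  # prints: []
```

To check the real file, pass it an open file handle, e.g. `with open("db.fasta") as fh: print(check_fasta(fh))`.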


Agreed, you should check the format of db.fasta. The BLAST output should always contain identifiers for both query and hit; I have never seen anything like "1_0" appear in BLAST output.


There are two strange things in your question that you should clarify:

  1. "... I then do the same with the query sequence ......."

    => Usually you don't need to run formatdb on the query sequence, since the query only needs to be in FASTA format.

  2. ..... formatdb -p T ................blastall -p blastn

    => It is very strange that you format a protein database (-p T) and then search it with a protein query using -p blastn, which is meant for nucleotide sequences.

So my recommendation is to check that your query sequence is in FASTA format and to use -p blastp with your blastall command.
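If you are unsure which kind of sequences you have, a rough heuristic is to count how many residues are nucleotide letters. This sketch is a helper of my own (the 0.9 threshold is arbitrary), not something BLAST provides; the same check run on db.fasta suggests whether formatdb should get -p T (protein) or -p F (nucleotide).

```python
# Letters counted as nucleotides: A, C, G, T, U plus N for unknown bases.
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_program(sequence, threshold=0.9):
    """Suggest 'blastn' or 'blastp' from the share of nucleotide letters.

    Heuristic only: `threshold` is an arbitrary cut-off for this sketch.
    """
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        raise ValueError("sequence contains no letters")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in residues) / len(residues)
    return "blastn" if fraction >= threshold else "blastp"


if __name__ == "__main__":
    print(guess_program("ATGGCGTACGTTAGC"))  # prints: blastn
    print(guess_program("MKVLAAGIVGLLLAQ"))  # prints: blastp
```

A protein that happens to be very rich in A, C, G and T could fool this, so treat the result as a hint rather than a verdict.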

Moreover, your protein sequences in FASTA format should look like this:

>AAK92094 the end of the sequence
>NP_882282 the end of the sequence
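If the file currently holds one "ID sequence" pair per line, a conversion along these lines would give each identifier its own > header line, with the sequence on the following line(s). This assumes the identifier and sequence are separated by whitespace; the function name and the 60-character line width are my own choices. Since the data comes from MySQL, you could also feed it strings built from each (id, sequence) row.

```python
def to_fasta(lines, width=60):
    """Convert 'ID SEQUENCE' lines into FASTA text.

    Each record becomes a '>ID' header followed by the sequence wrapped
    at `width` characters. Blank input lines are skipped.
    """
    out = []
    for number, line in enumerate(lines, start=1):
        parts = line.split(None, 1)
        if not parts:
            continue
        if len(parts) != 2:
            raise ValueError(f"line {number}: expected an ID and a sequence")
        seq_id = parts[0]
        sequence = "".join(parts[1].split())  # drop any internal whitespace
        out.append(">" + seq_id)
        out.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "".join(record + "\n" for record in out)


if __name__ == "__main__":
    # Hypothetical input in the question's one-line-per-sequence layout.
    rows = ["AAK92094 MKVLAAG\n", "NP_882282 MSTNPK\n"]
    print(to_fasta(rows), end="")
    # prints:
    # >AAK92094
    # MKVLAAG
    # >NP_882282
    # MSTNPK
```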

Well spotted; I completely missed the blastn versus a protein database. That would certainly explain the strange output.

