I want to BLAST one protein sequence (Q6GZX4.fasta) against all the sequences in part0.fasta, a FASTA file containing 5000 sequences. First I tried using part0.fasta directly as the subject. Then I tried using a formatted database built from it (makeblastdb -in part0.fasta -title part0 -dbtype prot -out part0 -parse_seqids).
Using -query Q6GZX4.fasta and -subject part0.fasta (case A) [output is line count]:
user% blastp -query Q6GZX4.fasta -subject part0.fasta -evalue 100 -max_target_seqs 5000 -max_hsps 1 -outfmt 6|wc -l
4572
Using -query Q6GZX4.fasta and -db part0 (case B) [output is line count]:
user% blastp -query Q6GZX4.fasta -db part0 -evalue 100 -max_target_seqs 5000 -max_hsps 1 -outfmt 6|wc -l
43
Why do I get different results: 4572 hits in case A, but only 43 in case B?
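One way to narrow this down is to save both tabular outputs to files and list the subjects that appear in case A but not in case B. The sketch below is only an illustration: the helper name `missing_subjects` and the file names `caseA.tsv` and `caseB.tsv` are hypothetical. It assumes the subject ID is column 2 of `-outfmt 6` and that the IDs are written identically in both outputs, which may not hold when the database was built with `-parse_seqids`.

```shell
# Hypothetical helper: print subject IDs (column 2 of -outfmt 6 output)
# that occur in the first tabular file but not in the second.
missing_subjects() {
    a_ids=$(mktemp)
    b_ids=$(mktemp)
    # Extract the subject ID column and de-duplicate it.
    cut -f2 "$1" | sort -u > "$a_ids"
    cut -f2 "$2" | sort -u > "$b_ids"
    # comm -23 keeps lines unique to the first (sorted) input.
    comm -23 "$a_ids" "$b_ids"
    rm -f "$a_ids" "$b_ids"
}

# Example usage (file names are assumptions, not part of the original runs):
#   blastp -query Q6GZX4.fasta -subject part0.fasta -evalue 100 -max_target_seqs 5000 -max_hsps 1 -outfmt 6 > caseA.tsv
#   blastp -query Q6GZX4.fasta -db part0 -evalue 100 -max_target_seqs 5000 -max_hsps 1 -outfmt 6 > caseB.tsv
#   missing_subjects caseA.tsv caseB.tsv | wc -l
```

Looking at the e-value column (column 11) for a subject that is reported in both files would also show whether the same alignment is scored differently in the two modes.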
What happens when you run blastdbcmd -info -db part0?

user% blastdbcmd -info -db part0
Database: part0
5,000 sequences; 1,826,734 total residues
Date: Mar 31, 2015 5:44 PM Longest sequence: 5,058 residues
Volumes:
/home/user/part0
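The 5,000 sequences reported by blastdbcmd can be cross-checked against the FASTA file itself. This is a minimal sketch, assuming every record in part0.fasta starts with a `>` header line; the helper name `count_fasta_records` is hypothetical.

```shell
# Hypothetical helper: count FASTA records by counting header lines,
# i.e. lines that begin with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Example usage:
#   count_fasta_records part0.fasta
```

If the count matches the number shown by blastdbcmd, the database contains every sequence from the file, so the difference in hit counts would not come from sequences missing from the database.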