Why does standalone NCBI BLAST return fewer hits than Biopython qBLAST?
6.1 years ago
philipp

Hi there,

I set up standalone NCBI BLAST on my computer and downloaded nt.[01-39].tar.gz as my subject database. When I query an example sequence with standalone BLAST, I get fewer hits than online or with Biopython's qBlast (result_handle = NCBIWWW.qblast("blastn", "nt", record.seq, expect=10, hitlist_size=100000)).

With qBlast or the online tool I get 2564 hits, compared to 506 hits with standalone BLAST (blastn -query mysequence.txt -db nt -out mysequence_vs_NT.txt -outfmt 17).
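One thing to check when comparing numbers: in tabular output (-outfmt 6 or 7) each line is one alignment (HSP), so a single subject sequence can appear on several lines. Here is a minimal Python sketch that counts distinct subjects per query instead of lines, assuming the default tabular column order (query ID in column 1, subject ID in column 2). The function name `count_subjects` and the sample rows are made up for illustration.

```python
from __future__ import annotations


def count_subjects(tabular_text: str) -> dict[str, int]:
    """Count distinct subject IDs per query in BLAST+ tabular output.

    Assumes default -outfmt 6/7 columns: query ID in column 1, subject ID
    in column 2. Blank lines and '#' comment lines (outfmt 7) are skipped.
    """
    subjects: dict[str, set[str]] = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query_id, subject_id = fields[0], fields[1]
        subjects.setdefault(query_id, set()).add(subject_id)
    return {query: len(ids) for query, ids in subjects.items()}


# Hypothetical sample: three alignment lines, but only two distinct subjects.
example = (
    "q1\tsubjA\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-40\t180\n"
    "q1\tsubjA\t95.0\t50\t2\t0\t120\t170\t300\t350\t1e-10\t70\n"
    "q1\tsubjB\t90.0\t80\t8\t0\t1\t80\t5\t84\t1e-20\t110\n"
)
print(count_subjects(example))  # {'q1': 2}
```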

Interestingly, I get 1623 hits if I search the database volumes one by one. That still doesn't match the online count, but it is "better" than the standalone result (blastn -query mysequence.txt -db nt.[0-39] -out mysequence_vs_NT[0-39].txt -outfmt 17).
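If you do search the volumes one by one, you end up with one result file per volume that has to be combined. A rough Python sketch (hypothetical helper `merge_volume_hits`, made-up sample rows) that collapses everything into one entry per subject with its lowest e-value, assuming default -outfmt 6 columns (subject ID in column 2, e-value in column 11) and a single query. Keep in mind that each per-volume search computes e-values against that volume's size, so they are not directly comparable to a whole-database search unless the effective database size is fixed, e.g. with -dbsize.

```python
from __future__ import annotations


def merge_volume_hits(volume_outputs: list[str]) -> dict[str, float]:
    """Merge tabular BLAST+ results from several database volumes.

    Returns the lowest e-value seen for each subject ID. Assumes a single
    query and default -outfmt 6 columns (subject ID at index 1, e-value at
    index 10). Blank lines and '#' comment lines are skipped.
    """
    best: dict[str, float] = {}
    for text in volume_outputs:
        for line in text.splitlines():
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.split("\t")
            subject_id, evalue = fields[1], float(fields[10])
            if subject_id not in best or evalue < best[subject_id]:
                best[subject_id] = evalue
    return best


# Hypothetical per-volume outputs; subjC has two HSPs in the same volume.
vol_a = "q1\tsubjA\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-40\t180\n"
vol_b = (
    "q1\tsubjC\t90.0\t80\t8\t0\t1\t80\t5\t84\t1e-05\t50\n"
    "q1\tsubjC\t88.0\t40\t5\t0\t90\t130\t200\t240\t0.002\t35\n"
)
merged = merge_volume_hits([vol_a, vol_b])
# Print subjects ranked by their best e-value.
for subject, evalue in sorted(merged.items(), key=lambda item: item[1]):
    print(subject, evalue)
```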

How can I get all hits with the standalone BLAST? I appreciate all your help and input!

Thanks, Philipp

Thanks, Piet; it works when I set -max_target_seqs to a very high number.

@Natasha, I'm using the -outfmt 17 option. I don't know if it's new, but it creates a list of matches in the database.

I don't think 17 is a valid option; BLAST is probably just using 1.

0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1,
10 = Comma-separated values,
11 = BLAST archive format (ASN.1)

This doesn't answer your question, but why are you using -outfmt 17? It should probably be 7.
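For reference, -outfmt 7 is the tabular format with comment lines, and among those comments it writes a per-query line of the form `# N hits found`. A small Python sketch (hypothetical function `hits_found`, illustrative sample text) that pulls those counts out, assuming that line format. Note that this number may count alignment rows rather than distinct subjects, so check it against your own output.

```python
from __future__ import annotations

import re

# Assumed shape of the outfmt 7 summary comment, e.g. "# 2 hits found".
HITS_LINE = re.compile(r"^#\s+(\d+)\s+hits found")


def hits_found(outfmt7_text: str) -> list[int]:
    """Return the 'N hits found' counts from -outfmt 7 comments, one per query."""
    counts = []
    for line in outfmt7_text.splitlines():
        match = HITS_LINE.match(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


# Hypothetical sample with two queries, the second without any hits.
sample = (
    "# Query: q1\n"
    "# 2 hits found\n"
    "q1\tsubjA\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-40\t180\n"
    "q1\tsubjB\t90.0\t80\t8\t0\t1\t80\t5\t84\t1e-20\t110\n"
    "# Query: q2\n"
    "# 0 hits found\n"
)
print(hits_found(sample))  # [2, 0]
```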

"compared to 506 hits when I use the standalone BLAST"

It seems BLAST reports only 500 hits by default; that is the default value of -max_target_seqs. BLAST has several command-line options that can be tuned to return more hits.
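Raising -max_target_seqs is what fixed it for Philipp. A minimal Python sketch that builds the blastn command with a larger cap, mirroring the expect=10 and hitlist_size=100000 values from the qblast call; `blastn_command` and `run_blastn` are hypothetical helpers, and actually running the search assumes BLAST+ is installed and on PATH.

```python
from __future__ import annotations

import subprocess


def blastn_command(query: str, db: str, out: str,
                   max_target_seqs: int = 100000,
                   evalue: float = 10.0,
                   outfmt: str = "7") -> list[str]:
    """Build a blastn argument list that lifts the default 500-hit cap."""
    return [
        "blastn",
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", outfmt,
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_target_seqs),
    ]


def run_blastn(cmd: list[str]) -> None:
    """Run the command; needs BLAST+ on PATH, raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


cmd = blastn_command("mysequence.txt", "nt", "mysequence_vs_NT.txt")
print(" ".join(cmd))
# run_blastn(cmd)  # uncomment to actually run the search
```

As far as I know, -max_target_seqs applies to the non-pairwise formats (-outfmt 5 and above); for the pairwise formats (-outfmt 0 to 4) the corresponding options are -num_descriptions and -num_alignments.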

