Missing BLAST hits - max_hsps and max_target_seqs?
11.2 years ago
glenn ▴ 10

Hi,

I have a single file that contains 10,000 query sequences, each 300bp long. The subject is a single chromosome that I have imported into a nucleotide database ("makeblastdb -dbtype nucl"). Due to the nature of the data, I am expecting virtually all (>99%) of the query sequences to find a strong match, but I only want to return the TOP match per query sequence. I assume that "max_hsps" and "max_target_seqs" should be the way to achieve that, but I don't seem to be getting the expected results.

If I use "-max_hsps 1 -max_target_seqs 1", I get 322 (unique) hits. If I use "-max_hsps 1" by itself, I get the same 322 hits. If I use "-max_target_seqs 1" by itself, I get an enormous number of hits (which I could reduce by filtering on e-value, but that's not really the point: I just want the top hit). If I use neither option, I get a similarly enormous number of results.

It feels as though there is a bug in BLAST where it is simply not searching with the vast majority of the query sequences. I know there was a bugfix a couple of versions back that fixed something similar.

Has anyone come across something similar? Can anyone think of anything obvious that I might be doing wrong?

I'm using blast 2.2.29, on a Mac Mini running Darwin 13.3.0.

EDIT: Just in case it's not clear, I am hoping to have up to (but probably slightly less than) 10,000 results in my output file (one per query sequence). I am using "-outfmt 10", outputting to CSV.
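
For reference, the "one result per query" goal can also be checked after the fact by post-filtering the CSV. The following is only a minimal sketch: it assumes the default 12-column layout of "-outfmt 10" (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), keeps the highest-bitscore row per query, and uses made-up rows in place of a real output file. The function name top_hit_per_query is hypothetical.

```python
import csv
import io

# Default column order of BLAST+ "-outfmt 10" (CSV) when no custom fields are given.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hit_per_query(handle):
    """Return {qseqid: row} keeping the highest-bitscore row for each query."""
    best = {}
    for fields in csv.reader(handle):
        if not fields:
            continue  # skip blank lines
        row = dict(zip(COLUMNS, fields))
        score = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or score > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Made-up rows for illustration only; a real run would open the BLAST output file.
example = io.StringIO(
    "q1,chr1,98.00,300,6,0,1,300,1001,1300,1e-150,520\n"
    "q1,chr1,85.00,120,18,0,10,129,50001,50120,2e-30,130\n"
    "q2,chr1,100.00,300,0,0,1,300,7001,7300,1e-160,555\n"
)
hits = top_hit_per_query(example)
print(len(hits), "queries with a hit")
```

Counting the keys of the returned dictionary gives the number of queries that produced at least one hit, which is the figure being compared against the expected ~10,000.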

blast software error • 6.0k views
11.2 years ago
glenn ▴ 10

I think I've answered my own question:

I need to use "-task blastn". The default task for the blastn program appears to be "megablast".
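
A sketch of what the full invocation might look like, assembled in Python so the argument list is easy to inspect before running. The file and database names (queries.fasta, chromosome_db, top_hits.csv) and the helper name build_blastn_command are placeholders, not taken from the original post. A plausible reason for the difference is that megablast is tuned for highly similar sequences and uses a much larger default word size (28) than the blastn task (11), so more divergent queries may find no seed at all.

```python
import shlex
import subprocess


def build_blastn_command(query, db, out, task="blastn"):
    """Build a blastn argument list asking for one HSP on one subject per query."""
    return [
        "blastn",
        "-task", task,             # override the default task (megablast)
        "-query", query,
        "-db", db,
        "-max_target_seqs", "1",   # at most one subject sequence per query
        "-max_hsps", "1",          # at most one HSP per query-subject pair
        "-outfmt", "10",           # comma-separated values
        "-out", out,
    ]


# Placeholder file names; substitute your own query file and database.
cmd = build_blastn_command("queries.fasta", "chromosome_db", "top_hits.csv")
print(" ".join(shlex.quote(arg) for arg in cmd))

# Uncomment to actually run the search (requires BLAST+ on the PATH):
# subprocess.run(cmd, check=True)
```

Printing the command first makes it easy to confirm that "-task" precedes its value before launching a long search.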
