Question: Why do some hits that show up in a search against a smaller database not show up in a search against the entire nr database?
2.4 years ago by
sajal0 wrote:

I made subsets of the nr database using blastdb_aliastool. I split nr 70%:30% and created two smaller databases (say, nr-70 and nr-30). I then ran blastp against each of them using query q and stored the two results separately (say, result-70 and result-30). I used the "-max_target_seqs 50" and "-max_hsps 20" options. No explicit E-value cutoff was given.

I also ran blastp against the whole nr using the same query q; call that result result-100. When I compare result-70 and result-30 against result-100, I see a strange phenomenon: while result-70 and result-30 have 29 and 26 hits, result-100 has only 28 hits. Since nr-70 and nr-30 are non-overlapping and together constitute nr, I would expect the full search to return 50 hits, because 29 + 26 = 55 > 50. Some of the hits found in the searches against the smaller databases don't show up in the result from the larger database.

Any idea why this is happening?
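
To pin down exactly which hits go missing, the three results can be compared by subject ID. Below is a minimal sketch, assuming the searches were saved in tabular format (-outfmt 6), where the subject ID is the second tab-separated column. The function names and the toy IDs are hypothetical and only illustrate the comparison.

```python
def subject_ids(tabular_lines):
    """Collect the subject IDs (column 2) from BLAST tabular (-outfmt 6) lines."""
    ids = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        ids.add(line.split("\t")[1])
    return ids


def missing_from_full(subset_results, full_result):
    """Return subjects hit in any subset search but absent from the full search."""
    found_in_subsets = set()
    for lines in subset_results:
        found_in_subsets |= subject_ids(lines)
    return found_in_subsets - subject_ids(full_result)


# Toy data with made-up subject IDs, standing in for result-70, result-30, result-100.
result_70 = ["q\tsubjA\t98.0", "q\tsubjB\t41.5"]
result_30 = ["q\tsubjC\t77.2", "q\tsubjD\t35.0"]
result_100 = ["q\tsubjA\t98.0", "q\tsubjC\t77.2"]

lost = missing_from_full([result_70, result_30], result_100)
print(sorted(lost))
```

With real files, each list would be replaced by the lines read from the corresponding output file.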


I don't know if it is the cause of your results, but -max_target_seqs has a known, somewhat unexpected behaviour:

What BLAST's max-target-sequences doesn't do

I wonder if database size also interacts with these BLAST heuristics.

Comment written 2.4 years ago by h.mon32k
2.4 years ago by
buchfink140 wrote:

The E-value depends on the size of the database, so some hits might be above the E-value threshold (default 10) if you search the whole database.
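
A rough illustration of that scaling, using the simplified Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the database length. This sketch ignores the effective-length corrections BLAST applies, and the database length, query length and bit score below are made-up numbers, not real nr statistics.

```python
def evalue(bit_score, query_len, db_len):
    """Simplified E-value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


FULL_DB_LEN = 1.0e10   # hypothetical total number of residues in the full database
QUERY_LEN = 300        # hypothetical query length
BIT_SCORE = 38.0       # hypothetical weak hit; the bit score does not depend on database size
THRESHOLD = 10.0       # default blastp E-value threshold

for label, fraction in [("nr-30", 0.3), ("nr-70", 0.7), ("nr", 1.0)]:
    e = evalue(BIT_SCORE, QUERY_LEN, fraction * FULL_DB_LEN)
    status = "reported" if e <= THRESHOLD else "dropped"
    print(f"{label}: E = {e:.2f} ({status})")
```

In this toy case the same alignment passes the threshold in both subsets but is dropped in the full search, because E grows linearly with the database length. If comparable E-values are needed, blastp's -dbsize option can be used to set the same effective database size for all three searches.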

2.4 years ago by
piet1.8k wrote:

Please note that you are limiting the number of hits returned (options -max_target_seqs 50 and -max_hsps 20). Try to set these options to much larger numbers, for example -max_target_seqs 20000 and -max_hsps 20000.
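
As a sketch of what that rerun could look like, the helper below assembles a blastp command line with the larger limits. The file names (q.fasta, result-100.tsv) and the helper itself are hypothetical; tabular output (-outfmt 6) and an explicit -evalue are assumptions added here so that the three searches are easy to compare.

```python
import shlex


def build_blastp_command(query, db, out,
                         max_target_seqs=20000, max_hsps=20000, evalue=10.0):
    """Assemble a blastp argument list with generous hit limits (nothing is executed)."""
    return [
        "blastp",
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", "6",                      # tabular output, one HSP per line
        "-evalue", str(evalue),              # state the cutoff explicitly
        "-max_target_seqs", str(max_target_seqs),
        "-max_hsps", str(max_hsps),
    ]


cmd = build_blastp_command("q.fasta", "nr", "result-100.tsv")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once the paths point at real files; running it with db set to nr-70 and nr-30 as well gives three results produced under identical limits.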

