blastn not returning all the expected results
3.4 years ago
selarch • 0

Hello all!

I'm trying to blast a single sequence against a custom blastdb (7 Gb in size) and the number of results is lower than what I expected.

Also, if I remove the sequences that were correctly aligned from the database and launch a second blastn against the reduced blastdb, I get new results.

I indexed the custom blastdb with the following command:

makeblastdb -in db.fna -dbtype nucl

Then I launched blastn with the following command:

blastn -db db.fna -max_target_seqs 1000000 -word_size 11 -outfmt 6 -query query.fna > blast.out

I tried those commands with BLAST v2.4.0 and v2.7.1, the two versions currently available on our servers.

Am I missing something?

Thanks!

Charles.

blast alignment • 1.8k views
ADD COMMENT

the number of results is lower than what I expected

How many results do you expect? How do you know that the number of results you expect actually exist?

ADD REPLY

We are looking at the complete genome of roughly 20 000 bacterial species and we expect to find the queried gene in most of them.

It is possible that the gene in question is not present in as many genomes as expected (although that seems unlikely according to the biologist I'm working with). The main problem is that in the first round of blastn I find ~8000 hits, and in the second round (after removing the first ~8000 hits from the database) I find ~9000 new hits.

I should also have mentioned that I get some perfect alignments in both rounds of blastn (pident 100 over all the query length).
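
One way to check how many database sequences (rather than HSP lines) carry a full-length identical match is to filter the tabular output directly. The sketch below assumes the default `-outfmt 6` column order (sseqid in column 2, pident in column 3, alignment length in column 4). The function names and the query-length argument are illustrative and not part of BLAST.

```shell
#!/bin/sh
# Count distinct subject sequences in a default -outfmt 6 file.
# One subject can produce several HSP lines, so raw line counts overstate hits.
count_subjects() {
    cut -f2 "$1" | sort -u | wc -l | tr -d ' '
}

# Print the IDs of subjects that have a 100% identical HSP whose alignment
# length equals the query length (i.e. no mismatch, no gap, full coverage).
# $1 = tabular BLAST output, $2 = query length in bases (assumed to be known).
perfect_subjects() {
    awk -F'\t' -v qlen="$2" '$3 == 100 && $4 == qlen { print $2 }' "$1" | sort -u
}
```

For example, `count_subjects blast.out` gives the number of distinct subjects, and `perfect_subjects blast.out 1500 | wc -l` counts the perfect ones, where 1500 stands in for the real query length.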

Maybe I could rephrase my questions as:

Why are not all the perfect alignments returned in the first round of blastn?

ADD REPLY

Did you remove the 8000 hits from your query set or from the DB set?

ADD REPLY

Query set is a single sequence corresponding to a gene of interest.

I removed the 8000 hits from the first round of blastn from the DB set to see if blastn would find new results in a second round.
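
The removal step described here can be sketched as follows. This is only an illustration, assuming that the sseqid column of the tabular output matches the first whitespace-delimited token of each FASTA header in `db.fna`; the function name `remove_hits` and the intermediate file names are hypothetical.

```shell
#!/bin/sh
# Write to stdout a copy of a FASTA file without the listed records.
# $1 = file with one sequence ID per line, $2 = FASTA file.
# A record's ID is the header text between '>' and the first whitespace,
# and IDs are compared exactly (so "s1" does not remove "s10").
remove_hits() {
    awk '
        FILENAME == ARGV[1] { drop[$1] = 1; next }   # first file: IDs to drop
        /^>/ { id = substr($1, 2); keep = !(id in drop) }
        keep                                          # print lines of kept records
    ' "$1" "$2"
}
```

A possible use would be `cut -f2 blast.out | sort -u > hit_ids.txt`, then `remove_hits hit_ids.txt db.fna > db_round2.fna`, followed by `makeblastdb -in db_round2.fna -dbtype nucl` before the second search.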

The fact that the second blastn returned multiple perfect alignments was unexpected to me. I expected all the perfect hits to be found in the first round.

ADD REPLY

Why are not all the perfect alignments returned in the first round of blastn?

BLAST cannot guarantee that. It is a heuristic and does not solve the alignment problem optimally, so there will always be a compromise. A fully exhaustive search is computationally expensive, and 100% sensitivity is seldom the requirement that BLAST is used for.

ADD REPLY

try:

blastn -task blastn -db db.fna -max_target_seqs 1000000 -outfmt 6 -query query.fna > blast.out
ADD REPLY

It returns ~8000 results.

ADD REPLY

What about this one:

blastn -task blastn -db db.fna -num_descriptions 20000 -num_alignments 20000 -outfmt 6 -query query.fna > blast.out
ADD REPLY
3.4 years ago

You cannot use blastn for this purpose. Your nucleotide sequence is not well conserved. To find all possible orthologs you have to compare protein sequences. Use either a translated protein sequence with tblastn, a nucleotide query with tblastx, or protein queries with blastp against a protein database built from annotated proteomes.

ADD COMMENT

I'm working with genome assemblies from the same species. This is why I'm expecting my query to be found in all assemblies, with minor differences between strains.

My impression is that the problem is more with the blastn program than the biological sequences, but maybe I'm missing something.

ADD REPLY

We are looking at the complete genome of roughly 20 000 bacterial species and we expect to find the queried gene in most of them.

That sounds like a contradiction. Please clarify.

ADD REPLY
