Question: Why do local BLAST and online BLAST produce different results?
grayapply2009 (United States) wrote 3.3 years ago:

I downloaded the latest nt database from the NCBI FTP site and ran a local BLAST against it. All five of my sequences come back as virus sequences. However, when I BLAST the same sequences with the online tool (the one on the NCBI website), they all come back as bacterial sequences (Burkholderia).

Why the difference? Which one is more reliable?

results different blast • 2.7k views
modified 3.3 years ago by Juke-34 • written 3.3 years ago by grayapply2009

Are you sure that you downloaded the correct database?

written 3.3 years ago by Janake160

ftp://ftp.ncbi.nlm.nih.gov/blast/db/

The above is where I downloaded the nt database.

written 3.3 years ago by grayapply2009

Juke-34 (Sweden) wrote 3.3 years ago:

Assuming you have exactly the same database online and locally, I had exactly the same problem. It stemmed from a difference in default parameters between local BLAST and online BLAST: the "word size" parameter was different. It's easy to check.

written 3.3 years ago by Juke-34

Thanks for this solution, @Juke-34. Indeed, the command-line default word size for blastn is 28, whereas the online default is 11...
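
One way to rule out a word-size mismatch is to spell out the task and word size instead of relying on defaults (in BLAST+ the default word size depends on the task: 28 for megablast, 11 for blastn). Here is a minimal sketch, assuming the query file name from this thread and a database reachable as nt. The helper name build_blastn_cmd is hypothetical, and it only prints the command so it can be inspected first.

```shell
# Print a blastn command line with the task and word size made explicit,
# so the local search no longer depends on program defaults.
# build_blastn_cmd is a hypothetical helper, not part of BLAST+.
build_blastn_cmd() {
    query=$1
    db=$2
    word_size=$3
    echo "blastn -query $query -db $db -task blastn -word_size $word_size"
}

# Word size 11 mirrors the online value quoted above.
build_blastn_cmd test_query.fa nt 11
```

Setting the same word size in the "Algorithm parameters" section of the web form should then make the two searches comparable on this point.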

written 15 months ago by tlorin230

Siva (United States) wrote 3.3 years ago:

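Can you post the full command you used for running the local BLAST? The default search algorithm for nucleotide BLAST on the NCBI website is "megablast"; the standalone blastn program runs whichever algorithm its -task option selects (in BLAST+ it also defaults to megablast), so the two searches may not be using the same algorithm.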

You can check the number of sequences in the 'nt' db you downloaded using 'blastdbcmd' and compare it with the number of sequences in the online 'nt' version (by clicking the ? next to the Database drop-down menu).
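
The database check could look like this. It is a sketch, assuming BLAST+ is installed and the database is reachable under the base name nt. The wrapper name db_info is hypothetical; `blastdbcmd -info` itself prints the database title, the number of sequences and the total number of bases.

```shell
# Print summary information for a BLAST database.
# db_info is a hypothetical wrapper name around blastdbcmd.
db_info() {
    if command -v blastdbcmd >/dev/null 2>&1; then
        blastdbcmd -db "$1" -info
    else
        echo "blastdbcmd not found on PATH"
    fi
}

# Falls through to the message if the database cannot be opened.
db_info nt || echo "could not open database nt"
```

If the sequence count reported locally is far below the one shown on the website, the local search is probably not seeing the whole database.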

 

written 3.3 years ago by Siva

blastn -query test_query.fa -db nt.00/nt.00 -task blastn -dust no -outfmt "6 qseqid stitle staxids scomnames sscinames sskingdoms pident" -max_target_seqs 1

The online database (nr/nt) description is "The nucleotide collection consists of GenBank+EMBL+DDBJ+PDB+RefSeq sequences". It is a mixed database compared to the nt database I downloaded.

written 3.3 years ago by grayapply2009

There could be at least two reasons for the differences you mentioned in your original post.

1. The parameters you use for the local BLAST and the online BLAST are different.

You are using 'blastn' for the local BLAST, but the default algorithm for the online one is 'megablast'. Also, you disabled filtering (-dust no), which is enabled by default in the online BLAST. Did you modify the parameters in the online BLAST to match the command you posted?

2. The BLAST databases you are searching against are different.

Right now, you are searching against only one of the 26 subsets of the 'nt' database. I hope you read this in the FTP README file:

Large databases are formatted in multiple one-gigabyte volumes, which are named 
using the basename.##.tar.gz convention. All volumes with the same base name are 
required. An alias file is provided to tie individual volumes together so that 
the database can be called using the base name (without the .nal or .pal 
extension). For example, to call the est database, simply use "-db est" option 
in the command line (without the quotes). 

You need to download all the nt.##.tar.gz files, where ## runs from 00 to 25, and then unzip and untar all of them into one directory. Then you can run BLAST with the option "-db nt".
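
The unpacking step described above could be scripted roughly as follows. This is a sketch that assumes the nt.##.tar.gz archives have already been downloaded; the function name and the directory names are hypothetical.

```shell
# Extract every nt.##.tar.gz volume found in src_dir into one directory,
# so the alias file and all volumes end up side by side.
# extract_volumes is a hypothetical helper name.
extract_volumes() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir"
    for archive in "$src_dir"/nt.*.tar.gz; do
        if [ -e "$archive" ]; then      # skip when the glob matched nothing
            tar -xzf "$archive" -C "$dest_dir"
        fi
    done
}

# Example call (hypothetical paths):
#   extract_volumes ./downloads ./blastdb
#   blastn -query test_query.fa -db ./blastdb/nt
```

Because every archive is unpacked into the same directory, the base name nt then refers to all volumes at once.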


 

written 3.3 years ago by Siva

Thank you for the reply, Siva. Actually, I downloaded all 26 files and unzipped them on my computer. I BLAST against only the nt.00 folder because it contains an alias file (index file) that calls the information stored in all 26 folders.

I didn't change anything in the online BLAST. How do I BLAST against just the nt database online? It looks like megablast is the only choice online.

By the way, what does the filter do in the blast? What is the effect of disabling it?

modified 3.3 years ago • written 3.3 years ago by grayapply2009

I am sorry for assuming that you did not download all 26 files (the same alias file "nt.nal" is present in all 26 directories). But with "-db nt.00/nt.00" you are searching only one of the 26 volumes. You need to use only the base name (-db nt) to search all 26. If BLAST complains that the database "nt" is not found, either put all the unzipped files in one directory or copy the alias file to the directory that contains the 26 directories.
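
A quick sanity check before running the search could look like this. It is a sketch under stated assumptions: the unpacked database lives in a single directory (./blastdb is a hypothetical path), the alias file is named nt.nal as mentioned above, and each volume has a sequence file ending in .nsq. The helper name check_nt_dir is hypothetical.

```shell
# Report whether the alias file and the volume files sit in the same directory.
# check_nt_dir is a hypothetical helper; the .nsq suffix is assumed for volumes.
check_nt_dir() {
    dir=$1
    if [ ! -f "$dir/nt.nal" ]; then
        echo "missing alias file: $dir/nt.nal"
        return 1
    fi
    count=0
    for volume in "$dir"/nt.*.nsq; do
        if [ -e "$volume" ]; then
            count=$((count + 1))
        fi
    done
    echo "alias file found; $count volume(s) next to it"
}

# Hypothetical location of the unpacked database
check_nt_dir ./blastdb || true
```

If the count matches the number of downloaded volumes, `-db ./blastdb/nt` should pick up all of them. Setting the BLASTDB environment variable to that directory is another way to let a plain `-db nt` resolve.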

There are three choices for the algorithm under "Program Selection": megablast, discontiguous megablast and blastn. You can select 'blastn'.

Filtering masks the low-complexity regions in your query sequence. If you disable filtering, you will get hits that share only low-complexity regions, which are not very useful. You can read more about this option here.
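
To see what the filter changes for a given query, one option is to run the same search with and without DUST masking and compare the hit tables. This is a dry-run sketch: it only prints the two commands, the output file names are hypothetical, and it assumes the full nt database is reachable by its base name.

```shell
# Print the same blastn search twice, once with DUST masking and once without.
query=test_query.fa   # query file name taken from this thread
db=nt                 # assumes the full nt database is reachable by base name
for dust in yes no; do
    out="hits_dust_${dust}.tsv"   # hypothetical output file names
    echo "blastn -query $query -db $db -task blastn -dust $dust -outfmt 6 -out $out"
done
```

Removing the `echo` (and the surrounding quotes) would execute the searches; hits that appear only in the `-dust no` table are candidates for low-complexity matches.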

written 3.3 years ago by Siva

Hey, Siva. You are exactly right. This time I blasted against the entire nt database with this command line:

blastn -query test_query.fa -db nt/nt -task blastn -dust no -outfmt "6 qseqid stitle staxids scomnames sscinames sskingdoms pident" -max_target_seqs 1

And the results are the same now.

I cannot believe how badly some guy online misled me two months ago: he told me to BLAST against the first volume because "it contains the index file". I've been doing the wrong thing the entire semester.

Many thanks to you.

By the way, how do you BLAST against multiple databases simultaneously, such as nr, nt, swissprot...?

Another thing: when you use blastn online, it actually searches against nt/nr, which may lead to different results because I only search against the nt database on my computer. How do I deal with that?

modified 3.3 years ago • written 3.3 years ago by grayapply2009

You are welcome. To search against multiple BLAST databases, just list the database names separated by spaces inside one quoted argument:

-db "nr swissprot"
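
In a full command this might look as follows. This is a dry-run sketch; proteins.fa is a hypothetical protein query file, and blastp is used because both nr and swissprot are protein databases.

```shell
# Space-separated database names go inside ONE quoted -db argument.
dbs="nr swissprot"
query=proteins.fa     # hypothetical protein query file
echo "blastp -query $query -db \"$dbs\" -outfmt 6"
```

Without the quotes the shell would pass swissprot as a separate argument, and only the first name would be taken as the database.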

 

written 3.3 years ago by Siva

What if I want to blast against nt and nr?
 

written 3.3 years ago by grayapply2009

You cannot. 'nt' is a nucleotide sequence database and 'nr' is a protein sequence database.
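
Running the two searches separately could be sketched like this, assuming a nucleotide query: blastn searches nt directly, while blastx translates the query and searches the protein database nr. The helper name program_for_db and the output file names are hypothetical, and the commands are only printed.

```shell
# Pick the BLAST program that fits the database type for a nucleotide query.
# program_for_db is a hypothetical helper name.
program_for_db() {
    case $1 in
        nt) echo blastn ;;              # nucleotide database
        nr|swissprot) echo blastx ;;    # protein databases; blastx translates the query
        *) echo "unknown database: $1" >&2; return 1 ;;
    esac
}

for db in nt nr; do
    echo "$(program_for_db "$db") -query test_query.fa -db $db -outfmt 6 -out vs_$db.tsv"
done
```

Each search writes its own table, so the nucleotide-level and protein-level hits can be compared afterwards.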

written 3.3 years ago by Siva

OK, I'll just do it separately. Thank you, Siva. You saved me.

written 3.3 years ago by grayapply2009

I think you should use '-db nt' as it will recognize all the sub files of nt.

written 3.3 years ago by CikLa90