Question: Local BLAST Discrepancies
jturner20 wrote (4 months ago):

I am attempting to set up a local BLAST environment and was running some test queries against both the local BLAST environment and the online web server when I noticed some dramatic discrepancies. On both the web server and the local BLAST I used word size 3, gap open cost 11, gap extend cost 1, an E-value threshold of 10.0 and the BLOSUM62 scoring matrix. The local search returned 2 results in total, and the web server returned 25. My copy of the BLAST DB is less than a month old, and some of the sequences that the web server found but the local search did not are included in it.

Specifically, I was test-BLASTing this DNA polymerase against human sequences, using the new -taxids argument locally and the organism field (taxid 9606) on the web server.

>4XVI_A Chain A, Dna Polymerase Nu [Homo sapiens]
KKHFCDIRHLDDWAKSQLIEMLKQAAALVITVMYTDGSTQLGADQTPVSSVRGIVVLVKRQAEGGHGCPDAPACGPVLEGFVSDDPCIYIQIEHSAIWDQEQEAHQQFARNVLFQTMKCKCPVICFNAKDFVRIVLQFFGNDGSWKHVADFIGLDPRIAAWLIDPSDATPSFEDLVEKYCEKSITVKVNSTYGNSSRNIVNQNVRENLKTLYRLTMDLCSKLKDYGLWQLFRTLELPLIPILAVMESHAIQVNKEEMEKTSALLGARLKELEQEAHFVAGERFLITSNNQLREILFGKLKLHLLSQRNSLPRTGLQKYPSTSEAVLNALRDLHPLPKIILEYRQVHKIKSTFVDGLLACMKKGSISSTWNQTGTVTGRLSAKHPNIQGISKHPIQITTPKNFKGKEDKILTISPRAMFVSSKGHTFLAADFSQIELRILTHLSGDPELLKLFQESERDDVFSTLTSQWKDVPVEQVTHADREQTKKVVYAVVYGAGKERLAACLGVPIQEAAQFLESFLQKYKKIKDFARAAIAQCHQTGCVVSIMGRRRPLPRIHAHDQQLRAQAERQAVNFVVQGSAADLCKLAMIHVFTAVAASHTLTARLVAQIHDELLFEVEDPQIPECAALVRRTMESLEQVQALELQLQVPLKVSLSAGRSWGHLVPLQ

On the web server I got these 25 results (columns: description, max score, total score, query cover, E-value, percent identity, accession):

DNA polymerase nu [Homo sapiens]    1386    1386    100%    0.0 100.00% NP_861524.2
gb|AAN52116.1|  DNA polymerase N [Homo sapiens] 1386    1386    100%    0.0 100.00% AAN52116.1
pdb|4XVI|A  Chain A, Dna Polymerase Nu [Homo sapiens]   1381    1381    100%    0.0 100.00% 4XVI_A
gb|EAW82538.1|  polymerase (DNA directed) nu, isoform CRA_a [Homo sapiens]  1363    1363    100%    0.0 98.65%  EAW82538.1
gb|AAD02338.1|  putative DNA polymerase [Homo sapiens]  969 969 73% 0.0 97.36%  AAD02338.1
dbj|BAG64670.1| unnamed protein product [Homo sapiens]  815 815 57% 0.0 100.00% BAG64670.1
dbj|BAD18421.1| unnamed protein product [Homo sapiens]  561 561 54% 0.0 80.83%  BAD18421.1
pdb|4X0Q|A  Chain A, Dna Polymerase Theta [Homo sapiens]    238 238 76% 1e-67   29.27%  4X0Q_A
pdb|4X0P|A  Chain A, Dna Polymerase Theta [Homo sapiens]    238 238 76% 2e-67   29.27%  4X0P_A
gb|AAR08421.2|  DNA polymerase theta [Homo sapiens] 238 238 76% 2e-65   29.27%  AAR08421.2
gb|EAW79513.1|  polymerase (DNA directed), theta, isoform CRA_d [Homo sapiens]  238 238 76% 2e-65   29.27%  EAW79513.1
ref|NP_955452.3|    DNA polymerase theta [Homo sapiens] 238 238 76% 2e-65   29.27%  NP_955452.3
dbj|BAD93104.1| DNA polymerase theta variant [Homo sapiens] 238 238 76% 2e-65   29.27%  BAD93104.1
gb|EAW79510.1|  polymerase (DNA directed), theta, isoform CRA_a [Homo sapiens]  238 238 76% 2e-65   29.27%  EAW79510.1
gb|EAW79511.1|  polymerase (DNA directed), theta, isoform CRA_b [Homo sapiens]  237 237 76% 3e-65   29.27%  EAW79511.1
ref|XP_011510650.1| DNA polymerase theta isoform X3 [Homo sapiens]  237 237 76% 3e-65   29.06%  XP_011510650.1
gb|AAD05272.1|  DNA polymerase eta [Homo sapiens]   236 236 75% 4e-65   29.20%  AAD05272.1
ref|XP_011510649.1| DNA polymerase theta isoform X1 [Homo sapiens]  237 237 76% 4e-65   29.06%  XP_011510649.1
ref|XP_016861054.1| DNA polymerase theta isoform X4 [Homo sapiens]  236 236 76% 5e-65   29.06%  XP_016861054.1
ref|XP_011510656.1| DNA polymerase theta isoform X6 [Homo sapiens]  236 236 76% 6e-65   29.06%  XP_011510656.1
gb|AAK39635.1|  DNA polymerase theta [Homo sapiens] 230 230 76% 7e-63   29.27%  AAK39635.1
gb|AAC33565.1|  DNA polymerase theta [Homo sapiens] 229 229 76% 9e-63   29.27%  AAC33565.1
ref|XP_011510645.1| DNA polymerase theta isoform X2 [Homo sapiens]  221 221 76% 8e-60   27.83%  XP_011510645.1
emb|CAI56770.1| hypothetical protein [Homo sapiens] 92.0    92.0    39% 6e-18   27.15%  CAI56770.1
ref|XP_011510654.1| DNA polymerase theta isoform X5 [Homo sapiens]  62.8    62.8    22% 6e-09   27.67%  XP_011510654.1

Locally, however, I only got these two results:

# BLASTP 2.9.0+
# Query: 4XVI_A Chain A, Dna Polymerase Nu [Homo sapiens]
# Database: nr_v5
# Fields: % query coverage per subject, subject id, subject sci names
# 2 hits found
100     gb|AAN52116.1|  Homo sapiens
76  gb|AAK39635.1|  Homo sapiens
# BLAST processed 1 queries
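To pin down exactly which hits are missing, one option is to parse the local tabular output and compare it with the Accession column of the web results. This is a minimal sketch, assuming the -outfmt 7 text pasted above (with its "# N hits found" comment line) and a hand-copied subset of the web accessions; parse_outfmt7, normalize_seqid and missing_from_local are hypothetical helpers, not part of BLAST+.

```python
import re

# Hypothetical helpers: none of these functions ship with BLAST+.

def parse_outfmt7(text):
    """Return (reported_hit_count, rows) from -outfmt "7 qcovs sseqid sscinames" output."""
    reported = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Comment lines include one like "# 2 hits found".
            m = re.match(r"#\s*(\d+)\s+hits?\s+found", line)
            if m:
                reported = int(m.group(1))
            continue
        # Three requested fields; the species name itself contains spaces.
        qcovs, sseqid, scinames = line.split(None, 2)
        rows.append((int(qcovs), sseqid, scinames))
    return reported, rows


def normalize_seqid(sseqid):
    """Turn 'gb|AAN52116.1|' or 'pdb|4XVI|A' into the web Accession form."""
    parts = [p for p in sseqid.split("|") if p]
    if len(parts) == 1:
        return parts[0]
    if parts[0] == "pdb" and len(parts) >= 3:
        return f"{parts[1]}_{parts[2]}"
    return parts[1]


def missing_from_local(web_accessions, local_rows):
    """Accessions reported by the web server but absent from the local run."""
    local = {normalize_seqid(sseqid) for _, sseqid, _ in local_rows}
    return [acc for acc in web_accessions if acc not in local]


local_output = """\
# BLASTP 2.9.0+
# Query: 4XVI_A Chain A, Dna Polymerase Nu [Homo sapiens]
# Database: nr_v5
# Fields: % query coverage per subject, subject id, subject sci names
# 2 hits found
100     gb|AAN52116.1|  Homo sapiens
76  gb|AAK39635.1|  Homo sapiens
# BLAST processed 1 queries
"""

# A few accessions copied from the web server results.
web = ["NP_861524.2", "AAN52116.1", "4XVI_A", "EAW82538.1", "AAK39635.1"]

count, rows = parse_outfmt7(local_output)
missing = missing_from_local(web, rows)
print(count, len(rows))
print(missing)
```

Pasting the full 25-row accession column into `web` gives the complete list of hits the local database did not return, which makes it easy to check whether those accessions exist in the local database at all.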

Does anyone have any ideas as to what could be causing this?

EDIT:

The command I used was:

blastp -query HomoTest -db nr_v5 -out v53REsult -outfmt "7 qcovs sseqid sscinames" -max_target_seqs 100 -taxids 9606 -evalue 10.0

Apart from the options I specified above, the web server was left on its default settings.
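The command above leaves word size, gap costs and scoring matrix at the blastp defaults instead of stating them. To rule out any mismatch with the web settings described in the question, they can be written out explicitly. This minimal Python sketch only builds the argument list; build_blastp_cmd is a hypothetical helper, while -word_size, -gapopen, -gapextend, -matrix, -evalue, -taxids and -max_target_seqs are standard blastp options.

```python
import shlex

def build_blastp_cmd(query, db, out, taxid, evalue=10.0, word_size=3,
                     gapopen=11, gapextend=1, matrix="BLOSUM62",
                     max_target_seqs=None):
    """Hypothetical helper: blastp argument list with every scoring option explicit."""
    cmd = [
        "blastp",
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", "7 qcovs sseqid sscinames",
        "-taxids", str(taxid),
        "-evalue", str(evalue),
        "-word_size", str(word_size),
        "-gapopen", str(gapopen),
        "-gapextend", str(gapextend),
        "-matrix", matrix,
    ]
    # Only add -max_target_seqs when asked, so runs with and without it are easy to compare.
    if max_target_seqs is not None:
        cmd += ["-max_target_seqs", str(max_target_seqs)]
    return cmd


cmd = build_blastp_cmd("HomoTest", "nr_v5", "v53REsult", 9606)
# Printed in shell-quoted form; subprocess.run(cmd, check=True) would execute it.
print(shlex.join(cmd))
```

Passing a list rather than a single string to subprocess avoids having to quote the multi-word -outfmt value by hand.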

EDIT 2:

After a clean nr_v5 install, the issue has been resolved, thank you to everyone who answered this post.


What is the exact command you used to run the local BLASTP?
And were the web BLASTP settings otherwise all default?

Without knowing which flags you used, it is difficult to tell.

(reply by deprekate, 4 months ago)

I have added that to the post.

(reply by jturner20, 4 months ago)

Have you looked around on Biostars for similar posts? I seem to remember this kind of issue being raised and 'processed' before.

One thing that stands out already is that the web version likely does not use the -max_target_seqs 100 parameter (a frequent cause of "confusion").

(reply by lieven.sterck, 4 months ago)

I have looked around Biostars, and most of what I found concerned differences in word size or in the gap open and extend costs. As for max target seqs, the web page defaults to 100 results; does it apply this limit differently from the local version?

(reply by jturner20, 4 months ago)

That is not clearly documented, but we do tend to assume so.

Try running it without the -max_target_seqs parameter and then filter in post-processing.

(reply by lieven.sterck, 4 months ago)
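Filtering in post-processing instead of relying on -max_target_seqs could look roughly like this sketch. It assumes the hits have already been read into (qcovs, sseqid, sscinames) tuples in the order BLAST reported them, best first; top_hits is a hypothetical helper, the species check is optional when -taxids already restricts the search, and the non-human row in the example is made up for illustration.

```python
def top_hits(rows, species="Homo sapiens", limit=100):
    """Return the first `limit` rows whose sscinames field mentions `species`.

    rows: (qcovs, sseqid, sscinames) tuples, in the order BLAST reported them.
    sscinames can list several names separated by ';', so a substring
    test is used rather than strict equality.
    """
    kept = [row for row in rows if species in row[2]]
    return kept[:limit]


rows = [
    (100, "gb|AAN52116.1|", "Homo sapiens"),
    (90, "hypothetical|EXAMPLE1|", "Mus musculus"),  # made-up non-human row
    (76, "gb|AAK39635.1|", "Homo sapiens"),
]

print(top_hits(rows))           # both Homo sapiens rows
print(top_hits(rows, limit=1))  # only the first Homo sapiens row
```

Because the cut happens after BLAST has reported every hit above the E-value threshold, the limit no longer influences which alignments are found in the first place.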

Thanks for the quick reply. I have removed the option from my local BLAST command completely, but I still got results identical to those I got with it, which is incredibly confusing.

(reply by jturner20, 4 months ago)

Are you sure everything worked as expected? From the sample output you provided, I can see that it does not correspond to the output format you requested on your blastp command line.

Also, I understood that the DB was a different version, no?

(reply by lieven.sterck, 4 months ago)
genomax wrote (4 months ago):

I ran your query against a local copy of the nr_v5 database using your exact command above and got 25 hits (the first few are shown below). Do you have an incomplete local copy of nr?

# BLASTP 2.9.0+
# Query: test
# Database: nr_v5
# Fields: % query coverage per subject, subject id, subject sci names
# 25 hits found
100 ref|NP_861524.2|    Homo sapiens
100 gb|AAN52116.1|  Homo sapiens
100 pdb|4XVI|A  Homo sapiens
100 gb|EAW82538.1|  Homo sapiens
74  gb|AAD02338.1|  Homo sapiens

I downloaded the nr_v5 database from the FTP site here: ftp://ftp.ncbi.nlm.nih.gov/blast/db/v5/, but I will re-download it and try again. Thanks for running that test. Edit: After a clean nr_v5 install, the issue has been resolved. Thank you to everyone who answered this post.

(reply by jturner20, 4 months ago)
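Since the problem turned out to be the local database copy, it may be worth checking each downloaded volume before (re)installing. This sketch assumes every nr_v5 archive (e.g. nr.00.tar.gz) sits next to a matching .md5 file downloaded from the same FTP directory, with the checksum as the first whitespace-separated field; bad_volumes and md5_of are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so large volumes fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def bad_volumes(db_dir):
    """Return archive names that are missing or whose MD5 differs from the .md5 file.

    Assumes each 'X.tar.gz.md5' holds the expected checksum as its first field.
    """
    bad = []
    for md5_file in sorted(Path(db_dir).glob("*.tar.gz.md5")):
        expected = md5_file.read_text().split()[0].lower()
        archive = md5_file.with_suffix("")  # 'nr.00.tar.gz.md5' -> 'nr.00.tar.gz'
        if not archive.exists() or md5_of(archive) != expected:
            bad.append(archive.name)
    return bad

# Usage (the path is a placeholder): print(bad_volumes("/path/to/nr_v5"))
```

Any volume listed by this check would need to be downloaded again before extracting; a partially downloaded or truncated volume is one plausible way to end up with a database that silently lacks sequences.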