Diamond results non-specific compared to NCBI Web
2.4 years ago
gwrathe • 0

Hello,

I recently downloaded the nr database from NCBI and built a Diamond database from it. I then ran my sequences against it with the taxonomic classification output, using the following commands:

diamond makedb --in nr.gz --taxonmap prot.accession2taxid.gz --taxonnodes nodes.dmp -d nr
diamond blastp -d /srv/scratch/nrDatabase/nr.dmnd -q COG0202.faa --more-sensitive -o matchesCOG0202 -f 102 --id 50 --query-cover 80 -b 25

A significant portion of my sequences were returned with the NCBI Taxonomy ID '2' (Bacteria). When I run those same sequences through NCBI web BLASTp, they return very specific hits, such as 'Deltaproteobacteria bacterium HGW-Deltaproteobacteria-15'. Why would Diamond give me useless results when NCBI web BLAST gives me specific and useful ones, especially when both use the same database?

Thank you in advance for any help!

blast ncbi diamond blastp • 894 views

I know you posted this 2 years ago, but how did you get the prot.accession2taxid.gz and nodes.dmp files to build the database? I'm trying to build a Diamond nr database as well.

2.4 years ago
buchfink ▴ 170

Diamond uses the LCA (lowest common ancestor) algorithm for taxonomic classification, which means that not only the top hit is used, but all hits whose score is within 10% of the best score. This can often lead to unspecific assignments. To make the assignment more specific, use the --top parameter with a lower number; e.g. --top 0 would use only the best hit for the taxonomy assignment.
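
To illustrate why the score window matters, here is a minimal Python sketch of an LCA assignment over a toy taxonomy. It is not Diamond's actual code: the PARENT table, the taxon IDs other than 1 (root) and 2 (Bacteria), the bit scores, and the exact cutoff formula are all assumptions made for the example.

```python
# Toy taxonomy as child -> parent. Only 1 (root) and 2 (Bacteria) mirror
# real NCBI IDs; every other ID is a hypothetical placeholder.
PARENT = {
    2: 1,      # Bacteria -> root
    100: 2,    # hypothetical phylum A -> Bacteria
    200: 2,    # hypothetical phylum B -> Bacteria
    101: 100,  # hypothetical species in phylum A
    102: 100,  # hypothetical species in phylum A
    201: 200,  # hypothetical species in phylum B
}


def lineage(taxid):
    """Return the path from taxid up to the root, inclusive."""
    path = [taxid]
    while taxid in PARENT:
        taxid = PARENT[taxid]
        path.append(taxid)
    return path


def lca(taxids):
    """Return the lowest taxon shared by the lineages of all taxids."""
    taxids = list(taxids)
    shared = set(lineage(taxids[0]))
    for taxid in taxids[1:]:
        shared &= set(lineage(taxid))
    # Walk upward from the first taxon; the first shared node is the LCA.
    for node in lineage(taxids[0]):
        if node in shared:
            return node


def assign(hits, top_percent):
    """hits is a list of (taxid, bit_score) pairs.

    Keep the hits scoring within top_percent of the best score
    (assumed cutoff: best * (1 - top_percent / 100)) and return their LCA.
    """
    best = max(score for _, score in hits)
    cutoff = best * (1 - top_percent / 100)
    kept = [taxid for taxid, score in hits if score >= cutoff]
    return lca(kept)


# Made-up hits: the best hit is in phylum A, a close second is in phylum B.
hits = [(101, 500.0), (201, 470.0), (102, 300.0)]

print(assign(hits, 10))  # 2: both top hits only agree at Bacteria
print(assign(hits, 0))   # 101: only the best hit is considered
```

With a 10% window, two near-equal hits from different phyla collapse to Bacteria (taxon 2), which matches the behaviour described in the question; with a window of 0, only the best hit's taxon is reported.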


Thank you buchfink! Much appreciated.
