Question: Diamond results non-specific compared to NCBI Web
I recently downloaded the nr database from NCBI and set it up with Diamond. I then ran my sequences through it with the taxonomic information options, using the following commands:

diamond makedb --in nr.gz --taxonmap prot.accession2taxid.gz --taxonnodes nodes.dmp -d nr
diamond blastp -d /srv/scratch/nrDatabase/nr.dmnd -q COG0202.faa --more-sensitive -o matchesCOG0202 -f 102 --id 50 --query-cover 80 -b 25

A significant portion of my sequences were returned with the NCBI Taxonomy ID '2', for Bacteria. When I run those same sequences through NCBI Web Blastp, they are returned with very specific hits, such as 'Deltaproteobacteria bacterium HGW-Deltaproteobacteria-15'. Why would Diamond give me useless results when NCBI Web gives me specific and useful results, especially when they use the same database?

Thank you in advance for any help!

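Diamond uses the LCA (lowest common ancestor) algorithm for taxonomic classification with output format 102. This means the assignment is based not only on the top hit, but on all hits that score within 10% of the best score, and the query is assigned to the lowest taxon that all of those hits share. If the hits come from distantly related organisms, the result can be as unspecific as taxon 2 (Bacteria). To get more specific assignments, use the --top parameter with a lower number, e.g. --top 0 would only use the best hit for the taxonomy assignment.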
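
To illustrate why a wider score range pushes the assignment up the tree, here is a minimal Python sketch of the LCA idea. It is not Diamond's actual implementation: the toy taxonomy, the taxon IDs other than 1 (root) and 2 (Bacteria), the scores, and the exact cutoff formula are all assumptions made up for the example.

```python
# Toy taxonomy: child taxid -> parent taxid.
# Only 1 (root) and 2 (Bacteria) are real NCBI IDs; the rest are hypothetical.
PARENT = {
    1: 1,      # root (its own parent)
    2: 1,      # Bacteria
    100: 2,    # hypothetical phylum A
    200: 2,    # hypothetical phylum B
    101: 100,  # hypothetical species in phylum A
    201: 200,  # hypothetical species in phylum B
}

def lineage(taxid):
    """Return the path from taxid up to the root, inclusive."""
    path = [taxid]
    while PARENT[taxid] != taxid:
        taxid = PARENT[taxid]
        path.append(taxid)
    return path

def lca(taxids):
    """Lowest common ancestor of a non-empty list of taxids."""
    common = set(lineage(taxids[0]))
    for t in taxids[1:]:
        common &= set(lineage(t))
    # Walking up from the first hit, the first shared node is the deepest one.
    for node in lineage(taxids[0]):
        if node in common:
            return node

def assign(hits, top_percent):
    """hits: list of (taxid, score). Keep hits scoring within top_percent
    of the best score (assumed cutoff rule) and return their LCA."""
    best = max(score for _, score in hits)
    cutoff = best * (1 - top_percent / 100)
    kept = [taxid for taxid, score in hits if score >= cutoff]
    return lca(kept)

# Two hits from different hypothetical phyla with similar scores.
hits = [(101, 500.0), (201, 470.0)]
print(assign(hits, 10))  # both hits kept -> 2 (Bacteria)
print(assign(hits, 0))   # only the best hit kept -> 101
```

With a 10% range both hits count, and the only taxon they share below the root is Bacteria. With a range of 0 only the best hit is used, which is closer to what you see when you read the top hit on the NCBI web page.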

Thank you buchfink! Much appreciated.

