Multiple taxonomy hits per OTU ID in VSEARCH despite --top_hits_only (90% ID after QIIME2 clustering at 97%)
3 months ago

Hi all,

I clustered my ASVs into OTUs at 97% similarity using QIIME2's cluster-features-de-novo:

qiime vsearch cluster-features-de-novo \
  --i-table asv_table.qza \
  --i-sequences asv_sequences.qza \
  --p-perc-identity 0.97 \
  --o-clustered-table otu_table_97.qza \
  --o-clustered-sequences otu_sequences_97.qza

For taxonomy assignment of these OTUs, I used standalone VSEARCH (not QIIME2) with the following parameters:

vsearch --usearch_global dna-sequences.fasta \
  --db SILVA_138.1_SSURef_NR99_tax_silva_fixed.fasta \
  --id 0.90 \
  --maxaccepts 20 \
  --maxrejects 0 \
  --output_no_hits \
  --blast6out amf_taxonomy_vsearch_max20.csv \
  --top_hits_only

However, even with --top_hits_only, I’m getting multiple hits per OTU ID in the blast6out result. I suspect this happens when multiple reference sequences match equally well (i.e., identical percent identity and bit score).

Questions: Is it expected that --top_hits_only still reports multiple hits per query when --maxaccepts is set to 20? I expected it to limit the output to a single hit per OTU.

What is the best practice to retain only one best taxonomy hit per OTU in this case? Should I manually select?
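To make "manually select" concrete, here is a minimal shell sketch of what I have in mind. It assumes the blast6out file is tab-separated with the query ID in column 1; the three rows, the OTU IDs, and the reference names (refA, refB, refC) are made up for illustration. It keeps the first row per OTU, which among tied hits is an arbitrary choice based on file order rather than a better match.

```shell
# Made-up blast6out rows (tab-separated): OTU1 has two tied top hits, OTU2 has one.
hits=$(mktemp)
printf 'OTU1\trefA\t98.5\t250\t4\t0\t1\t250\t1\t250\t-1\t0\n' > "$hits"
printf 'OTU1\trefB\t98.5\t250\t4\t0\t1\t250\t1\t250\t-1\t0\n' >> "$hits"
printf 'OTU2\trefC\t95.0\t250\t12\t0\t1\t250\t1\t250\t-1\t0\n' >> "$hits"

# Keep only the first row seen for each query ID (column 1).
# Among tied hits this pick is arbitrary (file order), not "more correct".
one_hit=$(awk -F'\t' '!seen[$1]++' "$hits")
printf '%s\n' "$one_hit"
rm -f "$hits"
```

My concern with this approach is that it silently discards tied references that may carry a different taxonomy, which is why I am asking whether a consensus over the tied hits would be the better practice.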

Any suggestions for handling the long SILVA taxonomy strings with many levels? How can I clean or truncate them to a 7-level (kingdom to species) hierarchy in R?
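For reference, the naive truncation I could do outside R looks like the sketch below. It assumes the lineage is a semicolon-separated string, and both example lineages are made up for illustration. It only keeps the first seven fields, so it does not guarantee that those fields correspond to kingdom through species, especially for eukaryotic lineages with extra intermediate levels.

```shell
# Made-up SILVA-style lineage strings, semicolon-separated (illustrative only).
lineages='Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia-Shigella;Escherichia coli
Eukaryota;Amorphea;Obazoa;Opisthokonta;Nucletmycea;Fungi;Mucoromycota;Glomeromycotina;Glomeromycetes'

# Keep at most the first seven semicolon-separated fields of each line.
# Lines with fewer than seven fields pass through unchanged.
truncated=$(printf '%s\n' "$lineages" | cut -d';' -f1-7)
printf '%s\n' "$truncated"
```

In the second example the cut stops well above genus level, which is exactly the problem I would like to solve properly in R.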

Would doing taxonomy assignment using QIIME2’s classify-consensus-vsearch help avoid this issue, or does it behave similarly?

Any insight from your experience or pipeline practices would be highly appreciated.

vsearch • 1.1k views