Hello everyone,
I’m working on a 16S rRNA microbiome project using Kraken2 with the SILVA database for taxonomic classification. After classification, I apply Bracken to estimate species-level relative abundances, and I’m using the alpha_diversity.py script from KrakenTools to compute alpha diversity indices (Fisher, Shannon, Simpson, Inverse Simpson, and Berger-Parker).
I chose Kraken2 over tools like DADA2 or QIIME2 based on a recent benchmark study suggesting that Kraken2 offers higher accuracy for 16S data in certain contexts.
My questions are about how to handle the downstream analysis, specifically:
1. Since Kraken2 doesn’t generate OTUs or ASVs, is rarefaction still needed before calculating alpha diversity?
2. My samples range from 25,000 to 50,000 total reads. Can I trust the diversity metrics computed by KrakenTools without rarefaction or any other normalization?
3. My preprocessing pipeline so far only includes fastp for quality filtering. Should I add further steps (e.g., chimera removal, filtering of low-abundance taxa)?
4. Would it be more appropriate to switch to a DADA2/QIIME2-based pipeline for the downstream diversity analysis?
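To make the rarefaction question concrete, here is a minimal sketch of what I mean by rarefying to a common depth before computing alpha diversity. It is not my actual pipeline: the taxon names and read counts are made up, `rarefy`, `shannon`, and `simpson` are hypothetical helpers I wrote for illustration, and the formulas are the textbook definitions, which may not match exactly what alpha_diversity.py computes.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a {taxon: reads} dict."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds sample total {total}")
    # Expand to one label per read, then draw without replacement.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)), using the natural log."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


def simpson(counts):
    """Simpson diversity 1 - sum(p_i^2)."""
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Made-up per-taxon read counts standing in for two Bracken outputs.
sample_a = {"Bacteroides": 12000, "Prevotella": 8000,
            "Faecalibacterium": 4000, "Rare_taxon": 1000}   # 25,000 reads
sample_b = {"Bacteroides": 24000, "Prevotella": 16000,
            "Faecalibacterium": 8000, "Rare_taxon": 2000}   # 50,000 reads

# Rarefy every sample down to the shallowest sample's depth.
depth = min(sum(sample_a.values()), sum(sample_b.values()))
rarefied_a = rarefy(sample_a, depth)
rarefied_b = rarefy(sample_b, depth)

for name, sample in [("A", rarefied_a), ("B", rarefied_b)]:
    print(name, round(shannon(sample), 4), round(simpson(sample), 4))
```

In this toy case both samples have identical proportions, so the indices barely move after subsampling. My concern is the real data, where rare taxa may only show up in the deeper samples and inflate richness-sensitive metrics.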
I’d really appreciate insights from anyone who has worked with Kraken2/Bracken for 16S data, especially regarding best practices for diversity analysis and normalization.
Thanks in advance!