Question

Bracken tool - do genus abundances need to be analysed on genus level or species/strain level?

2.9 years ago

Raphaela ▴ 10

Hi everyone,

As part of my research project I am analysing faecal gut microbiome samples for their abundances of Lactobacillaceae species. For this purpose I ran Kraken2 over the samples, and subsequently Bracken (at species level) to estimate the relative abundances.

While parsing the files, however, I realised that the Bracken report contained far fewer species than the Kraken2 report and cut out strains altogether. When I re-ran Bracken at strain level, lots of species were missing from the report. To confirm this, I ran Bracken at genus level and got different results again.

Furthermore, when I ran mixed linear models on all three kinds of report files, I got very different results: for each level (genus, species, strain), associations were strongest in the report generated at that same level.

My question now is: do I have to use the strain-level Bracken files when I want strain associations, the species-level Bracken files for species associations and the genus-level files for genus associations,

OR do I use the lowest possible level (e.g. strain) and derive genus and species associations from the strain-level abundances?

Thank you for your help.

Best, Rapha

kraken microbiome levels metagenomics bracken • 1.2k views

updated 2.9 years ago by colindaven 6.3k • written 2.9 years ago by Raphaela ▴ 10

score 0 · Answer 1 · 2021-05-24

I would use more than one classifier before believing the results of any one. Be skeptical, and use an intersection of results.

MetaPhlAn is quite good, as is KrakenUniq. We also wrote Wochenende (https://github.com/MHH-RCUG/Wochenende). Centrifuge is passable as well in my tests. Kraken2 did not do well in our tests.

All have their problems, so use them with caution (as you've already rightly pointed out with the Bracken post-processing).
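To make the "intersection of results" idea concrete, here is a minimal Python sketch. It assumes you have already parsed each classifier's output into a plain list of taxon names yourself; the function name `consensus_taxa`, the tool labels and the species lists are all made up for illustration and are not real results or part of any of the tools above.

```python
from collections import Counter


def consensus_taxa(calls_by_tool, min_tools=None):
    """Return taxa reported by at least `min_tools` classifiers.

    calls_by_tool: dict mapping a tool label to an iterable of taxon names
                   (hypothetical input; parsing each tool's report is up to you).
    min_tools:     how many tools must agree; defaults to all of them,
                   which gives a strict intersection.
    """
    if min_tools is None:
        min_tools = len(calls_by_tool)

    counts = Counter()
    for taxa in calls_by_tool.values():
        # Normalise names so that case or stray whitespace does not
        # hide an agreement; a set ensures one vote per tool per taxon.
        counts.update({name.strip().lower() for name in taxa})

    return sorted(name for name, n in counts.items() if n >= min_tools)


# Illustrative placeholder calls, not output from any real run.
calls = {
    "tool_a": [
        "Lactobacillus acidophilus",
        "Limosilactobacillus reuteri",
        "Lacticaseibacillus rhamnosus",
    ],
    "tool_b": ["Lactobacillus acidophilus", "Limosilactobacillus reuteri"],
    "tool_c": ["lactobacillus acidophilus ", "Lacticaseibacillus rhamnosus"],
}

strict = consensus_taxa(calls)                 # reported by every tool
majority = consensus_taxa(calls, min_tools=2)  # reported by at least two tools
print(strict)
print(majority)
```

Matching on names is the fragile part: different databases may name or split taxa differently, so matching on NCBI taxonomy IDs would probably be more robust where every tool reports them.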

Generally, I would be very, very hesitant to look at strain-level abundances. Short reads cover very little of the genome (metagenome, not 16S, correct?) and strains are typically very, very similar, so you cannot assume that most reads are diagnostically useful at strain level.