Using Kraken2, I ran two classifications on the same sample: one with the Kraken2 standard database, which includes Homo sapiens (HS), and one with a custom Kraken2 database that doesn't contain Homo sapiens. Of the 29 million reads, 16k are assigned to bacteria with the standard database (with HS). With the custom database (without HS), 1.06 million reads are assigned to bacteria.
My question is: which result should I believe? There is clearly human contamination in the sample, but when I leave human out of the classification I get many more bacterial reads, and much more diversity too. Still, I am tempted to put my money on the classification that uses both bacteria and human: to me, the difference in read counts must come from sequence homology between human and bacteria, with some reads being assigned to human when both targets are available.
What do you think? Does my impression fit with Kraken2's internal classification algorithm?
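One check that might help settle this is to compare the two runs read by read: for every read that the custom database classifies, look up what the standard database did with it. Below is a minimal sketch of that comparison. It assumes both runs were written with Kraken2's `--output` option in the default tab-separated per-read format (status `C`/`U`, read ID, taxid, ...) and without `--use-names`; the function names and file names are hypothetical. It also only counts reads assigned exactly to taxid 9606, so reads placed on a human ancestor by the LCA step land in the "other taxon" bucket.

```python
from collections import Counter

HUMAN_TAXID = "9606"  # NCBI taxonomy ID for Homo sapiens


def read_assignments(path):
    """Map read ID -> assigned taxid ("0" if unclassified) from a per-read output file."""
    assignments = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            status, read_id, taxid = fields[0], fields[1], fields[2]
            assignments[read_id] = taxid if status == "C" else "0"
    return assignments


def cross_tabulate(standard_path, custom_path):
    """Count what the standard run did with each read the custom run classified."""
    standard = read_assignments(standard_path)
    custom = read_assignments(custom_path)
    counts = Counter()
    for read_id, custom_taxid in custom.items():
        if custom_taxid == "0":
            continue  # only look at reads the custom database classified
        standard_taxid = standard.get(read_id, "0")
        if standard_taxid == HUMAN_TAXID:
            counts["human_in_standard"] += 1
        elif standard_taxid == "0":
            counts["unclassified_in_standard"] += 1
        elif standard_taxid == custom_taxid:
            counts["same_taxon"] += 1
        else:
            counts["other_taxon_in_standard"] += 1
    return counts
```

Calling `cross_tabulate("standard.kraken", "custom.kraken")` (hypothetical file names) returns a `Counter`. If most of the extra reads fall under `human_in_standard`, that would be consistent with human reads being pulled onto bacterial taxa when no human reference is available.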