Question: kraken2 different bacteria read counts on custom database
20 months ago by
bangbangphil20 wrote:


Using kraken2, I ran two classifications on the same sample: one with the kraken2 standard database, which includes Homo sapiens, and the other with a custom database built with kraken2 that does not contain Homo sapiens. Of the 29 million reads, 16k are assigned to bacteria with the standard database (with HS). With the custom database without HS, 1.06 million reads are assigned to bacteria.

My question is: which result should I believe? There is clearly human contamination in the sample, but when I leave human out of the classification I get many more bacterial reads, and much more diversity too. Still, I am tempted to put my money on the classification using both bacteria and human: to me, the read count difference must come from sequence homology between human and bacteria, where some reads are assigned to human when both targets are available.

What do you think? Does my impression fit with kraken's internal classification algorithm?



dna-seq metagenomics kraken2 • 1.5k views
ADD COMMENTlink modified 15 months ago by ilyzdd10 • written 20 months ago by bangbangphil20

I'm curious what happens if you remove the mitochondrial DNA from the reference and re-run. I had a similar problem, which I solved; see here: Kraken2 database curation. It might not be a problem with human, though (except for the mitochondria).

ADD REPLYlink written 20 months ago by Asaf8.5k

Thanks for the info. I did try with a new database that does not contain human mitochondrial DNA, but the counts don't change much.

ADD REPLYlink written 20 months ago by bangbangphil20
15 months ago by
ilyzdd10 wrote:


Have you decontaminated the raw reads before running Kraken2, for example by using Bowtie2 or BWA to map all the reads to the human reference genome and excluding every read that maps? If the sample is from human stool, this reduces the number of human reads left in the data.
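As a rough sketch of that pre-filtering step, the Python below only builds a Bowtie2 command line that writes out the read pairs failing to align concordantly to the host. The index name, FASTQ file names and output prefix are all hypothetical placeholders, and this is one possible setup rather than a recommended pipeline.

```python
import shlex

def bowtie2_host_filter_cmd(index, r1, r2, out_prefix, threads=4):
    """Build a Bowtie2 command that keeps pairs not aligning concordantly to the host.

    All paths are placeholders chosen for illustration.
    """
    return [
        "bowtie2",
        "-p", str(threads),                        # number of alignment threads
        "-x", index,                               # prefix of a prebuilt host index
        "-1", r1,                                  # mate 1 FASTQ
        "-2", r2,                                  # mate 2 FASTQ
        "--un-conc", f"{out_prefix}_R%.fastq",     # '%' becomes 1 or 2 per mate
        "-S", "/dev/null",                         # discard the SAM alignments
    ]

cmd = bowtie2_host_filter_cmd(
    "GRCh38_index", "sample_R1.fastq", "sample_R2.fastq", "sample_nohuman"
)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Note that pairs kept this way can still contain one mate that aligns to human on its own, so a stricter filter may be needed depending on the sample.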

ADD COMMENTlink written 15 months ago by ilyzdd10
20 months ago by
ctseto280 wrote:

If you like reading kraken --output files, for each contig you might have:

Bacteria:1 9606:12 0:1000 (where 0 is unclassified)

Eliminate the host 9606 and it turns to Bacteria:1 0:1012; the vote switches to Bacteria.

Eliminate the host 9606 and it turns to Bacteria:N 0:1000+(12-N); the vote switches to Bacteria.
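The vote switch in that example can be sketched in a few lines of Python. This is a deliberately simplified plurality vote over the classified k-mers, not Kraken2's actual rule (which scores root-to-leaf paths in the taxonomy tree); the function names are made up for illustration.

```python
from collections import Counter

def plurality_call(kmer_hits):
    """Return the label with the most k-mer hits, ignoring unclassified ('0')."""
    classified = {label: n for label, n in kmer_hits.items() if label != "0" and n > 0}
    if not classified:
        return "0"
    return max(classified, key=classified.get)

def drop_taxon(kmer_hits, taxon):
    """Simulate removing `taxon` from the database: its k-mers become unclassified."""
    out = Counter(kmer_hits)
    out["0"] += out.pop(taxon, 0)
    return dict(out)

# Tallies from the example above
with_host = {"Bacteria": 1, "9606": 12, "0": 1000}
without_host = drop_taxon(with_host, "9606")

print(plurality_call(with_host))     # 9606
print(without_host)                  # {'Bacteria': 1, '0': 1012}
print(plurality_call(without_host))  # Bacteria
```

A single bacterial k-mer is enough to win once the 12 human k-mers have nowhere to go, which is the "sink" argument in miniature.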

I suspect one needs a human "sink" to ensure that k-mers have a place to go, rather than traversing the LCA and ending up somewhere they shouldn't be. However, I find it hard to believe that the difference is 16k vs 1,060k bacterial reads with and without human.

In the end, check the first few lines of your kraken.out from both databases and see how the kmer assignments look.
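For going beyond the first few lines, here is a minimal sketch that compares the per-read calls of two runs and counts how reads move between taxa. It assumes the usual tab-separated per-read output with the read ID in column 2 and the taxon in column 3 (either a bare taxid or a name followed by "(taxid N)"); the two-read input is invented for the example.

```python
import re
from collections import Counter

TAXID_RE = re.compile(r"\(taxid (\d+)\)$")

def read_assignments(lines):
    """Map read ID -> assigned taxid from Kraken2 per-read output lines."""
    calls = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        read_id, taxon = fields[1], fields[2].strip()
        match = TAXID_RE.search(taxon)
        calls[read_id] = match.group(1) if match else taxon
    return calls

def compare_runs(lines_a, lines_b):
    """Count (taxid in run A, taxid in run B) pairs over reads present in both runs."""
    a, b = read_assignments(lines_a), read_assignments(lines_b)
    return Counter((a[r], b[r]) for r in a.keys() & b.keys())

# Invented two-read example, not real data
run_with_human = [
    "C\tread1\tHomo sapiens (taxid 9606)\t76|76\t9606:42 |:| 9606:42\n",
    "U\tread2\tunclassified (taxid 0)\t76|76\t0:42 |:| 0:42\n",
]
run_without_human = [
    "C\tread1\t1280\t76|76\t0:33 1280:9 |:| 0:33 1280:9\n",
    "U\tread2\t0\t76|76\t0:42 |:| 0:42\n",
]

moves = compare_runs(run_with_human, run_without_human)
for (before, after), n in moves.most_common():
    print(f"{before} -> {after}: {n} read(s)")
```

Both functions accept any iterable of lines, so open file handles for the two output files can be passed in directly. A large count for a pair such as ("9606", some bacterial taxid) would point at reads that only look bacterial when human is missing from the database.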

ADD COMMENTlink written 20 months ago by ctseto280

Looking at the output files, I see things like this:

from database with human:

C   NB502083:48:HKTMTAFXY:1:11101:19388:1052    Homo sapiens (taxid 9606)   76|76   9606:3 131567:5 9606:1 131567:1 9606:5 131567:3 9606:24 |:| 9606:21 2759:5 9606:5 2759:6 9606:5

from database without human

C   NB502083:48:HKTMTAFXY:1:11101:19388:1052    1280    76|76   0:3 1280:5 0:1 1280:1 0:5 1280:3 0:24 |:| 0:3 1280:5 0:1 1280:1 0:5 1280:3 0:24

Taxon 1280 is Staphylococcus aureus, but many k-mers are not in the database ('0:'). Taking a closer look at the output, I see that for a read pair to be unclassified, both reads must be completely absent from the k-mer database. From this observation I guess that one is better off with the most complete k-mer database.
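To put numbers on the two lines above, here is a small Python sketch that sums the k-mer counts per taxid in the last column. The two strings are copied from the output shown; the function name is made up, and the tally is only a summary of the k-mer hits, not Kraken2's own scoring.

```python
from collections import Counter

def tally_kmers(lca_field):
    """Sum k-mer counts per taxid in a Kraken2 LCA-mapping field.

    The '|:|' token separates the two mates of a paired read and is skipped.
    """
    counts = Counter()
    for token in lca_field.split():
        if token == "|:|":
            continue
        taxid, n = token.rsplit(":", 1)
        counts[taxid] += int(n)
    return counts

with_human = ("9606:3 131567:5 9606:1 131567:1 9606:5 131567:3 9606:24 |:| "
              "9606:21 2759:5 9606:5 2759:6 9606:5")
without_human = ("0:3 1280:5 0:1 1280:1 0:5 1280:3 0:24 |:| "
                 "0:3 1280:5 0:1 1280:1 0:5 1280:3 0:24")

for name, field in [("with human", with_human), ("without human", without_human)]:
    counts = tally_kmers(field)
    total = sum(counts.values())
    for taxid, n in counts.most_common():
        print(f"{name}\ttaxid {taxid}\t{n}/{total} k-mers")
```

For this pair, 64 of 84 k-mers hit 9606 with the human database, while only 18 of 84 hit 1280 without it and the other 66 are unclassified. In mate 1, the positions labelled 1280 without human are the same ones labelled 131567 (cellular organisms) with human, which would be consistent with k-mers shared between human and S. aureus being assigned to their LCA.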

ADD REPLYlink written 20 months ago by bangbangphil20

My interpretation here is that the second database, without human, classifies human k-mers as "0" (unclassified). In read 1, it seems to be 131567 or 1280, depending on the database. In read 2 it is either 9606 or 2759 with human, and 0 or 1280 in your database sans human. In read 2 the first 21 k-mers are human; without human in the db it is a mix of 0 and 1280 and ends with a bunch of unknowns.

In this case I would probably lean towards your first database.

ADD REPLYlink written 15 months ago by ctseto280