Why does Kraken2 not produce taxon percentages with --report flag?
19 months ago
DNAngel ▴ 240

Using kraken2 version 2.1.1

When using the flag --report, I should be getting an output file that gives taxon percentages in the first column for the taxon hits.

Something like the example below (taken from https://genomics.sschmeier.com/ngs-taxonomic-investigation/index.html). Column 1 is the percentage of reads for that category (named in the last column), followed by the number of reads in the clade rooted at the taxon, then the number of reads assigned directly to that taxon, then the classification rank (e.g. U = unclassified; I have no idea what R is, as it is not explained in the documents), then the NCBI taxonomy ID, and finally the category with its scientific name.

83.56  514312  514312  U       0       unclassified
16.44  101180  0       R       1       root
16.44  101180  0       R1      131567    cellular organisms
16.44  101180  2775    D       2           Bacteria
13.99  86114   1       D1      1783270       FCB group
13.99  86112   0       D2      68336           Bacteroidetes/Chlorobi group
13.99  86103   8       P       976               Bacteroidetes
13.94  85798   2       C       200643              Bacteroidia
13.94  85789   19      O       171549                Bacteroidales
13.87  85392   0       F       815                     Bacteroidaceae
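For anyone who wants to work with this standard report programmatically, here is a minimal Python sketch of reading one line of it. It assumes the columns are tab-separated and in the order described above; `ReportRow` and `parse_report_line` are hypothetical names, not part of Kraken2. In the Kraken 2 manual's list of rank codes, R stands for root. The last three fields are read from the right because, as far as I can tell from the manual, `--report-minimizer-data` adds two extra columns before the rank column.

```python
from typing import NamedTuple


class ReportRow(NamedTuple):
    percent: float      # percentage of fragments in the clade rooted at this taxon
    clade_reads: int    # fragments in the clade rooted at this taxon
    direct_reads: int   # fragments assigned directly to this taxon
    rank: str           # rank code, e.g. U, R, D, P, C, O, F, G, S
    taxid: int          # NCBI taxonomy ID
    name: str           # scientific name, indentation removed


def parse_report_line(line: str) -> ReportRow:
    """Parse one tab-separated line of a standard Kraken2 report (assumed layout)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 tab-separated fields, got {len(fields)}")
    # The first three columns are fixed; rank, taxid and name are taken from the
    # right so that any extra columns in between are skipped.
    return ReportRow(
        percent=float(fields[0]),
        clade_reads=int(fields[1]),
        direct_reads=int(fields[2]),
        rank=fields[-3].strip(),
        taxid=int(fields[-2]),
        name=fields[-1].strip(),
    )
```

Calling `parse_report_line` on each line of the report file gives rows that can be filtered by `rank`, for example to keep only phylum-level percentages.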


Instead my --report output gives me something like the following:

d__Bacteria     7879
d__Bacteria|p__Proteobacteria   7783
d__Bacteria|p__Proteobacteria|c__Alphaproteobacteria    4240
d__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Rickettsiales   4182
d__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Rickettsiales|f__Anaplasmataceae        4181
d__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Rickettsiales|f__Anaplasmataceae|g__Wolbachia   4181


I also tried --output which gives the following:

U       A00977:183:HLLKYDSXY:3:2426:20166:2315  unclassified (taxid 0)  146|146 0:112 |:| 0:112
C       A00977:183:HLLKYDSXY:3:2426:29613:2550  Wolbachia (taxid 953)   146|145 0:112 |:| 2591635:2 0:41 1845000:2 0:17 169402:1 0:48
C       A00977:183:HLLKYDSXY:3:2426:5538:2566   Klebsiella pneumoniae (taxid 573)       146|146 0:15 28216:2 0:18 1236:8 767434:2 2:5 0:33>
U       A00977:183:HLLKYDSXY:3:2426:4318:2832   unclassified (taxid 0)  146|146 0:112 |:| 0:112
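As a rough cross-check, the per-read file can at least give the classified/unclassified split, since the first column of every line is C or U. The sketch below assumes tab-separated lines like the ones above; `classification_percentages` is a hypothetical helper name, not something Kraken2 provides.

```python
from collections import Counter
from typing import Dict, Iterable


def classification_percentages(lines: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of lines per status code (first column, C or U)."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        status = line.split("\t", 1)[0].strip()
        counts[status] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: 100.0 * n / total for status, n in counts.items()}
```

It can be fed an open file handle directly, e.g. `classification_percentages(open("sample.classified.out"))`, where the file name is only a placeholder.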


Nothing on the Kraken2 GitHub page explains how to get the taxon percentages, and the outputs are all different, with no clear explanation of why they differ from the tutorials. Can anyone help me understand this? I just want to get taxon identification and abundance for my samples using Kraken2.

Thank you!


Which is the exact command that you're using with kraken2?

kraken2 --db krakendb --use-names --use-mpa-style --report-zero-counts --threads 4 \
--paired --report $SAMPLE.report.txt --report-minimizer-data \
--output $SAMPLE.classified.out $reads1 $reads2


So I tried different flags and so far it appears to work as expected if I remove --use-names and --use-mpa-style. Now testing if it will work when I include --report-zero-counts.


I don't have enough experience with the software to help you further. My advice is to take these questions to the developers by opening a new issue in their GitHub repository.

Although I think you, or someone with the same question, may already have done so: there is an issue titled "kraken2 output does not produce a column of taxon percentages".

As I said, I don't have enough experience with the software to help you further, but I think the issue may lie in the option --use-mpa-style, which, as far as I understand and according to the documentation, produces output similar to MetaPhlAn's (citing the documentation):

In addition, we also provide the option --use-mpa-style that can be used in conjunction with --report. This option provides output in a format similar to MetaPhlAn's output. The output with this option provides one taxon per line, with a lowercase version of the rank codes in Kraken 2's standard sample report format (except for 'U' and 'R'), two underscores, and the scientific name of the taxon (e.g., "d__Viruses"). The full taxonomy of each taxon (at the eight ranks considered) is given, with each rank's name separated by a pipe character (e.g., "d__Viruses|o_Caudovirales"). Following this version of the taxon's scientific name is a tab and the number of fragments assigned to the clade rooted at that taxon.

Note the last sentence: following the taxon's scientific name is a tab and the number of fragments assigned to the clade rooted at that taxon. So the output that you're getting is the expected one for this option.

Perhaps this option tries to mimic MetaPhlAn's output, which includes percentages, but reports the number of fragments instead, which can be used to calculate percentages anyway (you need to confirm this, though, because I don't know; it is just my interpretation after reading the documentation).
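If that interpretation holds, turning the MPA-style counts into percentages is a simple division. Here is a minimal Python sketch, assuming each line is a lineage, a tab, and a clade count. The function name `mpa_percentages` and the `total_fragments` argument are hypothetical, and the denominator has to be supplied by the user (for example, the total number of fragments in the sample), because the MPA-style lines shown in this thread carry no total of their own.

```python
from typing import Dict, Iterable


def mpa_percentages(lines: Iterable[str], total_fragments: int) -> Dict[str, float]:
    """Map each lineage to its clade count as a percentage of total_fragments."""
    if total_fragments <= 0:
        raise ValueError("total_fragments must be positive")
    percentages: Dict[str, float] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # The count is the last tab-separated field; the lineage is everything before it.
        lineage, count = line.rsplit("\t", 1)
        percentages[lineage.strip()] = 100.0 * int(count) / total_fragments
    return percentages
```

Note that these are clade-level percentages: a parent's value already includes its children, so the values across ranks should not be summed.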

I hope this helps,

António


Yep! I got it to finally work when removing the --use-mpa-style. I think that change in format was interfering with the report output because the file output is very different.

19 months ago
DNAngel ▴ 240

For those wondering, I did find another flag called --report-minimizer-data which I think should do what I want. Will try it out and have to wait and see. Was not clear in the tutorial I was following but was mentioned in the github page for Kraken2.


It did not work. The output is identical if I used --report-minimizer-data or not. What is going on here???

11 months ago

With the --use-mpa-style option you are asking the tool to give you output similar to MetaPhlAn's, and that is why you are getting that format. Try the command without that option:

kraken2 --db krakendb --use-names --report-zero-counts --threads 4 \
--paired --report $SAMPLE.report.txt --report-minimizer-data \
--output $SAMPLE.classified.out $reads1 $reads2