If you, like me, work with metagenomic data, you have probably used kaiju2table in the past. It's a tool provided with the Kaiju source code. It produces TSV tables that are easy to handle later on, for example for plotting.
The input data is the classic output format of Kaiju and also of Kraken:
C A00700:50:HF7LGDRXX:1:1101:1000:10144#CCGGCATCATCTACGA 1578 100 1578:66
C A00700:50:HF7LGDRXX:1:1101:1000:19225#CCGGCATCATCTACGA 186802 100 0:11 186802:55
C A00700:50:HF7LGDRXX:1:1101:1000:23234#CCGGCATCATCTACGA 1578 100 1578:66
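To make the format above concrete, here is a minimal Python sketch that counts reads per taxon ID from lines like these. It is not the code of kraken2table; the function name `count_reads_per_taxon` is made up for illustration, and it assumes that the fields are tab-separated, that the first field is `C` or `U`, and that the third field is a plain numeric taxon ID.

```python
from collections import Counter


def count_reads_per_taxon(lines):
    """Count reads per taxon ID; unclassified reads are counted under taxon 0.

    Assumption: tab-separated fields, status in column 1, numeric taxon ID in column 3.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        fields = line.split("\t")
        status, taxon_id = fields[0], int(fields[2])
        counts[taxon_id if status == "C" else 0] += 1
    return counts


# The three example records shown above, written with explicit tabs.
sample = [
    "C\tA00700:50:HF7LGDRXX:1:1101:1000:10144#CCGGCATCATCTACGA\t1578\t100\t1578:66",
    "C\tA00700:50:HF7LGDRXX:1:1101:1000:19225#CCGGCATCATCTACGA\t186802\t100\t0:11 186802:55",
    "C\tA00700:50:HF7LGDRXX:1:1101:1000:23234#CCGGCATCATCTACGA\t1578\t100\t1578:66",
]
counts = count_reads_per_taxon(sample)
print(counts[1578], counts[186802])
```

For the three example reads, taxon 1578 is counted twice and taxon 186802 once.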
The input can come from as many files as you need; they are combined into a single file containing percentages, like this:
file            percent             reads    taxon_id  taxon_name
F14_A_R1.s.out  59.89815343509682   7161673  0         Unclassified
F14_A_R1.s.out  1.080231644647389   129157   1301      Streptococcus
F14_A_R1.s.out  2.129960840275143   254667   1350      Enterococcus
F14_A_R1.s.out  1.3252716093792982  158455   1485      Clostridium
F14_A_R1.s.out  6.908532882384414   826013   1578      Lactobacillus
F14_A_R1.s.out  1.880613565083921   224854   204475    Gemmiger
F14_A_R1.s.out  3.163456075511585   378236   572511    Blautia
F14_A_R1.s.out  1.4769725746433902  176593   946234    Flavonifractor
F14_A_R1.s.out  1.0168598167829042  121580   1017280   Pseudoflavonifractor
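As an illustration of how easy such a table is to handle afterwards, the sketch below loads it with Python's standard `csv` module and ranks the classified taxa by percentage, which is typically the first step before plotting. It assumes the file is tab-separated and has exactly the header shown above; `load_table` is a hypothetical helper name, and only three of the rows are included to keep the example short.

```python
import csv
import io


def load_table(handle):
    """Read a tab-separated table with the header shown above into typed dicts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "file": row["file"],
            "percent": float(row["percent"]),
            "reads": int(row["reads"]),
            "taxon_id": int(row["taxon_id"]),
            "taxon_name": row["taxon_name"],
        })
    return rows


# Three rows from the example table; a real file would be opened with open(path, newline="").
example = (
    "file\tpercent\treads\ttaxon_id\ttaxon_name\n"
    "F14_A_R1.s.out\t59.89815343509682\t7161673\t0\tUnclassified\n"
    "F14_A_R1.s.out\t3.163456075511585\t378236\t572511\tBlautia\n"
    "F14_A_R1.s.out\t6.908532882384414\t826013\t1578\tLactobacillus\n"
)
rows = load_table(io.StringIO(example))

# Drop the unclassified row (taxon ID 0) and sort the rest by percentage, highest first.
classified = [r for r in rows if r["taxon_id"] != 0]
top = sorted(classified, key=lambda r: r["percent"], reverse=True)
for r in top:
    print(f'{r["taxon_name"]}\t{r["percent"]:.2f}')
```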
As far as I could find, there is no such tool for Kraken2, which is perhaps more widely used than Kaiju. You could, of course, try to use kaiju2table with the Kraken results, but you would have to install Kaiju to get it.
Hence, for my own convenience, I have made a tool called kraken2table that converts the *.out files produced by Kraken2 (mpa format) to *.tsv tables that resemble those produced by kaiju2table.
You can find it here:
It depends on:
The options are quite simple:
usage: kraken2table [-h] -i [INPUT_FILES [INPUT_FILES ...]] -o OUTPUT_FILE
                    [-p THREADS] [-r RANK] [-m MIN_FRAC] [-c MIN_COUNT] [-u]

optional arguments:
  -h, --help            show this help message and exit
  -i [INPUT_FILES [INPUT_FILES ...]], --input-files [INPUT_FILES [INPUT_FILES ...]]
                        Name of input files (SPACE-separated).
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Name of output file.
  -p THREADS, --threads THREADS
                        Number of parallel threads
  -r RANK, --rank RANK  Taxonomic rank to be output, all lowercase
                        (Default: species)
  -m MIN_FRAC, --min-frac MIN_FRAC
                        Number in [0, 100], denoting the minimum required
                        percentage for the taxon (except viruses) to be
                        reported (default: 0.0)
  -c MIN_COUNT, --min-count MIN_COUNT
                        Integer number > 0, denoting the minimum required
                        number of reads for the taxon (except viruses) to be
                        reported (default: 0)
  -u, --exclude-unclassified
                        Unclassified reads are not counted for the total
                        reads when calculating percentages for classified
                        reads.
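To show how the -m, -c and -u options could interact, here is a small Python sketch of one possible reading of the help text. It is not taken from the kraken2table source: the function `filter_taxa` is hypothetical, the exception for viruses is ignored because it would need taxonomy information, and dropping the unclassified row when -u is set is an assumption on my part.

```python
def filter_taxa(counts, min_frac=0.0, min_count=0, exclude_unclassified=False):
    """Turn {taxon_id: reads} into sorted (taxon_id, reads, percent) rows.

    Assumptions (not confirmed by the tool's source):
    - taxon ID 0 holds the unclassified reads;
    - with exclude_unclassified, those reads leave the total and the row is dropped;
    - the thresholds apply to classified taxa only; viruses get no special handling here.
    """
    total = sum(counts.values())
    if exclude_unclassified:
        total -= counts.get(0, 0)
    rows = []
    for taxon_id, reads in counts.items():
        if taxon_id == 0 and exclude_unclassified:
            continue
        percent = 100.0 * reads / total if total else 0.0
        if taxon_id != 0 and (percent < min_frac or reads < min_count):
            continue  # below one of the reporting thresholds
        rows.append((taxon_id, reads, percent))
    return sorted(rows)


# Toy counts: 60 unclassified reads, 30 for taxon 1578, 10 for taxon 1301.
toy_counts = {0: 60, 1578: 30, 1301: 10}
with_u = filter_taxa(toy_counts, exclude_unclassified=True)
with_m = filter_taxa(toy_counts, min_frac=20.0)
print(with_u)
print(with_m)
```

With the toy counts, excluding unclassified reads changes the denominator from 100 to 40, so the same 30 reads go from 30% to 75% of the total.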