Kraken2 recovering only classified reads
11 weeks ago
SushiRoll ▴ 120

Hi everyone!

I'm running Kraken2 on several metagenomes (Illumina paired-end). I'm currently using it this way:

kraken2 --output Sample_1 --paired Sample_1_R1.fastq Sample_1_R2.fastq

This runs fine and gives me a single file with both classified and unclassified sequences. I would like to keep just the classified ones, so I tried:

kraken2 --classified-out Sample_1_classified --paired cseqs#.fq Sample_1_R1.fastq  Sample_1_R2.fastq

but Kraken tells me

--paired requires positive and even number filenames

I'm not really sure what this means since I'm giving it two files.

On an unrelated matter, while I'm asking: I would like to run this analysis on several samples, which should be easy with a bash script, but I haven't been able to find a way to merge the different outputs into one table so that I can then use, for example, Bracken or Pavian on a single file. Does anyone know a way to do this?

Thanks a lot!!!

EDIT: I've come up with a not so elegant workaround:

awk -F "\t" '$1=="C"' Sample_1 > Sample_1_Classified

I would still like to know if there's a better solution to this. Again, thanks a lot!
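
For anyone trying the same filter, here is a small self-contained sketch of what it does. It assumes Kraken2's standard tab-separated per-read output (status, read ID, taxid, length, LCA mapping); the file names, read IDs and taxids below are invented for illustration.

```shell
# Mock Kraken2 per-read output (tab-separated): status, read ID, taxid, length, LCA mapping.
# All values are made up; a real file comes from `kraken2 --output`.
printf 'C\tread1\t562\t150|150\t562:116 |:| 562:116\n'     > mock_kraken.out
printf 'U\tread2\t0\t150|150\t0:116 |:| 0:116\n'           >> mock_kraken.out
printf 'C\tread3\t1280\t150|150\t1280:116 |:| 1280:116\n'  >> mock_kraken.out

# Keep only the rows whose first column is "C" (classified).
awk -F '\t' '$1 == "C"' mock_kraken.out > mock_classified.out

# List the IDs of the classified reads (second column).
cut -f 2 mock_classified.out > classified_ids.txt
```

Note that this keeps the classification records, not the reads themselves; the FASTQ sequences of classified reads are what --classified-out writes.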

Kraken2 taxonomy classification • 342 views
11 weeks ago

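You give --classified-out a single plain filename, but with --paired it needs a # character in that filename. Your # sign is in the wrong spot: cseqs#.fq comes after --paired, so Kraken2 reads it as a third input file, and three input files is not an even number, hence the error. From the manual: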

Usage of --paired also affects the --classified-out and --unclassified-out options; users should provide a # character in the filenames provided to those options, which will be replaced by kraken2 with "_1" and "_2" with mates spread across the two files appropriately. For example:
kraken2 --paired --classified-out cseqs#.fq seqs_1.fq seqs_2.fq

So in your case,

kraken2 --paired --classified-out Sample_1_classified#.fq Sample_1_R1.fastq Sample_1_R2.fastq

Ohh I get it now, I had read the manual but didn't quite understand what the cseqs#.fq meant, thanks for the clarification
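
On the second question (several samples), a minimal sketch of a loop could look like the one below. It assumes the files are named <sample>_R1.fastq and <sample>_R2.fastq in the current directory; the KRAKEN_DB path is a placeholder, and the run_sample function and the DRY_RUN switch are hypothetical helpers introduced here, not part of Kraken2. It also adds --report, which writes a per-sample summary report, the kind of file Bracken and Pavian take as input. It does not merge the reports into one table.

```shell
# Sketch: run Kraken2 on every paired sample in the current directory.
# Assumptions (not from the thread): the KRAKEN_DB placeholder path, the
# *_R1.fastq / *_R2.fastq naming, and DRY_RUN, which by default only prints
# the commands instead of running them.
KRAKEN_DB="${KRAKEN_DB:-/path/to/kraken2_db}"
DRY_RUN="${DRY_RUN:-1}"

run_sample() {
    sample="$1"
    if [ "$DRY_RUN" = "1" ]; then
        runner=echo      # print the command only
    else
        runner=          # empty: the command below runs for real
    fi
    # The "#" in the --classified-out name is replaced by Kraken2 with _1 and _2.
    $runner kraken2 --db "$KRAKEN_DB" --paired \
        --classified-out "${sample}_classified#.fq" \
        --output "${sample}.kraken" \
        --report "${sample}.kreport" \
        "${sample}_R1.fastq" "${sample}_R2.fastq"
}

for r1 in *_R1.fastq; do
    [ -e "$r1" ] || continue          # skip if the glob matched nothing
    run_sample "${r1%_R1.fastq}"      # strip the suffix to get the sample name
done
```

With the default DRY_RUN=1 the loop only prints one kraken2 command per sample, so the file names can be checked first; setting DRY_RUN=0 runs them.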


