kraken2 unclassified output does not work with paired reads?
22 months ago
DNAngel ▴ 240

Hi all,

Just wondering if anyone else has run into this issue (or if it is an issue at all). Kraken2 allows users to obtain their classified or unclassified reads separately using either --classified-out or --unclassified-out, respectively.

Now, when using --classified-out with paired-end data (i.e. using --paired and then calling R1 and R2 files), I get my output as one file. However, this does not work with --unclassified-out and I do not know why that is the case. I get an unclassified file for each read sample (so a sample1.unclassified_1.out and a sample1.unclassified_2.out to correspond to R1 and R2 respectively).

Is there a way to modify my code (below) so I get just one combined unclassified output? That way, when I BLAST the reads, I won't be BLASTing the same thing twice.

My code:

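for R1 in "${INPUTS}"/*_R1.fastq; do   # INPUTS is the directory with my reads (assumes *_R1.fastq / *_R2.fastq naming)
      R2="${R1/_R1/_R2}"; PREFIX="$(basename "${R1%_R1.fastq}")"   # derive the mate file and a per-sample prefix
      kraken2 --threads 32 --db "$DB" --paired "$R1" "$R2" --unclassified-out "${PREFIX}.unclassified#.fastq"
done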

Thank you!

Kraken2 • 2.1k views
21 months ago

Are you sure you get your classified outs in a single file? The manual states:

Usage of --paired also affects the --classified-out and --unclassified-out options; users should provide a # character in the filenames provided to those options, which will be replaced by kraken2 with "_1" and "_2" with mates spread across the two files appropriately. For example:

  kraken2 --paired --classified-out cseqs#.fq seqs_1.fq seqs_2.fq

will put the first reads from classified pairs in cseqs_1.fq, and the second reads from those pairs in cseqs_2.fq.

I am also unsure why you want them merged into one file; that is usually not what we want.

The sequences in the two pairs are usually different, so you would never be blasting the same thing twice.

If you want to combine the two files, concatenate them.
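
As a minimal sketch of that concatenation: the file names below are hypothetical (they follow the "_1"/"_2" pattern kraken2 would produce from --unclassified-out sample1.unclassified#.fastq), and the two tiny FASTQ records are made up so the snippet runs without real data. With real output you would only need the cat line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file names, assuming --unclassified-out sample1.unclassified#.fastq
workdir="$(mktemp -d)"
r1="${workdir}/sample1.unclassified_1.fastq"
r2="${workdir}/sample1.unclassified_2.fastq"
combined="${workdir}/sample1.unclassified_combined.fastq"

# Made-up one-record FASTQ files standing in for real kraken2 output
printf '@read1/1\nACGT\n+\nIIII\n' > "$r1"
printf '@read1/2\nTTGA\n+\nIIII\n' > "$r2"

# Concatenate: all R1 records first, followed by all R2 records
cat "$r1" "$r2" > "$combined"

# A FASTQ record is 4 lines, so (line count / 4) is the number of reads
n_reads=$(( $(wc -l < "$combined") / 4 ))
echo "combined reads: ${n_reads}"
```

Note that the combined file is not interleaved and no longer keeps mates side by side, so it suits per-read tools such as BLAST but not tools that expect paired input.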


Hmm, I think I made an error then: when using --classified-out, I don't think I used the "#" character in the output filenames. So yes, it should have produced two files. I just thought it was pointless BLASTing both the R1 and R2 reads, as I can see they give me the same result for the same read pair.


Most of the time R1 and R2 should classify/align the same way; the more interesting (or concerning) cases are those where the two reads align to different organisms.

