Question

kraken2 unclassified output does not work with paired reads?

0

3.2 years ago

DNAngel ▴ 250

Hi all,

Just wondering if anyone else has run into this issue (or if it is an issue at all). Kraken2 allows users to obtain their classified or unclassified reads separately using either --classified-out or --unclassified-out, respectively.

Now, when using --classified-out with paired-end data (i.e. using --paired and then passing the R1 and R2 files), I get my output as one file. However, this does not work with --unclassified-out and I do not know why that is the case. I get an unclassified file for each read file (so a sample1.unclassified_1.out and a sample1.unclassified_2.out to correspond to R1 and R2 respectively).

Is there a way to modify my code (below) so I can just get one combined unclassified output? That way, when I blast the reads I won't be blasting the same thing twice.

My code:

for R1 in ${INPUTS} # inputs being the directory with my reads
do
      PREFIX="${R1%%_unmapped*}"
      R2="${PREFIX}_unmapped_R2.fastq"
      kraken2 --threads 32 --db "$DB" --paired "$R1" "$R2" --unclassified-out "${PREFIX}.unclassified#.fastq"
done

Thank you!

Kraken2 • 3.5k views

ADD COMMENT • link updated 3.2 years ago by Istvan Albert 101k • written 3.2 years ago by DNAngel ▴ 250

score 3 · Answer 1 · 2021-06-02

3.2 years ago

Istvan Albert 101k

Are you sure you get your classified outs in a single file? The manual states:

Usage of --paired also affects the --classified-out and --unclassified-out options; users should provide a # character in the filenames provided to those options, which will be replaced by kraken2 with "_1" and "_2" with mates spread across the two files appropriately. For example:
  kraken2 --paired --classified-out cseqs#.fq seqs_1.fq seqs_2.fq
will put the first reads from classified pairs in cseqs_1.fq, and the second reads from those pairs in cseqs_2.fq.
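To make the naming concrete, here is a small sketch that mimics the substitution described in the manual using plain bash. It does not call kraken2 at all; the pattern `cseqs#.fq` comes from the manual's example, while the variable names are only illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Output pattern as it would be passed to --classified-out / --unclassified-out.
pattern='cseqs#.fq'

# Emulate the documented replacement of "#" with "_1" and "_2".
# The backslash keeps bash from treating "#" as a start-of-string anchor.
out_r1="${pattern/\#/_1}"
out_r2="${pattern/\#/_2}"

echo "first mates:  ${out_r1}"
echo "second mates: ${out_r2}"
```

So a pattern containing `#` should always yield two files in paired mode, for classified and unclassified reads alike.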

Also, I am unsure why you want them merged into one file; that is usually not what we want.

The two reads of a pair usually have different sequences, so you would never be blasting the same thing twice.

If you want to combine the two files, concatenate them.
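A minimal sketch of that concatenation, assuming the two files follow the `_1`/`_2` naming produced from a `#` pattern. The prefix `sample1` and the two tiny FASTQ records are made up so the example runs without kraken2; with real data you would drop the `printf` lines.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical prefix; substitute the PREFIX used in the kraken2 call.
PREFIX="sample1"

# Stand-in files so the sketch runs on its own (illustrative records only).
printf '@read1/1\nACGT\n+\nIIII\n' > "${PREFIX}.unclassified_1.fastq"
printf '@read1/2\nTTGA\n+\nIIII\n' > "${PREFIX}.unclassified_2.fastq"

# Append the R2 file after the R1 file; the result is NOT interleaved.
cat "${PREFIX}.unclassified_1.fastq" "${PREFIX}.unclassified_2.fastq" \
    > "${PREFIX}.unclassified_combined.fastq"

# Each FASTQ record is 4 lines, so the record count is lines / 4.
echo "records: $(( $(wc -l < "${PREFIX}.unclassified_combined.fastq") / 4 ))"
```

Note that the combined file loses the pairing information, so it is only suitable for tools that treat every read independently.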

ADD COMMENT • link 3.2 years ago by Istvan Albert 101k

0

Hmm, I think I made an error then: when using --classified-out, I don't think I used the "#" character in the output filenames. So yes, it should have produced two files. I just thought it was pointless blasting both the R1 and R2 reads, as I can see they are giving me the same output for the same read pair.

ADD REPLY • link 3.2 years ago by DNAngel ▴ 250

1

Most of the time R1 and R2 should classify/align the same way; the more interesting (or concerning) cases are those where the two reads align to different organisms.

ADD REPLY • link 3.2 years ago by Istvan Albert 101k