Why do people say it doesn't matter whether you run unassembled or assembled reads through Kraken?
4 weeks ago
trkfs ▴ 10

I'm new to Kraken and I've come across two forum posts (link 1, link 2) that mention that it doesn't matter whether your reads are assembled or not when you run them through the Kraken classifier, with one commenter saying that it's better to run raw reads through it. Does this have something to do with Kraken's k-mer based approach to classification?

assembly unassembled-reads Kraken • 637 views
29 days ago
dthorbur ★ 2.9k

There are a few things brought up in the links you provided that are relevant here.

  1. Kraken2 is k-mer based, so it doesn't matter whether you input a single contiguous sequence or lots of individual sequences: your overall taxonomic assignments will be similar. The percentages will differ, though, because assembly collapses high-coverage regions into a single sequence.

  2. If you use assembled sequences, any reads that were not incorporated into the assembly will not be classified.
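
To make point 1 concrete, here is a minimal toy sketch in Python. It is not Kraken2's actual algorithm (which, among other things, maps k-mers to taxa through a database); it only shows that error-free overlapping reads and the contig they assemble into contain the same set of k-mers. The sequence, `K`, and `READ_LEN` are made-up illustration values, with `K` far shorter than the k-mer length Kraken2 uses.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

K = 5          # toy k-mer length (assumption, for illustration only)
READ_LEN = 10  # toy read length (assumption)

# Hypothetical contig and error-free reads tiling it with a step of 2 bases
contig = "ATGCGTACGTTAGCCTAGGA"
reads = [contig[i:i + READ_LEN] for i in range(0, len(contig) - READ_LEN + 1, 2)]

read_kmers = set().union(*(kmers(r, K) for r in reads))
contig_kmers = kmers(contig, K)

# Same k-mer content, so a k-mer based classifier sees the same evidence...
print(read_kmers == contig_kmers)
# ...but the number of sequences being classified differs (several reads vs 1 contig)
print(len(reads), "reads vs 1 contig")
```

The k-mer content is identical, but the reads give several classified sequences where the contig gives one, which is why the reported percentages shift after assembly.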

In my experience, it really depends on what you are using Kraken2 for. I used it to estimate contamination when we were QCing samples from collaborators who often didn't follow protocols very well. There, it was very useful for identifying likely contamination and removing it before moving forward with any analyses.


Thank you for your answer! In your opinion, in what scenario would it not be appropriate to use Kraken2 without assembly?


What do you intend to use it for? For me, I use it for contamination quantification and removal. But there are plenty of use cases for running it on unassembled reads. It's easier to hear your intended use case and offer my opinion/advice than to try to list every possible reason.


I was using it for a project where I'm trying to see the difference in abundance of SCFA-producing bacteria across microbiomes from samples in different disease states. This is one pipeline I'm using, alongside others for functional analysis.


I'm not all that familiar with your field of research, but it would seem counterproductive to try and assemble reads of a microbial community IMO. You'd lose the ability to compare relative depths of assignments, and end up with more of a presence/absence dataset than abundance. Unless you mean species richness as abundance. But again, this is not my field, so I may be missing vital information.
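
As a rough illustration of that loss of depth information, here is a toy Python sketch with entirely made-up numbers. It assumes, purely for the sake of argument, that each taxon assembles into exactly one contig; `Taxon_A` and `Taxon_B` are hypothetical labels, not real classifier output.

```python
from collections import Counter

def percentages(labels):
    """Percentage of classified sequences assigned to each taxon."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {taxon: 100 * n / total for taxon, n in counts.items()}

# Hypothetical per-read assignments: Taxon_A is sequenced 9x deeper than Taxon_B
read_labels = ["Taxon_A"] * 90 + ["Taxon_B"] * 10

# Assumption: after assembly, each taxon collapses into a single contig
contig_labels = ["Taxon_A", "Taxon_B"]

read_pct = percentages(read_labels)
contig_pct = percentages(contig_labels)

print(read_pct)    # depth difference is visible
print(contig_pct)  # both taxa look equally "abundant"
```

With raw reads the 9:1 depth difference shows up in the percentages; with contigs both taxa simply register as present.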


Thanks for your help!
