Question

Using kraken2 to fish out only 16S reads

23 months ago

Rezenman • 0

Hey all,

I have metagenomics data from several samples, from which I want to fish out only the 16S reads. I thought about using kraken2 to match reads against the SILVA database and keep only the classified reads, treating them as 16S reads. The only problem is that I am not sure what threshold to set for the confidence score. I would love to get some advice from the experienced community here. My reads are 151 bp paired-end.

If anyone knows another useful tool for achieving the same goal, I would love to hear about it.

Thanks a lot! Shahar

microbiome metagenomics 16S Deep-sequencing • 1.7k views

ADD COMMENT • link updated 23 months ago by Mensur Dlakic ★ 27k • written 23 months ago by Rezenman • 0
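
For reference, the approach described in the question could be sketched roughly as below. This is a minimal sketch under stated assumptions, not a tested pipeline: the database directory `silva_db`, the FASTQ file names, the output prefix, and the helper `build_kraken2_cmd` are all hypothetical, and the confidence value of 0.1 is only a placeholder to tune. The flags themselves are standard kraken2 options; with `--paired`, the `#` in the `--classified-out` file name is replaced by the mate number.

```python
import shlex

def build_kraken2_cmd(db, r1, r2, confidence, out_prefix, threads=8):
    """Assemble a kraken2 command that writes classified read pairs to FASTQ.

    All paths are placeholders; nothing is executed here.
    """
    return [
        "kraken2",
        "--db", db,
        "--threads", str(threads),
        "--paired",
        "--confidence", str(confidence),
        # '#' becomes _1 / _2 for the two mates in paired mode
        "--classified-out", f"{out_prefix}_16S#.fq",
        "--output", f"{out_prefix}.kraken",
        "--report", f"{out_prefix}.report",
        r1, r2,
    ]

# Hypothetical inputs; 0.1 is an arbitrary starting point, not a recommendation.
cmd = build_kraken2_cmd("silva_db", "sample_R1.fastq", "sample_R2.fastq", 0.1, "sample")
print(shlex.join(cmd))
```

The printed command can be inspected before running it, and the same helper can be called with several confidence values to compare how many reads end up in the classified output.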

I'm not sure kraken is the right tool for that. Why not use an aligner such as bwa against the SILVA database?

ADD REPLY • link 23 months ago by Asaf 10k

Hey, we have tried doing it with other alignment tools, but setting the identity threshold turned out to be problematic there as well. So we wanted to try tools that are more geared towards microbiome analysis, and since Kraken is widely used in the microbiome community, I thought it might be relevant. Thanks for the suggestion!

ADD REPLY • link 23 months ago by Rezenman • 0

What primers did you use? Why do you consider it necessary to filter them out? Besides, what about unclassified/unknown 16S reads? Missing them would bias the normalization. I am not an expert in the subject; I just want to understand the rationale behind your analysis.

ADD REPLY • link 23 months ago by Buffo ★ 2.4k

This data comes from metagenomics, so no primers were used, just whole-population sequencing. We want to extract all 16S reads for downstream analysis. And yes, you are right about the unclassified reads; that's why I wanted suggestions regarding the confidence score. I am thinking of keeping it relatively low so that "weak" matches are retained as well. Thanks for the response!

ADD REPLY • link 23 months ago by Rezenman • 0
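
One way to see how a given confidence score trades off classified against unclassified reads is to tally the first column of kraken2's per-read output (`C` or `U`) for each threshold tried. The snippet below is a rough sketch of that idea; the function names are hypothetical and the example records are made up for illustration, not real results. Note that when whole-metagenome reads are classified against a 16S database, a large unclassified fraction is expected, since most reads do not come from rRNA genes.

```python
from collections import Counter

def tally_kraken2_output(lines):
    """Count classified (C) and unclassified (U) records in kraken2 per-read output.

    Each record is tab-separated: status, read ID, taxid, length, k-mer LCA mapping.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        status = line.split("\t", 1)[0]
        counts[status] += 1
    return counts

def classified_fraction(counts):
    """Fraction of records flagged as classified; 0.0 if there are no records."""
    total = counts["C"] + counts["U"]
    return counts["C"] / total if total else 0.0

# Made-up example records (illustration only, not real data).
example = [
    "C\tread1\t3\t151|151\t3:10 0:107 |:| 0:117",
    "U\tread2\t0\t151|151\t0:117 |:| 0:117",
    "U\tread3\t0\t151|151\t0:117 |:| 0:117",
    "C\tread4\t45\t151|151\t45:30 0:87 |:| 45:12 0:105",
]
counts = tally_kraken2_output(example)
print(counts["C"], counts["U"], classified_fraction(counts))
```

In practice the lines would come from the file passed to `--output`, read once per confidence value being compared.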

As per this tutorial on ONT reads, https://usegalaxy.org/training-material/topics/metagenomics/tutorials/nanopore-16S-metagenomics/tutorial.html, a confidence score of 0.1 is used for classification with the SILVA database.

ADD REPLY • link 23 months ago by cpad0112 21k

Great, that might be a good start. However, in their case they are working with full-length 16S sequences, whereas we have short reads from whole-population sequencing, so I guess some modifications are needed. Thanks!

ADD REPLY • link 23 months ago by Rezenman • 0
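
To make the read-length point concrete: kraken2's confidence score is, roughly, the fraction of a read's k-mers that map within the clade of the assigned label, out of all k-mers without ambiguous bases. The sketch below works through that arithmetic for a 151 bp read and a roughly full-length 16S sequence. It assumes kraken2's default nucleotide k-mer length of 35 (the database in use may have been built differently), treats each read on its own rather than as a pair, and uses hypothetical helper names; take it as a back-of-the-envelope illustration rather than a description of kraken2's exact internals.

```python
import math

def kmer_count(read_length, k=35):
    """Number of k-mers in a read with no ambiguous bases (0 if the read is shorter than k)."""
    return max(read_length - k + 1, 0)

def min_kmer_hits(read_length, confidence, k=35):
    """Smallest number of in-clade k-mers for which hits / total reaches the threshold."""
    return math.ceil(confidence * kmer_count(read_length, k))

# 151 bp short read versus an approximately full-length 16S sequence (1500 bp assumed).
for length in (151, 1500):
    print(length, kmer_count(length), min_kmer_hits(length, 0.1))
```

Under these assumptions, a 151 bp read has 117 k-mers and needs about 12 of them to support the label at a threshold of 0.1, while a 1500 bp sequence needs about 147. A short read that only partly overlaps the 16S gene will have fewer matching k-mers, which is one reason a threshold chosen for full-length sequences may not transfer directly.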

score 2 · Accepted Answer · 2022-06-07

23 months ago

Mensur Dlakic ★ 27k

There are many tools for this purpose.

https://github.com/hzi-bifo/RiboDetector (needs no database)
http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html (mirabait, comes with its own database)
https://bioinfo.lifl.fr/RNA/sortmerna/ (SortMeRNA)
https://www.seqanswers.com/forum/bioinformatics/bioinformatics-aa/35881-introducing-bbsplit-read-binning-tool-for-metagenomes-and-contaminated-libraries?t=41288 (BBsplit)
https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/ (BBduk)

ADD COMMENT • link 23 months ago by Mensur Dlakic ★ 27k

Great, I'll definitely take a look at those. Do you have a favorite?

ADD REPLY • link 23 months ago by Rezenman • 0

I generally use MIRA for this purpose, but RiboDetector features what seems to be the most universal approach.

ADD REPLY • link 23 months ago by Mensur Dlakic ★ 27k