Hey all,
I have metagenomics data from several samples, from which I want to fish out only the 16S reads. I thought about using kraken2 to match the reads against the SILVA database and keep only the classified reads, treating them as 16S reads. The only problem is that I am not sure what threshold to set for the confidence score. I would love to get some advice from the experienced community here. My reads are 151 bp paired-end.
If anyone knows another useful tool for achieving the same goal, I would love to hear about it.
Thanks a lot! Shahar
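For illustration, here is a minimal sketch of the "keep only classified reads" step, assuming the standard per-read Kraken 2 output (tab-separated, with C or U in the first column and the read ID in the second). The function name and the two example lines are made up for the demo and are not real data. Kraken 2 can also write classified reads directly with its --classified-out option, so a script like this is only one way to do it.

```python
import io


def classified_read_ids(kraken_output):
    """Yield the IDs of reads that Kraken 2 marked as classified ("C").

    `kraken_output` is any iterable of lines in the standard per-read
    output format: status, read ID, taxid, length, LCA k-mer mapping.
    """
    for line in kraken_output:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        status, read_id = fields[0], fields[1]
        if status == "C":
            yield read_id


# Hypothetical two-read example (values invented for illustration only).
example = io.StringIO(
    "C\tread_1\t562\t151|151\t562:117 |:| 562:117\n"
    "U\tread_2\t0\t151|151\t0:117 |:| 0:117\n"
)
ids = list(classified_read_ids(example))
print(ids)
```

The resulting ID list could then be used to subset the original FASTQ files with whatever tool you prefer.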
I'm not sure Kraken is the right tool for that. Why not use another aligner, such as bwa, against the SILVA database?
Hey, we have tried doing it with other alignment tools, but setting the identity threshold turned out to be problematic there as well. So we wanted to try tools that are more geared towards microbiome analysis, and since Kraken is widely used in the microbiome community, I thought it might be relevant. Thanks for the suggestion!
What primers did you use? Why do you consider it necessary to filter them out? Besides, what about unclassified/unknown 16S reads? Losing them would bias the normalization. I am not an expert in the subject; I just want to understand the rationale behind your analysis.
This data comes from metagenomics, so there are no primers, just whole-population sequencing. We want to extract all 16S reads for downstream analysis. And yes, you are right about the unclassified reads; that's why I wanted suggestions regarding the confidence score. I am thinking of keeping it relatively low so that "weak" matches are also allowed. Thanks for the response!
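To get a feel for what a "weak" match looks like, one option is to run Kraken 2 once with no confidence threshold and inspect the k-mer mapping in the fifth output column. The sketch below computes the fraction of non-ambiguous k-mers that hit anything in the database. As far as I understand the Kraken 2 manual, this roughly corresponds to the confidence score evaluated at the root, so a read with a fraction below your threshold would end up unclassified. Treat that interpretation as an assumption; the function name is hypothetical, and the example string follows the format shown in the manual.

```python
def kmer_hit_fraction(lca_field):
    """Fraction of non-ambiguous k-mers assigned to any taxon.

    `lca_field` is the fifth column of the standard Kraken 2 output,
    e.g. "562:13 561:4 A:31 0:1 562:3". Tokens are "taxid:count";
    taxid 0 means no hit, "A" marks k-mers with ambiguous bases,
    and "|:|" separates the two mates of a paired read.
    """
    hits = 0
    total = 0
    for token in lca_field.split():
        if token == "|:|":
            continue  # mate separator, carries no k-mers
        taxon, _, count = token.rpartition(":")
        if taxon == "A":
            continue  # ambiguous k-mers are excluded from the total
        n = int(count)
        total += n
        if taxon != "0":
            hits += n
    return hits / total if total else 0.0


frac = kmer_hit_fraction("562:13 561:4 A:31 0:1 562:3")
print(round(frac, 3))
```

Plotting the distribution of this fraction over all reads (for example against a set of reads you already know are 16S) may help in picking a threshold rather than guessing one.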
The tutorial on ONT reads at https://usegalaxy.org/training-material/topics/metagenomics/tutorials/nanopore-16S-metagenomics/tutorial.html uses a confidence score of 0.1 for classification against the SILVA database.
Great, that might be a good start. However, in their case they are working with full-length 16S sequences, whereas we have short reads from whole-population sequencing, so I guess some modifications are needed. Thanks!