Question

16S rRNA classification pipeline

5.3 years ago

Sus ▴ 40

Hello everyone,

I have 16S data and I'm trying to identify which genera/species are present in my samples, along with their relative abundance.

It's my first time working with 16S data and I'm not used to classification either. I'm trying to improve what I have already done and to understand some of the concepts.

So far, to identify what's in my samples, I've been relying on classifiers like Kraken or Centrifuge and 16S databases like SILVA or Greengenes. They give results that look fine on my test samples (I'm looking at genus level for now), but my "pipeline" consists only of cleaning the files with tools like Trimmomatic or fastp before feeding them to the main classification tool.

I feel like this is very light and was wondering what I could do to improve it. I was thinking about doing some assembly beforehand (I'm currently trying ABySS and SPAdes). I have also noticed that people suggest clustering reads into OTUs, but I don't really understand why you would group them, especially since some bacterial species share more than 98% identity across their 16S sequences.

What am I missing about OTUs, and what can I do to improve my classification?
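To make the OTU question concrete, here is a minimal sketch of greedy centroid clustering at a 97% identity threshold. It is a toy under stated assumptions, not how any particular tool is implemented: the sequences are made up, they are assumed to be pre-aligned and of equal length, and the helper names (`identity`, `greedy_otus`, `mutate`) are hypothetical. Real clustering tools align the sequences and usually process them in order of abundance.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: dict, threshold: float = 0.97) -> dict:
    """Assign each sequence to the first existing centroid it matches at >= threshold.

    Returns a dict mapping centroid name -> list of member names.
    """
    otus = {}
    for name, seq in seqs.items():
        for centroid in otus:
            if identity(seq, seqs[centroid]) >= threshold:
                otus[centroid].append(name)
                break
        else:
            # No centroid is close enough, so this sequence seeds a new OTU.
            otus[name] = [name]
    return otus


def mutate(seq: str, positions) -> str:
    """Toy helper: substitute the base at each given position with a different base."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Made-up 100 bp sequences, for illustration only.
ref = "ACGT" * 25
toy = {
    "read_a": ref,
    "read_b": mutate(ref, [10]),               # 99% identical to read_a
    "read_c": mutate(ref, [10, 50]),           # 98% identical to read_a
    "read_d": mutate(ref, range(0, 100, 10)),  # 90% identical to read_a
}
otus = greedy_otus(toy)
print(otus)
```

In this toy, `read_b` and `read_c` collapse into the OTU seeded by `read_a`, while `read_d` starts its own. That shows both sides of the trade-off: a 1-2% difference caused by sequencing error is absorbed, but so is a genuine 98%-identical sequence from a different species, which is exactly the concern raised above.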

EDIT: I found this document to be really interesting as it covers a lot of things.

classifcation 16S rna-seq RNA-Seq • 1.8k views

updated 5.3 years ago by Kevin ▴ 640 • written 5.3 years ago by Sus ▴ 40

Which 16S region do you have?
Also, no assembly is needed for 16S data.
Do you have Illumina data?

5.3 years ago by Bioinformatics_NewComer ▴ 330

Right now I have Illumina data targeting V3 & V4.

5.3 years ago by Sus ▴ 40

score 1 · Answer 1 · 2019-01-08

5.3 years ago

leaodel ▴ 190

Hi Sus, I would try the BMP pipeline. They provide a detailed workflow that you can use as is or adapt to build your own, along with a detailed guide for 16S/18S rRNA and ITS data analysis. Good luck!

score 0 · Answer 2 · 2019-01-10

One point of reference for comparison is to run the 16S workflow at https://ionreporter.thermofisher.com using their supplied demo data. Their panel covers more regions than your data, but the data visualisations should inspire you to do more.

I don't think de novo assembly will add new information, but I might be wrong.

If you look at the results, you might find that genus-level differentiation is pretty good for 16S data. If you are looking to differentiate to species level, that's a different story.
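As a rough illustration of working at genus level, the sketch below collapses per-lineage read counts to genus and converts them to relative abundance. It is only a sketch under assumptions: the input format (semicolon-separated lineage strings ordered domain to species, paired with a read count) is hypothetical and will need adapting to whatever your classifier actually reports, the function name `genus_relative_abundance` is made up, and the counts are toy numbers.

```python
from collections import Counter

# Assumed rank order: domain, phylum, class, order, family, genus, species.
GENUS_RANK = 5  # 0-based index of the genus field in the lineage string


def genus_relative_abundance(assignments):
    """Collapse (lineage, read_count) pairs to genus-level relative abundance.

    Reads whose lineage stops above genus are pooled as "unclassified".
    """
    counts = Counter()
    for lineage, n_reads in assignments:
        ranks = [r.strip() for r in lineage.split(";") if r.strip()]
        genus = ranks[GENUS_RANK] if len(ranks) > GENUS_RANK else "unclassified"
        counts[genus] += n_reads
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genus: n / total for genus, n in counts.items()}


# Toy counts, invented for illustration only.
toy_assignments = [
    ("Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;"
     "Lactobacillus;Lactobacillus crispatus", 60),
    ("Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;"
     "Lactobacillus;Lactobacillus iners", 20),
    ("Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;"
     "Enterobacteriaceae;Escherichia", 15),
    ("Bacteria;Firmicutes", 5),
]
abundance = genus_relative_abundance(toy_assignments)
for genus, fraction in sorted(abundance.items(), key=lambda kv: -kv[1]):
    print(f"{genus}\t{fraction:.2%}")
```

Here the two Lactobacillus species are merged into a single genus row, so any uncertainty between them no longer affects the table, which is one reason genus-level summaries tend to be more stable than species-level ones.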