Question

RDP classifier pipeline question (16S analysis)



7.2 years ago

manekineko ▴ 150

Hi, I have a question related to 16S metagenomics analysis. My pipeline is:

USEARCH (dereplicate, cluster and map to OTU) -> RDP classifier

I have a sample with 200k joined paired-end sequences from Illumina, and I know that at least 90% of them are valid bacterial 16S.

But after dereplication and clustering I get only ~6k sequences. These are classified with RDP in the final file, from which I can calculate bacterial percentages for the taxonomy. I think these 6k sequences are the non-redundant sequences, so their copy numbers are not taken into account?

If I run just the RDP classifier directly on the sample FASTA file, all sequences get classified, which sounds better. So I wonder what USEARCH is doing here, and whether it is a necessary step?

16s • 2.7k views


score 0 · Answer 1 · 2018-02-07



6.2 years ago

gb ★ 2.2k

I think you are asking the wrong question, or you are not looking closely enough at your own data.

You say that you start with 200k merged reads and that after dereplication and clustering you can identify only 6k reads.

My question to you: what do you think dereplication and clustering do?
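As a hint, here is a rough Python sketch of the idea behind dereplication. It is not what USEARCH runs internally. The reads, the taxon labels and the function names are all made up for illustration. The point is that identical reads are collapsed into one unique sequence, but the copy number can be kept alongside it (USEARCH can write it as a size annotation in the sequence label, depending on the options used), and the percentages should then be weighted by that count.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads, keeping how many times each one was seen."""
    return Counter(reads)


def weighted_percentages(unique_counts, taxon_of):
    """Percentage per taxon, weighting each unique sequence by its copy number."""
    totals = Counter()
    for seq, count in unique_counts.items():
        totals[taxon_of[seq]] += count
    n_reads = sum(totals.values())
    return {taxon: 100.0 * c / n_reads for taxon, c in totals.items()}


# Toy data (hypothetical): 10 reads, only 3 distinct sequences.
reads = ["ACGT"] * 6 + ["TTGA"] * 3 + ["GGCC"]
uniques = dereplicate(reads)

# Hypothetical classifier output: one taxon per unique sequence.
taxon_of = {"ACGT": "Bacteroides", "TTGA": "Bacteroides", "GGCC": "Escherichia"}

percentages = weighted_percentages(uniques, taxon_of)
print(len(reads), "reads ->", len(uniques), "unique sequences")
print(percentages)
```

If you counted each unique sequence once instead, the toy example would give 2 of 3 versus 1 of 3, which is why the abundance has to travel with the sequence through to the classification step.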
