Question: A lot of OTUs with no reference
Asked by agata88 (Poland), 11 months ago:

Hi all!

I have more than 2000 OTUs detected, of which only 25% have an assigned taxonomy. The read quality is good. I suspect that it is a soil sample, but I am not sure. I know it's MiSeq, V3-V4.

After a BLAST search of the unassigned OTUs, the results show: "uncultured bacteria".

My questions are: What could be the reason for detecting so many OTUs? How should I interpret OTUs with no taxonomy?

Many thanks for any suggestions,

Best,

Agata


I say this as someone who has no idea what OTUs are or how they are detected: your question needs some context. Did you use a specific method or tool to detect OTUs? You say your data is from MiSeq; what did you use to process it? What organism is this data from?

Maybe my lack of knowledge on this is generating more questions than would be necessary for an expert on OTUs, but context always helps.

Reply by Ram, 11 months ago

What tool did you use, and which database did you compare against? Was the taxonomy unassigned entirely, or only at the species/genus/family level?

Reply by Asaf, 11 months ago

I use my own reference taxonomy, but I get the same results with Greengenes 97_otu.fasta. Taxonomy was not assigned at any level.

Reply by agata88, 11 months ago
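Whether taxonomy is "unassigned at all" or only missing at species/genus/family level can be checked mechanically by finding the deepest rank that carries a name in each lineage. A minimal Python sketch, assuming Greengenes-style lineage strings (`k__Bacteria; p__...; g__; s__`) and a literal `Unassigned` for OTUs with no hit; `deepest_assigned_rank` and `summarize` are hypothetical helpers, not part of any tool mentioned in this thread:

```python
from collections import Counter
from typing import Dict, Optional

# Rank order of Greengenes-style lineages (k__, p__, c__, o__, f__, g__, s__).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def deepest_assigned_rank(lineage: str) -> Optional[str]:
    """Return the deepest rank with a non-empty name, or None if nothing was assigned."""
    if lineage.strip() in ("", "Unassigned"):
        return None
    deepest = None
    fields = [f.strip() for f in lineage.split(";") if f.strip()]
    for rank, field in zip(RANKS, fields):
        # A field like 'g__' with nothing after the prefix means the rank is empty.
        name = field.split("__", 1)[1] if "__" in field else field
        if not name:
            break
        deepest = rank
    return deepest


def summarize(assignments: Dict[str, str]) -> Counter:
    """Count how many OTUs stop at each rank (input: OTU id -> lineage string)."""
    return Counter(deepest_assigned_rank(lin) or "unassigned" for lin in assignments.values())
```

Running `summarize` over all OTUs shows at a glance whether the unclassified 75% have no hit at all or simply stop at a higher rank.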

At what percentage identity are you clustering your OTUs? Have you filtered out singletons and doubletons? What are you blasting your OTU representative sequences against? Why aren't you using a more sophisticated tool such as the RDP classifier?

Reply by 5heikki, 11 months ago

97%, and I've filtered out singletons and doubletons. I am blasting against my own reference, but I get the same results with Greengenes (97_otu.fasta).

I should say that I am sure this is not a software problem, because the pipeline has been validated many times and has worked very well.

I am more curious about the biological explanation. Could the primers not be working well? Could there be new species that are not annotated in any reference?

Reply by agata88, 11 months ago
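For reference, filtering singletons and doubletons amounts to dropping OTUs with fewer than three reads in total. A minimal sketch, assuming the OTU table is held as nested dicts and that counts are summed across all samples (some pipelines apply the cutoff per sample instead); `filter_low_abundance` is a hypothetical helper:

```python
from typing import Dict

OtuTable = Dict[str, Dict[str, int]]  # OTU id -> {sample id -> read count}


def filter_low_abundance(table: OtuTable, min_total: int = 3) -> OtuTable:
    """Drop OTUs whose read count summed over all samples is below min_total.

    With min_total=3, global singletons (1 read) and doubletons (2 reads) are removed.
    """
    return {otu: counts for otu, counts in table.items() if sum(counts.values()) >= min_total}


if __name__ == "__main__":
    table = {
        "OTU1": {"s1": 10, "s2": 5},
        "OTU2": {"s1": 1},           # singleton
        "OTU3": {"s1": 1, "s2": 1},  # doubleton
        "OTU4": {"s2": 3},
    }
    print(sorted(filter_low_abundance(table)))  # ['OTU1', 'OTU4']
```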

Classify your OTU representative sequences against SILVA or RDP. Also, for partial 16S sequences you should consider 99% or even 100% identity for OTU clustering; see e.g. https://www.biorxiv.org/content/early/2017/09/21/192211

Reply by 5heikki, 11 months ago
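To illustrate why the clustering threshold changes the OTU count, here is a toy version of abundance-sorted greedy centroid clustering, similar in spirit to what tools such as VSEARCH do. This is a simplified sketch, not any tool's actual algorithm: it assumes the sequences are already aligned to equal length and measures identity as the fraction of matching positions, whereas real tools compute identity from pairwise alignments. `identity` and `greedy_cluster` are hypothetical names.

```python
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of matching positions; assumes pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: List[Tuple[str, int]], threshold: float) -> List[List[str]]:
    """Cluster (sequence, abundance) pairs around centroids, most abundant first.

    Each sequence joins the first centroid with identity >= threshold;
    otherwise it becomes a new centroid (a new OTU).
    """
    ordered = sorted(seqs, key=lambda s: s[1], reverse=True)
    clusters: List[List[str]] = []
    for seq, _abundance in ordered:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:  # cluster[0] is the centroid
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


if __name__ == "__main__":
    base = "A" * 100
    seqs = [(base, 50), ("C" + "A" * 99, 10), ("CC" + "A" * 98, 5), ("GGGGG" + "A" * 95, 3)]
    for t in (0.97, 0.99, 1.0):
        print(t, len(greedy_cluster(seqs, t)))  # 2, 3 and 4 clusters respectively
```

In this toy example, variants that differ by one or two bases collapse into one OTU at 97% but stay separate at 99-100%, which is the trade-off discussed in the replies.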

99-100% identity is required only for the species level. That doesn't mean the OP can't get valuable information with less than that.

Reply by Asaf, 11 months ago

I'm talking about OTU clustering, not the taxonomy annotation threshold.

Reply by 5heikki, 11 months ago

Yes, I think this is too strict. If you have good-quality reads and a high number of reads, you can assign taxonomy to the species level even with 97% OTU clustering.

Reply by agata88, 11 months ago

Thank you all for your help. After considering your suggestions, I've decided to move up to 100% OTU clustering for species discovery. I know that I will lose a lot of biologically meaningful information, but since I am only interested in species, I think this is the best option :)

PS: I've tested the 100% OTU clustering threshold on a public mock community sample, and it works better than 97% OTU clustering.

Reply by agata88, 11 months ago

If you're looking for a relative with 97% identity, you're likely not to find any match; that is too strict for soil samples. You should use other methods that take a broader look, such as mothur; they will be able to classify OTUs at higher taxonomic levels (genus, family, etc.). You can also try running some OTUs through the SILVA classifier.

Reply by Asaf, 11 months ago

Why is 97% too strict for soil samples?

Reply by agata88, 11 months ago

Because the vast majority of the jungle in there has never been isolated; it's terra nova.

Reply by Asaf, 11 months ago

I don't think a 97% match is too strict for soil samples. The OP is getting hits; it's just that they're to "uncultured bacteria". With RDP and SILVA at least, each 16S sequence in the database carries lineage information, e.g. "Bacteria: Proteobacteria: Gammaproteobacteria: Enterobacteriales: Enterobacteriaceae: Escherichia; uncultured bacteria". So all the OP has to do is look at the information at the genus or family level.

Reply by 5heikki, 11 months ago
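The "look at genus or family level" suggestion can be sketched as: split the lineage, stop at the first uninformative label such as "uncultured bacteria", and read off the name at the rank of interest. A minimal sketch, assuming the lineage format quoted above (ranks separated by `:` or `;`, in the order domain, phylum, class, order, family, genus) and a hypothetical list of uninformative keywords; real SILVA or RDP output may need format-specific parsing.

```python
import re
from typing import List, Optional

# Hypothetical keywords that mark a label as carrying no taxonomic information.
UNINFORMATIVE = ("uncultured", "unidentified", "metagenome")

# 0-based positions, assuming the order domain, phylum, class, order, family, genus.
FAMILY, GENUS = 4, 5


def informative_levels(lineage: str) -> List[str]:
    """Split a lineage on ':' or ';' and keep names up to the first uninformative label."""
    tokens = [t.strip() for t in re.split(r"[;:]", lineage) if t.strip()]
    kept: List[str] = []
    for token in tokens:
        if any(word in token.lower() for word in UNINFORMATIVE):
            break  # nothing below an 'uncultured ...' label is informative
        kept.append(token)
    return kept


def label_at(lineage: str, depth: int) -> Optional[str]:
    """Return the informative name at the given depth, or None if the lineage stops earlier."""
    kept = informative_levels(lineage)
    return kept[depth] if depth < len(kept) else None


if __name__ == "__main__":
    hit = ("Bacteria: Proteobacteria: Gammaproteobacteria: Enterobacteriales: "
           "Enterobacteriaceae: Escherichia; uncultured bacteria")
    print(label_at(hit, GENUS), label_at(hit, FAMILY))  # Escherichia Enterobacteriaceae
```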

This kind of result only means that someone sequenced this 16S, predicted its lineage, and deposited it (or SILVA predicted the lineage of a deposited 16S sequence). Predicting lineage from matches below 97% is basically the same thing.

Reply by Asaf, 11 months ago
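One way to act on this is to trust a hit's lineage only down to the rank its percent identity supports. A minimal sketch: the 99% species cutoff follows the 99-100% figure mentioned earlier in this thread, while the genus-to-phylum cutoffs are hypothetical placeholders, not values from this thread or from any specific tool, and would need tuning for real data.

```python
from typing import List

# (rank, minimum % identity). Species follows the 99-100% figure from this thread;
# the remaining cutoffs are hypothetical placeholders.
CUTOFFS = [
    ("species", 99.0),
    ("genus", 95.0),
    ("family", 90.0),
    ("order", 85.0),
    ("class", 80.0),
    ("phylum", 75.0),
]
# Number of lineage entries to keep for each rank (domain first).
RANK_DEPTH = {"phylum": 2, "class": 3, "order": 4, "family": 5, "genus": 6, "species": 7}


def truncate_lineage(lineage: List[str], pct_identity: float) -> List[str]:
    """Keep only the ranks supported by the hit's percent identity."""
    for rank, cutoff in CUTOFFS:
        if pct_identity >= cutoff:
            return lineage[:RANK_DEPTH[rank]]
    return lineage[:1]  # below every cutoff: keep only the domain (an assumption)


if __name__ == "__main__":
    lin = ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacteriales",
           "Enterobacteriaceae", "Escherichia", "coli"]
    print(truncate_lineage(lin, 96.0))  # down to genus: Escherichia
    print(truncate_lineage(lin, 88.0))  # down to order: Enterobacteriales
```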

This is the information I was looking for. Thanks.

Reply by agata88, 11 months ago