Question

Are there any tools that automatically detect where to truncate forward and reverse reads for 16S (DADA2)?

0

Entering edit mode

20 months ago

O.rka ▴ 710

I'm about to run DADA2 via QIIME2 and noticing that it's very hands-on and not completely automated.

Are there any reliable tools that can indicate where I should trim my forward and reverse reads based on input fastq?

Here's the documentation for the module I am running: https://docs.qiime2.org/2022.2/plugins/available/dada2/denoise-paired/

Is it not advised to merge all the reads, run through something like FastQC, and then find the region where the 25th percentile is lower than 30?

16s dada2 illumina metagenomics qiime2 • 1.6k views

ADD COMMENT • link 20 months ago by O.rka ▴ 710

score 0 · Answer 1 · 2022-08-09

Hi,

DADA2 implemented via wrapper python scripts in QIIME2 or directly in R can be as automatic as you want. You can write a bash script, snakemake or nextflow workflow (among many others) stating the order and dependencies of every step to run. In this way from a given input you'll have a desirable effortless output with your final results.

An automatic pipeline as described above works relatively well for very standardized analyses and data where you expect that data and results will behave within the standards. Of course, that even in such pipeline you will run QC steps that will produce QC plots which you should check to be certain that the data actually follows your expectations regarding its quality.

Are there any reliable tools that can indicate where I should trim my forward and reverse reads based on input fastq?

This depends on your criteria. There are tools that produce QC plots that help you decide about which values to choose and define as thresholds to trim and truncate your reads such as FastQC and MultiQC (which depends on the results produced by FastQC). The colors in these plots give you an indication of quality. There are many guides online explaining how to interpret these plots.

I would say that quite often people use recommended values which are not "bullet proof" but they work for standard data quality. You can attempt to use the values provided by the tutorial/qiime2 wrapper if you have the same type of data (which I think the values are intended for Illumina data - but I'm not sure) and hyper-variable region of 16S (if you are using a different hyper-variable region or read-length the values may not be adequate). In general, if you are not very familiar with the data that you're working on neither the analyses, I wouldn't recommend this approach as you may run things that you even don't know what they meaning and their implications. In that case, automatic and simple wrappers like the one you pointed out might be dangerous as it "hides" many sequential steps which makes difficult to understand their order and functionality.

I think the plugin that you pointed out implements in general the DADA2 workflow in R (DADA2 is an R package). Therefore, I would recommend that you check the DADA2 tutorial which may help to understand better the workflow, the order of steps, their implication and options implemented in the QIIME2 wrapper: https://benjjneb.github.io/dada2/tutorial.html

Is it not advised to merge all the reads, run through something like FastQC, and then find the region where the 25th percentile is lower than 30?

There are different possible approaches. Usually I would trim and truncate the forward and reverse reads of 16S rRNA sequences indenpendently before merging them. The DADA2 tutorial also follows this approach. Then you denoise your sequences (after learning the error rates), and only then you merge them.

I hope this helps,

António

score 0 · Answer 2 · 2022-08-09

0

Entering edit mode

20 months ago

colindaven 6.4k

I had very good results for 16S analysis using the nf-core ampliseq pipeline at https://github.com/nf-core/ampliseq

There were only a few problems configuring the input, but using an input CSV file solved this easily enough. DADA2 is part of that pipeline.

ADD COMMENT • link 20 months ago by colindaven 6.4k