Hi there!
I've recently been working with paired-end 16S rRNA amplicon sequences. Before running DADA2 in QIIME 2 (qiime2-amplicon-2023.9), I quality-trimmed the reads with BBDuk using the following code:
for file in "${INPUT_DIR}"/*.fastq; do
    base_name=$(basename "${file}" .fastq)
    bbduk.sh in="${file}" out="${OUTPUT_DIR}/${base_name}.fastq" qtrim=rl trimq=10
    echo "Processed file: ${file}"
done
Afterwards, I ran FastQC on the trimmed reads, and the results looked promising. Initially, setting trimq to 15 or 20 left the forward and reverse reads with different lengths, which prevented me from proceeding with DADA2. To get around this, I lowered trimq to 10 and adjusted the DADA2 parameters accordingly:
qiime dada2 denoise-paired \
    --i-demultiplexed-seqs "${INPUT_DIR}/demux-paired-end.qza" \
    --p-trunc-len-f 250 \
    --p-trim-left-f 8 \
    --p-trunc-len-r 230 \
    --p-trim-left-r 8 \
    --o-representative-sequences "${OUTPUT_DIR}/dada2-rep-seqs.qza" \
    --o-table "${OUTPUT_DIR}/dada2-table.qza" \
    --o-denoising-stats "${OUTPUT_DIR}/dada2-stats.qza"
However, despite these adjustments, the percentage of merged reads dropped significantly.
Do you have any suggestions on how to improve this situation?
You can check with bbmerge.sh from the BBMap suite to see how many reads merge. If your data is non-overlapping, then there is not much you can do.
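As a rough sketch of that check: the snippet below first estimates how much overlap the truncation lengths in the question (250 and 230) would leave, then runs bbmerge.sh on one pair of files if the tool and the files are present. The amplicon length of 460 and the file names sample_R1.fastq / sample_R2.fastq are placeholders I made up, so substitute your own values. The options used (in1, in2, out, outu1, outu2, ihist) are the ones I know from the BBMerge usage text. Note that the left-trim settings do not change the overlap, and that DADA2 asks for at least 12 nt of overlap by default.

```shell
#!/usr/bin/env bash

# Expected overlap (nt) between forward and reverse reads after truncation.
# A negative value means the reads leave a gap and cannot merge.
expected_overlap() {
    local trunc_f=$1 trunc_r=$2 amplicon_len=$3
    echo $(( trunc_f + trunc_r - amplicon_len ))
}

# 250 and 230 are the truncation lengths from the question.
# ASSUMPTION: 460 is a hypothetical amplicon length; use your own.
overlap=$(expected_overlap 250 230 460)
echo "Expected overlap: ${overlap} nt (DADA2 default minimum is 12)"

# ASSUMPTION: hypothetical file names for one sample's read pair.
r1="sample_R1.fastq"
r2="sample_R2.fastq"

if command -v bbmerge.sh >/dev/null 2>&1 && [[ -f "${r1}" && -f "${r2}" ]]; then
    # Merged reads go to 'out', unmerged pairs to 'outu1'/'outu2';
    # 'ihist' writes an insert-size histogram.
    bbmerge.sh in1="${r1}" in2="${r2}" \
        out=merged.fastq \
        outu1=unmerged_R1.fastq outu2=unmerged_R2.fastq \
        ihist=insert_hist.txt
else
    echo "bbmerge.sh or input files not found; skipping merge check" >&2
fi
```

bbmerge.sh prints the fraction of pairs it could join, and the insert-size histogram gives an idea of the real amplicon length to plug into the overlap estimate. If the estimate comes out below 12 or negative, longer truncation lengths (or less aggressive trimming) are worth trying before concluding that the reads do not overlap.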