I am working on fecal data from birds (16S analysis, forward reads only) and going through the QIIME 2 pipeline. After the denoising step with Deblur, there is an output file, deblur-stats.qza, which contains information about the unique sequences found, removed, and so on. I'm trying to make sense of this table, but there is very little documentation for it apart from the tooltips that appear when you hover over the table headers. I've also gone through the Deblur paper, but I still have a few questions, which follow the command I ran:
qiime deblur denoise-16S --i-demultiplexed-seqs 2_demux-filtered.qza \
  --p-trim-length 100 --o-representative-sequences 3_rep-seqs-deblur_100.qza \
  --o-table 3_table-deblur_100.qza --p-sample-stats --o-stats 3_deblur-stats_100.qza
1) The Deblur paper describes two methods for filtering: positive (against a 16S reference database) or negative (against PhiX). However, it looks like both of these filtering steps are performed, since QIIME 2 reports output headers for both reads-hit-artifact and reads-hit-reference. I'm confused about the order in which these steps are run, because it doesn't seem to follow the pipeline described in the paper.
2) According to the deblur-stats.qza output, the reads-hit-reference column is the total number of reads preserved. Does that mean the positive filtering is done after the reads go through the Deblur algorithm and chimera removal? If so, the 'unique-reads-hit-reference' column sums to 56,015, while the number of features is close to 9,993. I understand that unique sequences repeated across samples were merged and that there was additional filtering with the 'min-size' flag set to 10. Is that the reason for such a large decrease in the number of features?
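To make question 2 concrete, here is a toy Python sketch of how I picture the merging and abundance filtering. The sample names, sequence IDs, counts, and the MIN_TOTAL threshold are all made up for illustration, and I am assuming the threshold is applied to each sequence's total count across samples, which may not be how Deblur or QIIME 2 actually applies it.

```python
from collections import Counter

# Hypothetical per-sample read counts for each unique sequence
# (toy data, not taken from my actual run).
samples = {
    "sampleA": {"seq1": 40, "seq2": 6, "seq3": 3},
    "sampleB": {"seq1": 25, "seq2": 5, "seq4": 2},
    "sampleC": {"seq1": 10, "seq5": 1},
}

# Summing a per-sample "unique" column counts a shared sequence
# once for every sample it appears in.
per_sample_unique_total = sum(len(counts) for counts in samples.values())

# Merging across samples collapses shared sequences into one feature each.
merged = Counter()
for counts in samples.values():
    merged.update(counts)

# Hypothetical abundance threshold, assumed here to act on the
# total count of a sequence across all samples.
MIN_TOTAL = 10
features = {seq: n for seq, n in merged.items() if n >= MIN_TOTAL}

print("sum of per-sample unique sequences:", per_sample_unique_total)
print("distinct sequences after merging:", len(merged))
print("features left after threshold:", len(features))
```

In this toy case the per-sample sum is larger than the number of distinct sequences, and the threshold shrinks it further, which is the kind of drop I am asking about for 56,015 versus roughly 9,993.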