I have been running the DADA2 pipeline on a set of samples. I went through all the steps: filtering, learning the error model, sample inference, merging, and constructing the ASV table. My question is about the "sample inference" step. My understanding is that this step reports the number of unique sequences among the filtered reads, which I can read directly from the printed output of dadaFs and dadaRs. My code was:
# forward reads
dadaFs <- dada(filtFs, err=errF, multithread=TRUE)
# reverse reads
dadaRs <- dada(filtRs, err=errR, multithread=TRUE)
Example of output:
dada-class: object describing DADA2 denoising results
486 sequence variants were inferred from 63948 input unique sequences.
Key parameters: OMEGA_A = 1e-40, OMEGA_C = 1e-40, BAND_SIZE = 16
To me this means that this sample has 63948 unique sequences among its 171532 filtered reads. What confuses me is the result of the following function:
# track the read counts through all steps
getN <- function(x) sum(getUniques(x))
track <- cbind(out, sapply(dadaFs, getN), sapply(dadaRs, getN), sapply(mergers_5, getN), rowSums(seqtab.nochim))
# If processing a single sample, remove the sapply calls: e.g. replace sapply(dadaFs, getN) with getN(dadaFs)
colnames(track) <- c("input", "filtered", "denoisedF", "denoisedR", "merged", "nonchim")
I obtained this table: Could you help me understand the meaning of the "denoisedF" and "denoisedR" columns? The numbers there are neither the number of unique sequences nor the number of filtered reads minus the number of unique sequences. What do they refer to? Also, the "merged" column shows a different number from the number of merged sequences that I got using this code:
mergers_5 <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, verbose=TRUE, minOverlap = 10, maxMismatch = 1)
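To make the comparison concrete, here is a minimal base-R sketch of the two kinds of numbers I am comparing. It does not call dada2 at all. It assumes that getUniques() returns a named integer vector of per-sequence read counts and that the mergePairs() result has one row per merged sequence with an abundance column. The toy sequences and counts are made up and are not from my data:

```r
# Toy stand-in for what getUniques() is assumed to return:
# names are sequences, values are read counts (made-up numbers).
toy_uniques <- c(ACGT = 5L, ACGA = 3L, TTGA = 2L)

n_unique <- length(toy_uniques)  # how many distinct sequences there are
n_reads  <- sum(toy_uniques)     # how many reads those sequences account for

# Toy stand-in for a mergePairs() result: one row per merged sequence,
# with an abundance column (assumed layout, made-up numbers).
toy_merged <- data.frame(
  sequence  = c("ACGTTTGA", "ACGATTGA"),
  abundance = c(4L, 3L),
  stringsAsFactors = FALSE
)

n_merged_seqs  <- nrow(toy_merged)            # number of merged sequences
n_merged_reads <- sum(toy_merged$abundance)   # reads behind those sequences

cat("distinct sequences:", n_unique, " reads:", n_reads, "\n")
cat("merged sequences:", n_merged_seqs, " merged reads:", n_merged_reads, "\n")
```

If that assumption holds, getN sums read counts rather than counting sequences, so I would expect the "denoisedF", "denoisedR", and "merged" columns to be read totals and not counts of unique or merged sequences. I would appreciate confirmation of whether this is the right way to read the table.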