Question

Confused about the results of a function in dada2 pipeline sample inference step

19 days ago

Mohamed Samir ▴ 20

I have been running the dada2 pipeline on a set of samples. I went through all stages as follows: filtering, learning the error model, sample inference, merging, and constructing the ASV table. My question concerns the "Sample inference" stage. My understanding is that this stage reports the number of unique sequences among the filtered reads, which I can read directly from the printed output of dadaFs and dadaRs. My code was:

# forward reads
dadaFs <- dada(filtFs, err=errF, multithread=TRUE)
# reverse reads
dadaRs <- dada(filtRs, err=errR, multithread=TRUE)

Example of output:

dada-class: object describing DADA2 denoising results
486 sequence variants were inferred from 63948 input unique sequences.
Key parameters: OMEGA_A = 1e-40, OMEGA_C = 1e-40, BAND_SIZE = 16

To me, this means that this sample has 63948 unique sequences among its 171532 filtered reads. What confuses me is the result I get when I apply the following function:

# track the read count after all steps
getN <- function(x) sum(getUniques(x))
track <- cbind(out, sapply(dadaFs, getN), sapply(dadaRs, getN), sapply(mergers_5, getN), rowSums(seqtab.nochim))
# If processing a single sample, remove the sapply calls: e.g. replace sapply(dadaFs, getN) with getN(dadaFs)
colnames(track) <- c("input", "filtered", "denoisedF", "denoisedR", "merged", "nonchim")

I obtained this table: Table

Could you help me understand the meaning of the "denoisedF" and "denoisedR" columns? The numbers there are neither the number of unique sequences nor the filtered reads minus the unique sequences. What do they refer to?

Also, the number in the "merged" column differs from the number of merged sequences that I got using this code:

mergers_5 <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, verbose=TRUE, minOverlap = 10, maxMismatch = 1)
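
To make the two quantities I am comparing concrete, here is a toy sketch in base R. It does not use dada2 or my real data: `toy_uniques` is a hypothetical vector with invented sequences and counts. I am assuming that getUniques() returns something of this shape (sequences as names, read counts as values). If that assumption holds, getN sums read counts rather than counting distinct sequences, which might explain why the two numbers differ.

```r
# Toy stand-in for a uniques-style vector (assumed shape, not real dada2 output):
# names are sequences, values are the number of reads carrying each sequence.
# All sequences and counts below are invented for illustration only.
toy_uniques <- c(ACGTACGT = 120L, ACGTACGA = 30L, TTGTACGT = 5L)

# Number of distinct sequences (the kind of count the printed summary reports)
n_unique <- length(toy_uniques)

# Total number of reads across those sequences (what sum(...) in getN computes)
n_reads <- sum(toy_uniques)

cat("distinct sequences:", n_unique, "\n")
cat("total reads:", n_reads, "\n")
```

On the real objects, the analogous comparison would be length(getUniques(x)) against sum(getUniques(x)) for a single sample x.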

Microbiome R dada2 • 106 views
