I have a doubt in samtools flagstat calculation, alignment quality check was done for the same samples but the alignment was done in different ways. Downloaded public available sample it was 80Gb fastq with Lane 1 and lane 2 which forward and reverse reads (paired end).
merged reads lane 1 & lane 2 were taken for the bwa mem alignment with grch38, and various step dupemerge step was done and this output was taken for an alignment quality check by samtools flagstat.
The second way, each raw fastq reads of both lane were taken and each read was aligned to grch38, after dupmerge step the reads were merged into one whole.bam file using samtools merge and the quality check done for the merged file.
But I observed the number of duplicates is decreased in the second way generated .bam file.
Reads are merged initial and used for further processing:
$samtools flagstat sample1_dup_merged.bam 603909814 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 secondary 1723882 + 0 supplementary **7197218 + 0 duplicates** 601809982 + 0 mapped **(99.65% : N/A)** 602185932 + 0 paired in sequencing 301084296 + 0 read1 301101636 + 0 read2 590992186 + 0 properly paired (98.14% : N/A) 597986268 + 0 with itself and mate mapped 2099832 + 0 singletons (0.35% : N/A) 3880858 + 0 with mate mapped to a different chr 2373672 + 0 with mate mapped to a different chr (mapQ>=5)
Each read of L1(40 reads) & L2 (39 reads) was as input for each processes:
$samtools flagstat sample1_Merged.bam 603910224 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 secondary 1723659 + 0 supplementary **208749 + 0 duplicates** 601810438 + 0 mapped **(99.65% : N/A)** 602186565 + 0 paired in sequencing 301084777 + 0 read1 301101788 + 0 read2 590992150 + 0 properly paired (98.14% : N/A) 597986993 + 0 with itself and mate mapped 2099786 + 0 singletons (0.35% : N/A) 3881347 + 0 with mate mapped to a different chr 2373432 + 0 with mate mapped to a different chr (mapQ>=5)
Could anyone tell me how this samtools calculate duplicates and in this output why the number of the duplicate is different but the quality remains the same in both ways?