I have a quick question about Picard MarkDuplicates. I have ATAC-seq data that has already been filtered for mitochondrial and unmapped reads. The initial file has about 59 million reads. When I run MarkDuplicates like so:

java -jar /exports/igmm/eddie/hill-lab/Zoe/References_and_Scripts/picard-tools-2.5.0/picard.jar MarkDuplicates I=Mutant1_paired_align_subMitoUnc_sorted.bam O=Mutant1_align_filtered.bam M=Mutant1_test_metrics.txt REMOVE_DUPLICATES=true

the output file has 39 million reads.
However, if I run it with REMOVE_DUPLICATES=FALSE and then use samtools to remove the reads carrying the 1024 (duplicate) flag, I end up with 56 million reads. I really can't understand why REMOVE_DUPLICATES=true makes such a difference. Shouldn't the output of both methods be similar? Thanks in advance!
All the best, Zoe
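
Note on the 1024 flag: in the SAM format, 1024 (0x400) is a single bit inside the FLAG field, so a duplicate read in paired-end data usually carries a combined value such as 1107 rather than exactly 1024. The sketch below is only an illustration of that bit test, not the command actually used above. The function name `count_dups` and the example records are hypothetical, and the samtools lines in the comments assume the filtering step was done with `samtools view` and its `-f`/`-F` options.

```shell
# Hypothetical helper: count records whose FLAG has the duplicate bit
# (0x400 = 1024) set. Reads SAM text on stdin; FLAG is column 2.
# Header lines (starting with '@') are skipped.
count_dups() {
    awk -F'\t' '!/^@/ && int($2 / 1024) % 2 == 1 { n++ } END { print n + 0 }'
}

# Small made-up example: two of the four records carry the duplicate bit.
printf '@HD\tVN:1.6\nr1\t99\tchr1\nr2\t1107\tchr1\nr3\t1024\tchr1\nr4\t2147\tchr1\n' | count_dups

# On a real BAM file, the equivalent checks with samtools would be roughly
# (assumed usage, with a placeholder output name):
#   samtools view -c -f 1024 marked.bam                  # count duplicate-flagged reads
#   samtools view -b -F 1024 marked.bam > dedup.bam      # drop duplicate-flagged reads
```

Comparing the count from `-f 1024` on the marked file with the duplicate numbers reported in the Picard metrics file is one way to check whether both routes see the same set of duplicates. A filter that matches FLAG values equal to exactly 1024, rather than testing the bit, would remove far fewer reads.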