I have paired-end sequencing data in which a significant number of read pairs overlap because of small insert sizes. In my experience, merging the read pairs (and recalibrating with bbmap) results in better alignments. However, at the PCR duplicate removal step I want to flag merged reads as duplicates only when both the 5' and 3' ends of their alignments are identical, and none of the commonly used tools (samtools, sambamba, picard) appears to have this feature.
My question is: how would you identify PCR duplicates among overlapping read pairs that were merged prior to alignment? I don't want to treat them as single-end reads for PCR duplicate removal, as I may end up discarding information unnecessarily.
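
To make the criterion concrete, here is a minimal sketch of the logic I have in mind. It is plain Python over already-parsed records, not a tool I have found anywhere. The `MergedAlignment` record and the `dedup_by_both_ends` function are hypothetical names, the coordinates are assumed to be the aligned start and end with any soft clipping already accounted for, and including strand in the key and keeping the highest-MAPQ read per group are my own assumptions.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class MergedAlignment(NamedTuple):
    """Hypothetical record for one aligned merged read (not a real library type)."""
    name: str
    chrom: str
    start: int        # 0-based leftmost aligned position (assumed clip-adjusted)
    end: int          # 0-based exclusive rightmost aligned position
    is_reverse: bool
    mapq: int


def dedup_by_both_ends(alignments: Iterable[MergedAlignment]) -> List[MergedAlignment]:
    """Keep one alignment per (chrom, start, end, strand) group.

    Two merged reads count as duplicates only if BOTH ends match.
    Within a group the read with the highest MAPQ is kept (an assumption;
    summed base quality would be another reasonable choice).
    """
    best: Dict[Tuple[str, int, int, bool], MergedAlignment] = {}
    for aln in alignments:
        key = (aln.chrom, aln.start, aln.end, aln.is_reverse)
        kept = best.get(key)
        if kept is None or aln.mapq > kept.mapq:
            best[key] = aln
    # Return survivors in coordinate order.
    return sorted(best.values(), key=lambda a: (a.chrom, a.start, a.end, a.is_reverse))


if __name__ == "__main__":
    reads = [
        MergedAlignment("a", "chr1", 100, 180, False, 30),
        MergedAlignment("b", "chr1", 100, 180, False, 40),  # same ends as "a"
        MergedAlignment("c", "chr1", 100, 175, False, 20),  # same start, different end
    ]
    for aln in dedup_by_both_ends(reads):
        print(aln.name, aln.chrom, aln.start, aln.end)
```

In this toy input, single-end duplicate removal keyed on the 5' position alone would collapse all three forward-strand reads into one, whereas the two-ended key keeps `c` as a distinct fragment and only collapses `a` and `b`.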