My understanding is that PCR duplicate detection relies on finding read pairs that share identical 5'-end coordinates and orientations. Does quality trimming of the 5' ends of reads prior to alignment interfere with this (post-alignment) duplicate detection? For example, suppose we have two read pairs (call them A and B) resulting from PCR duplication. Read 1 of pair A gets a few bases trimmed off the 5' end because they are low quality, but in pair B the same bases are higher quality and do not get trimmed. The 5' end of pair A read 1 would then have a different coordinate than the 5' end of pair B read 1. Wouldn't this defeat the duplicate detection algorithm? Or is there something I'm missing here?
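To make the scenario concrete, here is a minimal Python sketch of coordinate-based duplicate keys. It is only an illustration under simplifying assumptions (gapless alignments, 0-based coordinates), not the logic of any particular tool; the `Read` type and the helpers `five_prime`, `dup_key`, and `trim_five_prime` are hypothetical names made up for this example.

```python
from typing import NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int     # 0-based leftmost aligned reference position
    length: int    # aligned length on the reference (gapless, by assumption)
    reverse: bool  # True if the read aligns to the reverse strand


def five_prime(read):
    # Forward read: 5' end is the leftmost aligned base.
    # Reverse read: 5' end is the rightmost aligned base.
    if read.reverse:
        return read.start + read.length - 1
    return read.start


def dup_key(r1, r2):
    # Pairs with equal keys would be called duplicates of each other.
    return (r1.chrom, five_prime(r1), r1.reverse,
            r2.chrom, five_prime(r2), r2.reverse)


def trim_five_prime(read, n):
    # Model trimming n bases off the 5' end BEFORE alignment.
    if read.reverse:
        # The rightmost bases are gone; the leftmost position is unchanged.
        return read._replace(length=read.length - n)
    # The leftmost bases are gone; the alignment starts n bases later.
    return read._replace(start=read.start + n, length=read.length - n)


# Pairs A and B are PCR duplicates: identical fragment, identical alignments.
a_r1 = Read("chr1", 1000, 100, False)
a_r2 = Read("chr1", 1250, 100, True)
b_r1, b_r2 = a_r1, a_r2

key_b = dup_key(b_r1, b_r2)
key_untrimmed_a = dup_key(a_r1, a_r2)
key_trimmed_a = dup_key(trim_five_prime(a_r1, 3), a_r2)

print(key_untrimmed_a == key_b)  # True: the keys match, so the pair is flagged
print(key_trimmed_a == key_b)    # False: the trimmed read 1 starts 3 bases later
```

In this toy model, trimming three bases from the 5' end of pair A read 1 moves its 5' coordinate from 1000 to 1003, so the two pairs no longer share a key.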
Hmm... I was thinking more of post-alignment duplicate marking, like with Picard tools. But doing it on the raw reads is an interesting option. More than one way to skin a cat, I guess.
It would be best to do this on the original reads, since aligners will soft-clip bases and you may lose important information in the reported alignments. You can also find other kinds of duplicates (e.g. optical) with the original reads.
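As a rough idea of what duplicate marking on the original reads could look like, here is a small Python sketch that groups read pairs by the leading bases of both mates. It is an assumption-laden illustration, not how any specific tool works: the function name `mark_sequence_duplicates` and the `prefix_len` parameter are hypothetical, matching is exact (so a sequencing error inside the prefix would hide a duplicate), and it does not attempt optical duplicate detection, which would also need the positional information in the read names.

```python
def mark_sequence_duplicates(pairs, prefix_len=50):
    """Return the names of read pairs whose mate prefixes were already seen.

    pairs: iterable of (name, seq1, seq2) tuples taken from the raw reads.
    The first pair seen for each key is kept; later ones are marked.
    """
    seen = {}
    duplicates = set()
    for name, seq1, seq2 in pairs:
        # Key on the first prefix_len bases of each mate (exact match only).
        key = (seq1[:prefix_len], seq2[:prefix_len])
        if key in seen:
            duplicates.add(name)
        else:
            seen[key] = name
    return duplicates


# Toy input: B shares its first 8 bases with A on both mates; C does not.
pairs = [
    ("A", "ACGTACGTTT", "GGCCAATTCC"),
    ("B", "ACGTACGTTA", "GGCCAATTCG"),
    ("C", "TTTTACGTAC", "GGCCAATTCC"),
]

dups = mark_sequence_duplicates(pairs, prefix_len=8)
print(sorted(dups))  # ['B']
```

Because the comparison runs before any trimming or alignment, it does not depend on where the aligner places or clips the 5' ends.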