I tried samtools rmdup on the alignments of my paired-end FASTQ reads, which had been trimmed beforehand. According to the samtools manual, rmdup works as follows: "Remove potential PCR duplicates: if multiple read pairs have identical external coordinates, only retain the pair with highest mapping quality."
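To make my reading of that rule concrete, here is a toy Python sketch of how I understand it. This is not samtools' actual implementation: the `ReadPair` record, its field names, and the `remove_duplicates` function are all made up for illustration, and "external coordinates" is assumed to mean the outermost start and end of the aligned pair on a chromosome.

```python
from collections import namedtuple

# Hypothetical record for one aligned read pair; field names are illustrative only.
ReadPair = namedtuple("ReadPair", ["name", "chrom", "outer_start", "outer_end", "mapq"])

def remove_duplicates(pairs):
    """Keep one pair per (chrom, outer_start, outer_end), preferring the highest MAPQ."""
    best = {}
    for pair in pairs:
        # Assumption: only the outermost coordinates of the pair define a duplicate.
        key = (pair.chrom, pair.outer_start, pair.outer_end)
        if key not in best or pair.mapq > best[key].mapq:
            best[key] = pair
    return list(best.values())

pairs = [
    ReadPair("a", "chr1", 100, 400, 30),
    ReadPair("b", "chr1", 100, 400, 60),  # same outer coordinates as "a", higher MAPQ
    ReadPair("c", "chr1", 100, 395, 60),  # outer end differs, so treated as distinct
]

kept = remove_duplicates(pairs)
print([p.name for p in kept])  # prints ['b', 'c']
```

Under this reading, pair "c" survives only because its outer end differs by a few bases, which is exactly the situation I am worried trimming could create.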
I have 23% duplicates in my data (estimated by aligning the raw reads to the reference). Trimming the raw reads would have cut duplicate reads to different lengths, so their alignment coordinates may no longer be identical. How, then, would rmdup detect duplicates in my pre-processed reads?
Is there a better option?