I am using samtools markdup with the -r option to remove PCR duplicates from my mapped ChIP reads. How does it detect PCR duplicate reads?
Please let me know whether I understand it correctly. This is my understanding so far:
I need to name-sort the mapped reads and add the ms and MC tags (using fixmate), and then coordinate-sort the reads before running the "markdup -r" command. If the chromosome and the 5' end of two reads are the same, it treats them as duplicates; by looking at the MC and ms tags it chooses the best read among the duplicates and deletes the rest (since I am using the "-r" option). Is this what is happening here?
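To make my understanding concrete, here is a toy Python sketch of the logic I have in mind. It is not samtools' actual implementation: the function name `remove_duplicates` and the record fields (`chrom`, `five_prime`, `score`) are hypothetical, and I have left out strand and mate position, which the real tool may also take into account.

```python
from collections import defaultdict


def remove_duplicates(reads):
    """Keep one read per (chrom, five_prime) group: the highest-scoring one.

    Toy model only. Each read is a dict with hypothetical keys
    'name', 'chrom', 'five_prime' and 'score'.
    """
    # Group reads that share the same chromosome and 5' end.
    groups = defaultdict(list)
    for read in reads:
        groups[(read["chrom"], read["five_prime"])].append(read)

    # From each group keep the best-scoring read and drop the rest,
    # which is what I assume "-r" does with the marked duplicates.
    kept = []
    for group in groups.values():
        kept.append(max(group, key=lambda r: r["score"]))
    return kept


# Made-up example reads: r1 and r2 share chromosome and 5' end.
reads = [
    {"name": "r1", "chrom": "chr1", "five_prime": 100, "score": 30},
    {"name": "r2", "chrom": "chr1", "five_prime": 100, "score": 45},
    {"name": "r3", "chrom": "chr1", "five_prime": 250, "score": 20},
    {"name": "r4", "chrom": "chr2", "five_prime": 100, "score": 10},
]

kept_names = [r["name"] for r in remove_duplicates(reads)]
print(kept_names)  # ['r2', 'r3', 'r4']
```

In this toy model r1 is dropped because r2 starts at the same place on the same chromosome and has the higher score. Is that roughly the decision markdup makes?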
Any kind of help will be highly appreciated.