I am analyzing a ChIP-seq dataset. To increase read depth and obtain enough uniquely mapped reads, I decided to re-sequence my sample and merge the new data with the data I already have. However, I am concerned that MACS2 will flag more reads as duplicates and omit them during the peak calling step. Is there any way to avoid this? Any help will be greatly appreciated.
What do you think should happen with the duplicates? Why shouldn't MACS2 throw them out, given that what you're looking for (more unique reads, more read depth) will increase regardless?
Duplicates are due to PCR overamplification, whereas uniquely mapped reads are reads that align to only one place in the genome rather than several. My concern is that, since I re-sequenced my sample, there will be more reads with the same start and end coordinates, which will cause the program to treat them as PCR duplicates and exclude them from peak calling.
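To make the concern concrete, here is a minimal Python sketch of coordinate-based duplicate filtering. It is not MACS2's actual code: the function name `filter_duplicates`, the `max_dup` parameter, and the toy reads are all made up for illustration, and it assumes a read counts as a duplicate when its chromosome, start, end, and strand all match an earlier read.

```python
from collections import Counter

def filter_duplicates(reads, max_dup=1):
    """Keep at most max_dup reads per identical (chrom, start, end, strand).

    Hypothetical helper for illustration; real tools may use other keys
    (for example only the 5' position and strand for single-end data).
    """
    seen = Counter()
    kept = []
    for read in reads:
        seen[read] += 1
        if seen[read] <= max_dup:
            kept.append(read)
    return kept

# Toy reads (made-up coordinates): the same fragment shows up in both runs.
run1 = [("chr1", 100, 150, "+"), ("chr1", 100, 150, "+"), ("chr1", 300, 350, "-")]
run2 = [("chr1", 100, 150, "+"), ("chr1", 500, 550, "+")]
merged = run1 + run2

kept_one = filter_duplicates(merged, max_dup=1)            # one read per position
kept_all = filter_duplicates(merged, max_dup=len(merged))  # effectively no filtering

print(len(merged), len(kept_one), len(kept_all))
```

With a cap of one read per position, the copy contributed by the second run is dropped along with the within-run duplicate, which is exactly the effect described above. As far as I know, `macs2 callpeak` exposes this cap through its `--keep-dup` option (an integer, `auto`, or `all`), but check the documentation for your version before relying on it.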