I've installed and run the current ENCODE-DCC ATAC-seq pipeline from GitHub on 2 replicates of ATAC-seq data. Regardless of which (blacklist-filtered) peak file I look at (rep1, rep2, or pooled), about 6% of the peak loci are repeated, each time with different scores. Most are repeated two or three times, but the worst case has 7 entries!
Here's an example:
chr1 629086 630068 Peak_1 1000 . 5.22734 7335.73975 7327.66357 727
chr1 629086 630068 Peak_53 1000 . 1.69177 307.40005 301.90186 79
chr1 629086 630068 Peak_6 1000 . 3.27104 2558.00244 2551.49731 291
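To check how widespread this is in a given file, a minimal Python sketch along these lines could group rows by (chrom, start, end) and list the loci that occur more than once. It assumes the file is in the 10-column narrowPeak layout, where column 10 is the summit offset from the start coordinate; the function name `group_by_locus` is just illustrative.

```python
from collections import defaultdict

def group_by_locus(lines):
    """Group narrowPeak rows by (chrom, start, end). Hypothetical helper."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 10:  # skip blank or malformed lines
            continue
        groups[(fields[0], int(fields[1]), int(fields[2]))].append(fields)
    return groups

# The three rows from the example above.
example = """\
chr1 629086 630068 Peak_1 1000 . 5.22734 7335.73975 7327.66357 727
chr1 629086 630068 Peak_53 1000 . 1.69177 307.40005 301.90186 79
chr1 629086 630068 Peak_6 1000 . 3.27104 2558.00244 2551.49731 291
"""

groups = group_by_locus(example.splitlines())
repeated = {locus: rows for locus, rows in groups.items() if len(rows) > 1}

for (chrom, start, end), rows in repeated.items():
    # Column 10 is the summit offset relative to start (assumed narrowPeak).
    summits = sorted(start + int(r[9]) for r in rows)
    print(chrom, start, end, "entries:", len(rows), "summits:", summits)
```

In the example rows the repeated entries differ in the last column as well as in the scores, so printing the summit positions next to each repeated locus may help when describing the problem.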
Does anyone have any idea what's going on here, and what should I do with these "extra" peaks?
We never figured this out. In the end we just merged the peaks, since we don't use the scores after an initial filter anyway.
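One possible way to collapse such repeats, sketched in Python: keep a single row per (chrom, start, end), choosing the one with the highest value in column 7 (signalValue in the narrowPeak layout). This is only an assumption about what "merging" could look like, not necessarily what was done here, and `keep_strongest` is a made-up name.

```python
def keep_strongest(lines):
    """Keep one narrowPeak row per locus: the one with the highest column 7."""
    best = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 10:  # skip blank or malformed lines
            continue
        key = (fields[0], int(fields[1]), int(fields[2]))
        # Replace the stored row only if this one has a higher signalValue.
        if key not in best or float(fields[6]) > float(best[key][6]):
            best[key] = fields
    return ["\t".join(fields) for fields in best.values()]

# The three repeated rows from the question.
example = [
    "chr1 629086 630068 Peak_1 1000 . 5.22734 7335.73975 7327.66357 727",
    "chr1 629086 630068 Peak_53 1000 . 1.69177 307.40005 301.90186 79",
    "chr1 629086 630068 Peak_6 1000 . 3.27104 2558.00244 2551.49731 291",
]

collapsed = keep_strongest(example)
for row in collapsed:
    print(row)
```

If the scores are not used downstream, which row survives matters little; keeping the strongest one just makes the choice deterministic.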
Open an issue on the GitHub page of the ATAC-seq pipeline.