Question

How do I select the top 100,000 non-overlapping peaks from MACS2 narrowPeak output?

8.3 years ago

BioinfGuru ★ 2.1k

Hello everyone,

I've just finished MACS2 narrow-peak calling on ATAC-seq data. With a q-value cut-off of 0.05 I have around 200K peaks per sample. My literature review suggests that only the top 50,000 (some say 100,000) non-overlapping peaks are used in downstream analysis.

From the authors of the ATAC-seq protocol:

"Using the filtered peak set, peak summits were extended +/-250 bps. The top 50,000 non-overlapping 500bp summits, which we refer to as accessibility peaks were used for all downstream analysis."
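
The summit extension described in that quote can be sketched in Python as below. This assumes the standard 10-column narrowPeak layout that MACS2 writes, where column 2 is the 0-based peak start and column 10 is the summit offset relative to that start; the function name `summit_window` is hypothetical.

```python
def summit_window(narrowpeak_line, flank=250):
    """Return (chrom, start, end) for a window of +/- flank bp around the summit.

    Assumes the 10-column narrowPeak layout: column 2 (index 1) is the
    0-based peak start, column 10 (index 9) is the summit offset from start.
    """
    fields = narrowpeak_line.rstrip("\n").split("\t")
    chrom, start, summit_offset = fields[0], int(fields[1]), int(fields[9])
    summit = start + summit_offset  # absolute 0-based summit position
    # Clip at the chromosome start; the end is NOT clipped to the chromosome
    # length, which would need a chrom.sizes file.
    return chrom, max(0, summit - flank), summit + flank
```

For example, a peak starting at 1000 with a summit offset of 180 has its summit at 1180 and gives the 500 bp window `("chr1", 930, 1430)`.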

Conceptually I understand the reasoning: there is no need to keep thousands of peaks that fall in the same 500 bp window, so the overlaps are removed.

However, none of the authors state how they rank the peaks to pick the top 100,000. Is it by -log10(q-value), or by the number of reads within the 500 bp window? Does it make a difference which one I use?

It would be easier to use -log10(q-value), as it is right there in the same narrowPeak file as the positions. I realize I could be stricter with the q-value cut-off, but I don't think that alone will cut the set down to 100,000 peaks.
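
One way to get a "top N non-overlapping" set is a greedy pass: rank all 500 bp summit windows by a chosen column, then accept each window only if it does not overlap a window already accepted. This is a minimal sketch under stated assumptions, not the protocol authors' code. Which column they ranked by is exactly what the quote leaves open, so the score column is a parameter (0-based index 8 is -log10(q), 7 is -log10(p), 6 is signalValue in the 10-column narrowPeak layout). Ranking by reads per window would require counting reads from the alignments separately, which this sketch does not do. The name `top_nonoverlapping` is hypothetical.

```python
import bisect


def top_nonoverlapping(lines, n=100_000, flank=250, score_col=8):
    """Greedily pick up to n non-overlapping summit windows, best score first.

    score_col is a 0-based column index into each narrowPeak line:
    6 = signalValue, 7 = -log10(p), 8 = -log10(q).
    """
    windows = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        summit = int(f[1]) + int(f[9])  # peak start + summit offset
        windows.append((float(f[score_col]), f[0], max(0, summit - flank), summit + flank))
    # Highest score first; sort is stable, so ties keep their input order.
    windows.sort(key=lambda w: w[0], reverse=True)

    kept = {}  # chrom -> sorted list of accepted (start, end) windows
    selected = []
    for score, chrom, start, end in windows:
        if len(selected) == n:
            break
        intervals = kept.setdefault(chrom, [])
        i = bisect.bisect_left(intervals, (start, end))
        # Accepted windows never overlap each other, so only the neighbours
        # on either side of the insertion point can overlap the new window.
        left_clash = i > 0 and intervals[i - 1][1] > start
        right_clash = i < len(intervals) and intervals[i][0] < end
        if not (left_clash or right_clash):
            intervals.insert(i, (start, end))
            selected.append((chrom, start, end, score))
    return selected
```

Usage would look like `with open("sample_peaks.narrowPeak") as fh: top = top_nonoverlapping(fh, n=100_000)` (the file name is hypothetical). Because the pass is greedy, each accepted peak suppresses every lower-ranked window that touches it, so switching `score_col` can change which peaks survive in dense regions, not just their order.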

Thanks for your input

Kenneth

Tags: ATAC, MACS2, overlapping, narrow peaks

Written 8.3 years ago by BioinfGuru; updated 8.2 years ago by Biostar.

Maybe you should merge the many overlapping peaks into one large peak, or you could ask the MACS2 authors.

Reply by Ben, 8.3 years ago

Yes Ben, I will be doing that. However, a ranking is still required to decide which peak to keep.

Reply by BioinfGuru, 8.3 years ago
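
Ben's merge suggestion can be combined with a ranking: merge overlapping summit windows into clusters and keep only the best-scoring window from each cluster. This is a hypothetical sketch, assuming the windows are already `(chrom, start, end, score)` tuples with the score taken from whichever column you choose to rank by; the name `best_per_merged_cluster` is illustrative. It is not equivalent to a greedy top-N selection: because merging chains overlaps together, a strong peak that only overlaps a weaker neighbour can be absorbed into another peak's cluster and dropped.

```python
from itertools import groupby


def best_per_merged_cluster(windows):
    """windows: iterable of (chrom, start, end, score) tuples.

    Windows that overlap (not merely touch) are merged into one cluster,
    and only the highest-scoring window of each cluster is kept.
    Returns the kept windows sorted by score, best first.
    """
    ordered = sorted(windows, key=lambda w: (w[0], w[1], w[2]))
    best = []
    for _chrom, group in groupby(ordered, key=lambda w: w[0]):
        cluster_end = None
        top = None
        for w in group:
            if cluster_end is not None and w[1] < cluster_end:
                # Overlaps the current cluster: extend it, track the best window.
                cluster_end = max(cluster_end, w[2])
                if w[3] > top[3]:
                    top = w
            else:
                # Starts a new cluster: emit the winner of the previous one.
                if top is not None:
                    best.append(top)
                cluster_end, top = w[2], w
        best.append(top)  # winner of the last cluster on this chromosome
    return sorted(best, key=lambda w: w[3], reverse=True)
```

Taking `best_per_merged_cluster(windows)[:100_000]` would then give one representative per merged region, ranked by the chosen score.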