Question

Filtering out broad peaks

1

8.4 years ago

Bioradical ▴ 60

Is there a way to filter out broad peaks / overlaps from a narrowPeak file? Example: I am looking at Pol II and have very nice sharp peaks at the TSS of genes, and these are the kind of peaks I'm interested in identifying inside exons / introns. I'm looking for overlaps of a TF at exons, but I am only interested in narrow, sharp, clean peaks and not in large islands made up of multiple overlapping peaks.

I am mainly using bedtools as I have no programming experience, but I don't mind learning another tool if needed.

filter ChIP-Seq • 2.4k views

updated 8.4 years ago by Tej Sowpati ▴ 250 • written 8.4 years ago by Bioradical ▴ 60

2

Not sure how "broad" these peaks are, given it's a narrowPeak file, but can't you use a size-based filter? In particular, if you are not interested in regions with overlapping peaks, you can first merge them using bedtools merge, and then filter out large/merged peaks using awk as follows:

Size-based method:

Assuming you want to merge all peaks that lie within 50 bp of each other, and remove all merged regions that are larger than 300 bp:

$ bedtools merge -d 50 -i [input file] > merged.bed
$ awk '{if ($3-$2 <= 300) print $0;}' merged.bed > filtered.bed
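
To see what the size filter keeps, here is a small sketch on made-up intervals. The coordinates and the file names toy_merged.bed and toy_filtered.bed are hypothetical and only stand in for real bedtools merge output. BED coordinates are 0-based and half-open, so $3-$2 is the interval length in bp.

```shell
# Hypothetical toy intervals standing in for the output of bedtools merge
printf 'chr1\t1000\t1200\nchr1\t5000\t5300\nchr1\t9000\t9850\n' > toy_merged.bed

# Keep intervals of at most 300 bp (the lengths here are 200, 300 and 850)
awk '($3 - $2) <= 300' toy_merged.bed > toy_filtered.bed

# Show the intervals that survived the filter
cat toy_filtered.bed
```

The 850 bp interval is dropped, while the 300 bp one is kept because the comparison is "less than or equal".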

Overlap-based method:

Assuming you want to merge only overlapping (or directly adjacent) peaks, and remove all merged regions that contain more than two of the original peaks:

$ bedtools merge -c 1 -o count -i [input file] > merged.bed
$ awk '{if ($4 <=2) print $0;}' merged.bed > filtered.bed
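
As a sketch of how the count filter behaves, the toy file below mimics the four-column output of bedtools merge -c 1 -o count (interval plus the number of input peaks merged into it). The values and file names are invented for illustration. If "clean" should mean a peak with no overlapping neighbours at all, a stricter cutoff of $4 == 1 is one option; that is an assumption about the goal, not something stated in the question.

```shell
# Hypothetical merged regions with a peak count in column 4
printf 'chr1\t1000\t1200\t1\nchr1\t5000\t5600\t2\nchr1\t9000\t12000\t7\n' > toy_counts.bed

# Cutoff used above: keep regions built from at most two peaks
awk '$4 <= 2' toy_counts.bed > toy_le2.bed

# Stricter variant: keep only regions that are a single, unmerged peak
awk '$4 == 1' toy_counts.bed > toy_isolated.bed

# Compare how many regions each cutoff keeps
wc -l toy_le2.bed toy_isolated.bed
```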

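Note that bedtools merge expects its input to be sorted by chromosome and then by start position (for example with sort -k1,1 -k2,2n), and that it writes only the first three columns (chrom, start, end) unless you explicitly request other columns with -c and -o.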
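
Because the merged file no longer carries the narrowPeak score and summit columns, one alternative is to skip the merge and flag crowded peaks directly. The sketch below is not bedtools: it is a plain sort + awk approximation of the same clustering idea, where peaks on the same chromosome separated by a gap of at most d bp form one cluster. A peak is printed with all of its original columns only if it is alone in its cluster and no wider than maxw. The thresholds d=50 and maxw=300 are borrowed from the size-based example, and toy_peaks.narrowPeak is invented test data.

```shell
# Invented 10-column narrowPeak-like records (tab-separated)
printf 'chr1\t1000\t1200\tp1\t900\t.\t50\t20\t18\t100\n'  > toy_peaks.narrowPeak
printf 'chr1\t5000\t5400\tp2\t400\t.\t12\t8\t6\t150\n'   >> toy_peaks.narrowPeak
printf 'chr1\t5420\t5700\tp3\t380\t.\t11\t7\t5\t90\n'    >> toy_peaks.narrowPeak
printf 'chr1\t9000\t9850\tp4\t300\t.\t9\t6\t4\t400\n'    >> toy_peaks.narrowPeak
printf 'chr2\t1100\t1350\tp5\t850\t.\t45\t19\t17\t120\n' >> toy_peaks.narrowPeak

sort -k1,1 -k2,2n toy_peaks.narrowPeak |
awk -v d=50 -v maxw=300 '
  # Print the stored record if its cluster held one peak of acceptable width
  function emit() { if (n == 1 && (cend - cstart) <= maxw) print first }
  {
    if (n > 0 && $1 == chrom && $2 - cend <= d) {
      # Same cluster: count the peak and extend the cluster end if needed
      n++
      if ($3 > cend) cend = $3
    } else {
      # New cluster: report the previous one, then start over with this peak
      emit()
      chrom = $1; cstart = $2; cend = $3; first = $0; n = 1
    }
  }
  END { emit() }
' > toy_clean.narrowPeak

# List the names of the peaks that passed both filters
cut -f4 toy_clean.narrowPeak
```

In this toy data p2 and p3 are dropped because they sit 20 bp apart and form one cluster, and p4 is dropped because it is 850 bp wide, leaving p1 and p5 with all ten columns intact.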

updated 4.4 years ago by Ram 43k • written 8.4 years ago by Tej Sowpati ▴ 250

0

Tej Sowpati you should add this as an answer (not as a reply). Seems logical to me.

updated 4.4 years ago by Ram 43k • written 8.4 years ago by Chris Fields ★ 2.2k

Ram · Accepted Answer · 2015-12-17

Adding my comment as an answer.
