Filtering out broad peaks
5.4 years ago
Bioradical ▴ 60

Is there a way to filter out broad peaks and overlapping peaks from a narrowPeak file? Example: I am looking at Pol II and have very nice sharp peaks at the TSS of genes, and these are the kind of peaks I'm interested in identifying inside exons and introns. I'm looking for overlaps of a TF with exons, but I am only interested in narrow, sharp, clean peaks and not in large islands made up of multiple overlapping peaks.

I am mainly using bedtools as I have no programming experience, but I don't mind learning another tool if needed.

filter ChIP-Seq • 1.6k views

I'm not sure how "broad" these peaks can be, given that it's a narrowPeak file, but can't you use a size-based filter? In particular, if you are not interested in regions with overlapping peaks, you can first merge them with bedtools merge and then filter out large or heavily merged regions with awk, as follows:

Size-based method:

Assuming you want to merge all peaks that overlap or lie at most 50 bp apart, and then remove every merged region larger than 300 bp:

$ bedtools merge -d 50 -i [input file] > merged.bed
$ awk '{if ($3-$2 <= 300) print $0;}' merged.bed > filtered.bed
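To see what the width test does at the cutoff, here is a minimal sketch on a toy file. The file names toy_merged.bed and toy_filtered.bed and all coordinates are made up for illustration; the awk condition is the same one used above, written in its shorter pattern-only form.

```shell
# Toy stand-in for merged.bed (hypothetical coordinates, tab-separated).
# Interval widths are 250, 300, 301 and 700 bp.
printf 'chr1\t100\t350\nchr1\t1000\t1300\nchr1\t5000\t5301\nchr2\t200\t900\n' > toy_merged.bed

# Keep intervals whose width (end - start) is at most 300 bp.
# A bare awk condition prints every line for which it is true.
awk '($3 - $2) <= 300' toy_merged.bed > toy_filtered.bed

cat toy_filtered.bed
```

Only the 250 bp and 300 bp intervals survive. BED coordinates are 0-based and half-open, so end minus start is the interval width and a region of exactly 300 bp passes the `<=` test.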

Overlap-based method:

Assuming you want to merge only overlapping (or directly adjacent) peaks, and then remove every merged region that was built from more than two peaks:

$ bedtools merge -c 1 -o count -i [input file] > merged.bed
$ awk '{if ($4 <=2) print $0;}' merged.bed > filtered.bed
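If the goal is to drop every region where peaks overlap at all, a stricter variant keeps only regions with a count of exactly 1. This is a sketch on a toy file that imitates the four-column output of the merge command above; the file names toy_counts.bed and toy_isolated.bed and all values are hypothetical.

```shell
# Toy stand-in for the output of `bedtools merge -c 1 -o count` (hypothetical values).
# Columns: chrom, start, end, number of input peaks merged into the interval.
printf 'chr1\t100\t350\t1\nchr1\t1000\t1600\t2\nchr1\t5000\t7000\t5\n' > toy_counts.bed

# Keep only intervals that come from a single input peak (no overlap with any other peak).
awk '$4 == 1' toy_counts.bed > toy_isolated.bed

cat toy_isolated.bed
```

Here only the first interval is kept. Whether to allow a count of 2, as in the command above, or to insist on 1 depends on how strictly "clean" peaks should be defined.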

Note that bedtools merge expects its input to be sorted by chromosome and then by start position, and that it keeps only the first three columns in its output unless you explicitly request other columns with -c and -o.


Tej Sowpati, you should add this as an answer (not as a reply). It seems logical to me.

5.3 years ago
Tej Sowpati ▴ 250

Adding my comment as an answer.

