merge chipseq peaks with bedtools/other tool
2
1
Entering edit mode
8.9 years ago
s.vdrzeeuw ▴ 20

Hi all,

While working on a ChIP-Seq data set consisting out of 16 samples I want to see the differences in peak height. To achieve this I first need a merged peak location. To achieve this I was thinking of a tool which could merge all 16 of my peak files at once. E.G. bedtools merge / multiinter. Only thing is that I have the feeling this is not exactly what I want and it becomes difficult to see if bedtools does a good job here.

I want to achieve a peak location in the following way:

A: start = 25 : end = 50
B: start = 30 : end = 65
C: start = 20 : end = 45
MERGED: start = 20 : end = 65

Which tool/ mode from bedtools can achieve this result. Any hints are very much appreciated. Thanks!

Sander

ChIP-Seq bedtools macs2 • 9.2k views
ADD COMMENT
0
Entering edit mode

Hey,

Are you talking about getting a reference location for comparisons? because peak merging is a different concept, in which you would assemble all the locations and generate a wide peak (or depending on the operations you use e.g., bedtools merge).

Your question is not very clear to me.

ADD REPLY
0
Entering edit mode

Uuh I am talking about the latter one you describe. So I have 16 peak files where I basically want to know which peaks overlap. And if they overlap I want the minimum start and the maximum end (binning approach). So that I can build a gtf file for counting all my reads laying in that region per sample. After generating the count table per sample I can apply something like edgeR to find differences in my bins. Basically I want to do differential peak binding/calling.

ADD REPLY
6
Entering edit mode
8.9 years ago

This should do it, concatenate peak locations in all peaks, sort them and merge

cat A B C .... | sort -k1,1 -k2,2n | mergeBed -i stdin > locations.bed

To know which files the peaks co-ordinates are merged from, you need to have an identifier in each file before merging.

Use

awk '{print $0"\t","peakFile-"NR}' A > A_id

This will add a new last column with label "peakFile-1" incremented per row, which will be nice, if you want to track later, which exact and how many peaks were used from which file for the current peak merge. I leave it you to implement a loop to label all the files automatically. Once its done, use the collapseoperator from mergeBed.

cat A_id B_id C_id .... | sort -k1,1 -k2,2n | mergeBed -i stdin - o collapse -c 4

where c is the column number having the id's we just entered before.

Output:

chr1    20    65     peakFile-3, peakFile-1, peakFile-2

Enjoy!

ADD COMMENT
0
Entering edit mode

Hmm, indeed it is doing something. I get a list of peak start and stop locations. But there is no way to see from which files these values come. So do you have any suggestions on this? And how to see bedtools merge does a good job here? Thanks for helping me out on this!

ADD REPLY
0
Entering edit mode

I updated my answer, for the validation you check them manually to start with, if you want.

ADD REPLY
0
Entering edit mode

Thanks Sukhdeep,

Really appreciate your nice way of explaining ;).

ADD REPLY
0
Entering edit mode

Thanks, good luck!

ADD REPLY
0
Entering edit mode
2.6 years ago
Huanhuan • 0

Hi, thank you for the question and answers. I still have one doubt: Is the input narrowPeak file called by macs2? My command for macs2 is:

macs2 callpeak -t sample.rmdup.bam --outdir $peaks -n sample

or the input should be the result of intersection:

bedtools intersect  -abam sample.rmdup.bam -b sample_peaks.narrowPeak  > sample_intersection.bed

Thank you!

ADD COMMENT
0
Entering edit mode

May I ask the meaning of intersection of bam file and peak file? Never seen this before.

BTW this question is quite old, perhaps it's better for you to open a new one and link this one :D

ADD REPLY

Login before adding your answer.

Traffic: 1832 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6