Question: merge chipseq peaks with bedtools/other tool
s.vdrzeeuw0 wrote:

Hi all,

I am working on a ChIP-seq data set of 16 samples and want to compare peak heights between them. For that I first need a common set of merged peak locations, so I am looking for a tool that can merge all 16 of my peak files at once, e.g. bedtools merge or multiinter. The only thing is that I have the feeling this is not exactly what I want, and it is hard to check whether bedtools does a good job here.

I want to derive the merged peak location like this:

A: start = 25 : end = 50
B: start = 30 : end = 65
C: start = 20 : end = 45
MERGED: start = 20 : end = 65.

Which tool or bedtools mode can achieve this result? Any hints are very much appreciated. Thanks!

Sander

chip-seq macs2 bedtools • 2.1k views
ADD COMMENTlink modified 2.5 years ago by Sukhdeep Singh9.1k • written 2.5 years ago by s.vdrzeeuw0

Hey,
Are you talking about getting a common reference set of locations for comparisons? Peak merging is a different concept: there you pool all the locations and generate wider peaks (the exact result depends on the operation you use, e.g. bedtools merge).
Your question is not very clear to me.

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Sukhdeep Singh9.1k

Uh, I am talking about the latter one you describe. I have 16 peak files and basically want to know which peaks overlap. Where they overlap, I want the minimum start and the maximum end (a binning approach), so that I can build a GTF file for counting the reads that fall in each region per sample. Once I have the count table per sample, I can apply something like edgeR to find differences between my bins. Basically, I want to do differential peak binding/calling.
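
For the counting step, the merged regions have to end up in GTF form. A rough sketch of that conversion, assuming a three-column BED file of merged regions; the file name merged.bed, the source tag merged_peaks and the peak_N ids are made up for illustration. BED starts are 0-based and GTF starts are 1-based, hence the +1:

```shell
workdir=$(mktemp -d)
# Toy merged region standing in for the real merged peak list (hypothetical file name).
printf 'chr1\t20\t65\n' > "$workdir/merged.bed"

# BED is 0-based, half-open; GTF is 1-based, inclusive, so only the start shifts by one.
# Source tag, feature type and ids are placeholders, not required values.
gtf=$(awk 'BEGIN { OFS = "\t" }
  { print $1, "merged_peaks", "exon", $2 + 1, $3, ".", ".", ".", "gene_id \"peak_" NR "\";" }' "$workdir/merged.bed")
echo "$gtf"
```

The feature type and strand column may need adjusting to whatever the read counter you use expects.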

ADD REPLYlink written 2.5 years ago by s.vdrzeeuw0
Sukhdeep Singh9.1k wrote:
# This should do it: concatenate the peak locations from all peak files, sort them, and merge.

cat A B C .... | sort -k1,1 -k2,2n | mergeBed -i stdin > locations.bed
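
To see what the merge step does with the three intervals from the question, here is a small stand-in written in plain awk, so it runs without bedtools installed. It is only a sketch of the same idea (merge sorted intervals that overlap or touch), not how bedtools implements it, and the function name merge_sorted and the file names are made up:

```shell
workdir=$(mktemp -d)
# The three example peaks from the question, one per file.
printf 'chr1\t25\t50\n' > "$workdir/A"
printf 'chr1\t30\t65\n' > "$workdir/B"
printf 'chr1\t20\t45\n' > "$workdir/C"

# Merge intervals that overlap or touch; input must be sorted by chromosome, then start.
merge_sorted() {
  awk 'BEGIN { OFS = "\t" }
    NR == 1 { c = $1; s = $2 + 0; e = $3 + 0; next }
    $1 == c && $2 + 0 <= e { if ($3 + 0 > e) e = $3 + 0; next }
    { print c, s, e; c = $1; s = $2 + 0; e = $3 + 0 }
    END { if (NR > 0) print c, s, e }'
}

merged=$(cat "$workdir/A" "$workdir/B" "$workdir/C" | sort -k1,1 -k2,2n | merge_sorted)
echo "$merged"
```

For the toy input this gives one region spanning the minimum start and maximum end of A, B and C, which is the MERGED line asked for in the question.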

To know which files the merged peak coordinates came from, you need to add an identifier to each file before merging.

Use
awk '{print $0"\tpeakFile-1"}' A > A_id

This adds a new last column with the label "peakFile-1" to every row of A; use "peakFile-2" for B, "peakFile-3" for C, and so on. That is handy if you want to track later which files contributed peaks, and how many, to each merged peak. If you also want to know exactly which peak was used, add the row number to the label, e.g. "\tpeakFile-1_"NR. I leave it to you to implement a loop that labels all the files automatically. Once that is done, use the collapse operation of mergeBed.
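
A possible version of that loop, as a sketch only: it assumes three-column BED files named A, B and C, and it writes labels of the form peakFile-<file number>_<row number> (a format chosen here for illustration) so that both the source file and the individual peak stay identifiable:

```shell
workdir=$(mktemp -d)
# Toy three-column peak files standing in for the real ones.
printf 'chr1\t25\t50\n' > "$workdir/A"
printf 'chr1\t30\t65\n' > "$workdir/B"
printf 'chr1\t20\t45\n' > "$workdir/C"

i=0
for f in "$workdir/A" "$workdir/B" "$workdir/C"; do
  i=$((i + 1))
  # Append a tab-separated label: file number, underscore, row number (e.g. peakFile-3_1).
  awk -v label="peakFile-$i" 'BEGIN { OFS = "\t" } { print $0, label "_" NR }' "$f" > "${f}_id"
done
cat "$workdir/A_id" "$workdir/B_id" "$workdir/C_id"
```

With more than three columns in the input, the label ends up in a later column, so the column number given to mergeBed has to change accordingly.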

cat A_id B_id C_id .... | sort -k1,1 -k2,2n | mergeBed -i stdin -o collapse -c 4

where -c is the number of the column holding the IDs we just added (4 for a three-column BED file).

output:
chr1    20    65    peakFile-3,peakFile-1,peakFile-2
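
From a collapsed line like this you can also count how many different files support each merged region. A hedged sketch in awk, assuming every label starts with a per-file prefix such as peakFile-3, optionally followed by _<row number>; the variable names are made up:

```shell
# Toy line in the shape of the collapsed output above.
collapsed=$(printf 'chr1\t20\t65\tpeakFile-3,peakFile-1,peakFile-2')

summary=$(printf '%s\n' "$collapsed" | awk 'BEGIN { FS = OFS = "\t" }
  { n = split($4, ids, ","); files = 0; split("", seen)   # reset per line
    for (i = 1; i <= n; i++) {
      id = ids[i]
      sub(/^ +/, "", id)        # tolerate a space after the comma
      sub(/_[0-9]+$/, "", id)   # drop an optional _<row number> suffix
      if (!(id in seen)) { seen[id] = 1; files++ }
    }
    print $1, $2, $3, files }')
echo "$summary"
```

The last column is the number of distinct files per region, which makes it easy to keep only regions seen in, say, at least two samples.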

Enjoy!!

ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by Sukhdeep Singh9.1k

Hmm, indeed it is doing something. I get a list of peak start and stop locations, but there is no way to see which files these values come from. Do you have any suggestions on this? And how can I check that bedtools merge does a good job here? Thanks for helping me out on this!

ADD REPLYlink written 2.5 years ago by s.vdrzeeuw0

I updated my answer. For validation, you can check a few of them manually to start with, if you want.
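
Besides eyeballing a few regions, a quick sanity check is to confirm that every input peak lies inside some merged region. A minimal sketch in awk, assuming a merged file locations.bed and a concatenation of all input peaks called all_peaks.bed (a made-up name); it compares every peak against every region, so it is only meant for small spot checks:

```shell
workdir=$(mktemp -d)
# Toy data: the three example peaks and the expected merged region.
printf 'chr1\t25\t50\nchr1\t30\t65\nchr1\t20\t45\n' > "$workdir/all_peaks.bed"
printf 'chr1\t20\t65\n' > "$workdir/locations.bed"

# First file: load merged regions. Second file: test each peak for containment.
report=$(awk 'NR == FNR { chrom[NR] = $1; s[NR] = $2 + 0; e[NR] = $3 + 0; n = NR; next }
  { ok = 0
    for (i = 1; i <= n; i++) {
      if ($1 == chrom[i] && $2 + 0 >= s[i] && $3 + 0 <= e[i]) { ok = 1; break }
    }
    if (!ok) { print "not covered:", $0; bad++ } }
  END { print bad + 0, "uncovered peaks" }' "$workdir/locations.bed" "$workdir/all_peaks.bed")
echo "$report"
```

Any "not covered" line would point to a peak that the merge step lost or split.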

ADD REPLYlink written 2.5 years ago by Sukhdeep Singh9.1k

Thanks Sukhdeep,

Really appreciate your nice way of explaining ;).

ADD REPLYlink written 2.5 years ago by s.vdrzeeuw0

Thanks, good luck!

ADD REPLYlink written 2.5 years ago by Sukhdeep Singh9.1k