How to extract peak coordinates from a ChIP-seq bedGraph file?
7.4 years ago
jack ▴ 950

Hi all,

I have a bedGraph file from a ChIP-seq experiment and I want to extract the coordinates of the peaks. An excerpt of the file is shown below.

Would someone help me with that?

Here is what the GEO entry says:

Supplementary_files_format_and_content: Bedgraph files were generated from peak files built from reads mapping uniquely to the genome with at most 0,1,2 mismatches, using Pyicos (Althammer et al., 2011). Significant ChIP-Seq signal was calculated from the coverage of reads comparing each samples with its specific control sample. The significant peaks per sample in bedgraph format are the following: MCF10-HP1-CTRL.MA.pvalue.bedgraph, MCF7HP1-CTRL.MA.pvalue.bedgraph, 4B8-MCF10-CTRL.MA.pvalue.bedgraph, 4B8-MCF7-CTRL.MA.pvalue.bedgraph, MCF10-5metC-CTRL.MA.pvalue.bedgraph, MCF7-5metC-CTRL.MA.pvalue.bedgraph, MCF10-RNAPII-CTRL.MA.pvalue.bedgraph). AGO1 MCF7 Millipore significant peaks were calculated using Pyicos (Althammer et al., 2011).

track type=wiggle_0    name="noname"    visibility=full
chr1    140095    140127    1.00
chr1    140127    140385    2.00
chr1    140385    140445    3.00
chr1    140445    140477    2.00
chr1    140477    140735    1.00
chr1    226485    226800    1.00
chr1    226800    226821    2.00
chr1    226821    226835    3.00
chr1    226835    227150    2.00
chr1    227150    227171    1.00
chr1    533511    533687    1.00
chr1    533687    533861    2.00
chr1    533861    534037    1.00
chr1    698360    698489    1.00
chr1    698489    698710    2.00
chr1    698710    698839    1.00
chr1    747229    747458    1.00
chr1    747458    747579    2.00
chr1    747579    747808    1.00
chr1    748296    748322    1.00
chr1    748322    748359    2.00
chr1    748359    748646    3.00
chr1    748646    748672    2.00
chr1    748672    748709    1.00
chr1    757348    757423    1.00
chr1    757423    757698    2.00
chr1    757698    757773    1.00
chr1    759148    759465    1.00
chr1    759465    759498    2.00
chr1    759498    759815    1.00
chr1    768858    769083    1.00

ChIP-Seq RNA-Seq • 4.2k views

Peak callers usually output BED files with the peaks. Don't you have that information? The bedGraph file you posted contains groups of contiguous regions. Are those your peak regions?

I don't have any information about the peaks. I just have this bedGraph file, which they say contains the peaks!

Then it is too much guesswork. You can either extract the contiguous regions from the bedGraph and assume that those are the peaks, or ask for the BED file containing the peaks, because this is clearly the wrong file.

I agree, but that is what they have deposited in GEO, and they say it is their peak file.

Did you try contacting the authors?

Yes, but they are not responding.

7.4 years ago
Ian 5.9k

This is a very odd way to report peak information, but that is not your fault. Since the coordinates come in blocks, we can assume they are the binding regions. And since the blocks appear to be contiguous (e.g. 140127 is both the end of one region and the start of the next), you could use:

bedtools merge -d 0 -i in.bdg > merged.bdg


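-d 0 (which is also the default) means that only regions that are 'book-ended' (directly adjacent to each other) or overlapping will be merged. The input must be sorted by chromosome and start position. The fourth column is dropped from the output unless you ask for a summary of it, e.g. with -c 4 -o max.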

This should work, but I would hate to rely on data that I do not fully understand.
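
If bedtools is not at hand, the same merge can be sketched in a few lines of Python. This is only an illustration under some assumptions: the input is sorted by chromosome and start (as in the excerpt), the fourth column is a signal value where higher means stronger, and the "summit" is taken to be the first sub-interval with the highest value in each merged region. The names merge_bookended and Peak are made up for this example and are not part of bedtools or Pyicos.

```python
from typing import Iterable, Iterator, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    max_value: float
    summit_start: int
    summit_end: int


def merge_bookended(lines: Iterable[str]) -> Iterator[Peak]:
    """Merge adjacent or overlapping bedGraph intervals (like a merge with distance 0).

    Assumes the input is sorted by chromosome and start position.
    """
    # Region being built: [chrom, start, end, max_value, summit_start, summit_end]
    current = None
    for line in lines:
        fields = line.split()
        # Skip blank lines and track/browser/comment header lines.
        if not fields or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        value = float(fields[3])
        if current and chrom == current[0] and start <= current[2]:
            # Interval touches or overlaps the open region: extend it.
            current[2] = max(current[2], end)
            if value > current[3]:
                # New highest value: remember where it sits.
                current[3:6] = [value, start, end]
        else:
            # Gap or new chromosome: emit the finished region, start a new one.
            if current:
                yield Peak(*current)
            current = [chrom, start, end, value, start, end]
    if current:
        yield Peak(*current)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python merge_peaks.py in.bdg > merged.bed
    with open(sys.argv[1]) as handle:
        for peak in merge_bookended(handle):
            print(*peak, sep="\t")
```

Unlike the plain merge, this keeps the highest value and its position for each region, which may help when judging whether a block really looks like a peak. It does not make the meaning of the fourth column any clearer, so the same caution applies.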

Thanks, that seems good. I will update with the result.