Question: Obtain ChIP-seq peaks from a bedGraph file with continuous regions
6 months ago by mmaqueda

Hi all,

I have a bedGraph file from a ChIP-seq experiment (H3K27me3) downloaded from GEO (GSE84324), and I want to obtain a list of genes enriched for this mark. However, the bedGraph contains a continuous signal over adjacent regions, whereas I was expecting a set of selected regions (peaks) that I would only need to annotate with genes.

I've read in the corresponding paper that they used HOMER to generate the bedGraph: makeUCSCfile out_dir -o out.bdg -name sample_name -color track_color -fragLength 150 -avg -fsize 1e20

I am quite new to ChIP-seq analysis and need some advice here on how to proceed. My thoughts are:

a) Use MACS2 bdgbroadcall with the bedGraph as input. Problem: the bedGraph was not generated with MACS/MACS2.

b) Use ScoreMatrixBin from R (genomation package) to get a score for every promoter and then apply a cutoff to identify enriched promoters (this is how the original paper proceeds). This may be a naive approach.

c) Try to obtain the ChIP-seq raw data and repeat the analysis.

Could you please give me some advice? Any other suggestions are welcome.

Thanks in advance!


modified 3 months ago by Biostar • written 6 months ago by mmaqueda

Can you add some details? Against which background do you aim to show that certain promoters are enriched? Can you link that paper?

written 6 months ago by ATpoint

Hi ATpoint,

In the same GEO repository there is an input track (also in bedGraph format) for the same experiment, so I could use it as the background against which to compare scores.
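As a rough illustration of comparing scores against the input, here is a minimal Python sketch. It assumes that a per-promoter score has already been computed for both the ChIP and the input track, and that the two tracks are normalized comparably. The gene names, scores, pseudocount, and cutoff are all made up for the example and do not come from the paper.

```python
import math

def log2_enrichment(chip, control, pseudocount=1.0):
    """log2 ratio of ChIP over input signal; the pseudocount avoids division by zero."""
    return math.log2((chip + pseudocount) / (control + pseudocount))

# Hypothetical per-promoter scores (e.g. mean bedGraph signal over each promoter).
chip_scores = {"geneA": 7.0, "geneB": 1.0}
input_scores = {"geneA": 1.0, "geneB": 1.0}

cutoff = 1.0  # assumed threshold: at least 2-fold over input after the pseudocount
enriched = [
    gene for gene in chip_scores
    if log2_enrichment(chip_scores[gene], input_scores[gene]) >= cutoff
]
print(enriched)
```

Any cutoff chosen this way is arbitrary, which is one reason a dedicated peak caller is usually preferred.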

You can find the paper in:

written 6 months ago by mmaqueda

I would always download the raw data; see "Fast download of FASTQ files from the European Nucleotide Archive (ENA)" and "sra-explorer: find SRA and FastQ download URLs in a couple of clicks". Then call peaks with macs2 using its broad peak option against the control, and intersect the peaks with the promoter coordinates.
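The intersection step can be sketched in plain Python as below. In practice this is usually done with a tool such as bedtools intersect, run on the broad peaks that macs2 callpeak --broad produces. The coordinates and gene names here are invented for illustration, and intervals are assumed to be 0-based and half-open, as in BED files.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end

def genes_with_peak(peaks, promoters):
    """Return genes whose promoter overlaps any peak.

    peaks:     list of (chrom, start, end)
    promoters: list of (chrom, start, end, gene)
    """
    # Group peaks by chromosome so each promoter is only compared to relevant peaks.
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = []
    for chrom, p_start, p_end, gene in promoters:
        if any(overlaps(p_start, p_end, s, e) for s, e in by_chrom.get(chrom, [])):
            hits.append(gene)
    return hits

# Hypothetical broad peaks and promoter windows.
peaks = [("chr1", 1000, 5000), ("chr2", 200, 900)]
promoters = [
    ("chr1", 4500, 5500, "geneA"),  # overlaps the chr1 peak
    ("chr1", 8000, 9000, "geneB"),  # no peak nearby
    ("chr2", 900, 1900, "geneC"),   # only touches the chr2 peak end, no shared base
]
marked = genes_with_peak(peaks, promoters)
print(marked)
```

This linear scan is fine for a toy example; for genome-wide data an interval tree or a sorted sweep would be more appropriate.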

FYI, a bedGraph is (in most cases) a genome-wide intensity track that tells you how many reads mapped to each location across the genome. It is not commonly used for peak calling, but rather for visualization in a genome browser or for extracting signal to make profile plots and the like.

written 6 months ago by ATpoint
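To make the format concrete, here is a small Python sketch that reads bedGraph lines (chrom, start, end, value) and computes the length-weighted mean signal over one region, which is roughly the kind of information a profile plot or promoter score is built from. The toy track and the region are invented for the example, and bases not covered by any interval are assumed to have a signal of 0.

```python
import io

def read_bedgraph(handle):
    """Yield (chrom, start, end, value) tuples, skipping track/browser/comment lines."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield chrom, int(start), int(end), float(value)

def mean_signal(intervals, chrom, region_start, region_end):
    """Length-weighted mean signal over the half-open region [region_start, region_end)."""
    total = 0.0
    for c, s, e, v in intervals:
        if c != chrom:
            continue
        overlap = min(e, region_end) - max(s, region_start)
        if overlap > 0:
            total += overlap * v
    return total / (region_end - region_start)

# Toy bedGraph content standing in for a real file.
example = io.StringIO(
    "track type=bedGraph name=toy\n"
    "chr1\t0\t100\t1.0\n"
    "chr1\t100\t200\t3.0\n"
    "chr1\t200\t300\t0.5\n"
)
intervals = list(read_bedgraph(example))
promoter_score = mean_signal(intervals, "chr1", 50, 150)
print(promoter_score)
```

For a real file, replace the io.StringIO object with an open file handle.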

Thanks for your help ATpoint!

Yes, I agree with you about working with the raw data in these cases; I guess that is the best option. Thanks for your explanation of bedGraph. I did not know this before starting the task, but I realized it once I started working with the file.



written 6 months ago by mmaqueda