Question: Obtaining ChIP-seq peaks from a bedGraph file with continuous regions
mmaqueda wrote (6 months ago):

Hi all,

I have a bedGraph file from a ChIP-seq experiment (H3K27me3) downloaded from GEO (GSE84324). I want to obtain a list of genes enriched for this mark. However, the bedGraph contains a continuous signal made of adjacent regions, whereas I was expecting discrete regions (peaks) that I would only need to annotate with genes.

I've read in the corresponding paper that they used HOMER to generate the bedGraph: makeUCSCfile out_dir -o out.bdg -name sample_name -color track_color -fragLength 150 -avg -fsize 1e20
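To see why the file looks "continuous", one can check how often consecutive intervals touch end to start. Below is a minimal Python sketch, assuming a sorted bedGraph with the four standard columns (chrom, start, end, value); parse_bedgraph and fraction_contiguous are hypothetical helpers, not part of HOMER or any library, and the toy lines are made up rather than taken from GSE84324. A value close to 1.0 means the track tiles the genome instead of listing discrete peaks.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end, value); bedGraph uses 0-based, half-open coordinates
Interval = Tuple[str, int, int, float]


def parse_bedgraph(lines: Iterable[str]) -> List[Interval]:
    """Parse bedGraph data lines, skipping blank, track, browser and comment lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        records.append((chrom, int(start), int(end), float(value)))
    return records


def fraction_contiguous(records: List[Interval]) -> float:
    """Fraction of consecutive same-chromosome pairs where one interval ends
    exactly where the next one starts (assumes records are sorted)."""
    pairs = adjacent = 0
    for (c1, _, e1, _), (c2, s2, _, _) in zip(records, records[1:]):
        if c1 == c2:
            pairs += 1
            adjacent += int(e1 == s2)
    return adjacent / pairs if pairs else 0.0


# Toy input, not real data
toy = [
    "track type=bedGraph name=sample_name",
    "chr1\t0\t100\t0.0",
    "chr1\t100\t250\t1.5",
    "chr1\t250\t400\t3.0",
    "chr2\t0\t50\t0.5",
]
print(fraction_contiguous(parse_bedgraph(toy)))  # 1.0
```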

I am quite new to ChIP-seq analysis and need some advice here on how to proceed. My thoughts are:

a) Use MACS2 bdgbroadcall with the bedGraph as input. Problem: the bedGraph was not generated with MACS/MACS2.

b) Use ScoreMatrixBin from the R package genomation to get a score for every promoter and then apply a cutoff to identify enriched promoters (this is how the original paper proceeded). This may be a naive approach...

c) Try to obtain the ChIP-seq raw data and repeat the analysis.
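Option a) essentially comes down to thresholding the signal and merging nearby enriched intervals. The sketch below is a simplified, hypothetical two-cutoff caller loosely inspired by the idea behind MACS2 bdgbroadcall (a higher cutoff to seed regions and a lower cutoff to link them); it is not MACS2's actual algorithm. It assumes sorted records already parsed into (chrom, start, end, value) tuples. Note that bdgbroadcall expects a MACS2 score track (for example q-values), whereas this HOMER bedGraph holds normalized read density, so any cutoffs here would have to be chosen by hand and no input correction is applied.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int, float]  # (chrom, start, end, value)
Region = Tuple[str, int, int]


def call_broad_regions(records: List[Interval], high: float, low: float,
                       max_gap: int = 800, min_length: int = 200) -> List[Region]:
    """Hypothetical two-cutoff caller on a sorted bedGraph.

    Intervals with value >= low are linked when the gap to the previous linked
    interval on the same chromosome is <= max_gap. A linked region is kept only
    if at least one of its intervals reaches `high` and it spans >= min_length bp.
    """
    regions = []
    current = None  # [chrom, start, end, reaches_high]
    for chrom, start, end, value in records:
        if value < low:
            continue  # below the linking cutoff: not part of any region
        if current and current[0] == chrom and start - current[2] <= max_gap:
            current[2] = max(current[2], end)  # extend the open region
            current[3] = current[3] or value >= high
        else:
            if current:
                regions.append(current)
            current = [chrom, start, end, value >= high]
    if current:
        regions.append(current)
    return [(c, s, e) for c, s, e, reaches_high in regions
            if reaches_high and e - s >= min_length]


# Toy records (chrom, start, end, value); cutoffs are arbitrary
toy = [
    ("chr1", 0, 100, 0.5), ("chr1", 100, 300, 1.2), ("chr1", 300, 500, 2.5),
    ("chr1", 500, 600, 0.2), ("chr1", 600, 700, 1.1), ("chr1", 2000, 2100, 3.0),
    ("chr2", 0, 1000, 1.5),
]
print(call_broad_regions(toy, high=2.0, low=1.0))  # [('chr1', 100, 700)]
```

The low-value interval at chr1:500-600 does not break the region because the gap (100 bp) is below max_gap, which is the kind of linking that broad marks such as H3K27me3 tend to need.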

Could you give me some advice? Any other suggestions are welcome.

Thanks in advance!

Maria


Can you add some details? Against which background do you aim to show that certain promoters are enriched? Can you link that paper?

(reply by ATpoint, 6 months ago)

Hi ATpoint,

In the same GEO repository there is also an input control (in bedGraph format) for the same experiment, so I could use it as the background when comparing scores.

You can find the paper here:

http://dev.biologists.org/content/145/6/dev163162
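With an input bedGraph available, one rough way to follow approach b) with a background is to compare the mean ChIP and input signal over each promoter. This is a plain-Python sketch, not genomation's ScoreMatrixBin. It assumes both tracks are parsed into (chrom, start, end, value) tuples and were normalized to comparable sequencing depth; the promoter coordinates, pseudocount and 2-fold cutoff are hypothetical choices. The linear scan is only meant for small examples; genome-wide, an indexed lookup or dedicated tools would be the practical route.

```python
import math
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int, float]  # (chrom, start, end, value)


def mean_signal(records: List[Interval], chrom: str, start: int, end: int) -> float:
    """Coverage-weighted mean value over [start, end); bases not covered count as 0."""
    total = 0.0
    for c, s, e, v in records:
        if c == chrom:
            overlap = min(e, end) - max(s, start)
            if overlap > 0:
                total += overlap * v
    return total / (end - start)


def promoter_log2_enrichment(chip: List[Interval], control: List[Interval],
                             promoters: Dict[str, Tuple[str, int, int]],
                             pseudocount: float = 1.0) -> Dict[str, float]:
    """log2((ChIP mean + pc) / (input mean + pc)) for each promoter window."""
    scores = {}
    for gene, (chrom, start, end) in promoters.items():
        chip_mean = mean_signal(chip, chrom, start, end)
        input_mean = mean_signal(control, chrom, start, end)
        scores[gene] = math.log2((chip_mean + pseudocount) / (input_mean + pseudocount))
    return scores


# Toy tracks and promoters (hypothetical coordinates)
chip = [("chr1", 0, 1000, 7.0), ("chr1", 1000, 2000, 1.0)]
control = [("chr1", 0, 2000, 1.0)]
promoters = {"geneA": ("chr1", 0, 1000), "geneB": ("chr1", 1000, 2000)}
scores = promoter_log2_enrichment(chip, control, promoters)
print(scores)  # {'geneA': 2.0, 'geneB': 0.0}
enriched = [g for g, s in scores.items() if s >= 1.0]  # at least 2-fold over input
print(enriched)  # ['geneA']
```

The pseudocount keeps promoters with near-zero signal in both tracks from producing huge or undefined ratios; its size is a judgment call.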

(reply by mmaqueda, 6 months ago)

I would always download the raw data; see the posts "Fast download of FASTQ files from the European Nucleotide Archive (ENA)" and "sra-explorer: find SRA and FastQ download URLs in a couple of clicks". Then call peaks with MACS2 in broad-peak mode (macs2 callpeak --broad) against the input control, and intersect the resulting peaks with the promoter coordinates.
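The intersection step can be sketched in plain Python as follows (on real files, bedtools intersect does the same job). It assumes the peaks are the first three columns of the broadPeak file written by macs2 callpeak --broad, that genes are given as (chrom, TSS, strand) in 0-based coordinates, and that a promoter spans 2 kb upstream to 500 bp downstream of the TSS; the window sizes, helper names and toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]   # (chrom, start, end), half-open like BED
Gene = Tuple[str, int, str]     # (chrom, 0-based TSS, strand)


def promoter_window(chrom: str, tss: int, strand: str,
                    upstream: int = 2000, downstream: int = 500) -> Region:
    """Half-open promoter window around a TSS, oriented by strand."""
    if strand == "+":
        return chrom, max(0, tss - upstream), tss + downstream
    return chrom, max(0, tss - downstream), tss + upstream


def genes_with_peak_at_promoter(peaks: List[Region], genes: Dict[str, Gene],
                                upstream: int = 2000, downstream: int = 500) -> List[str]:
    """Sorted names of genes whose promoter window overlaps at least one peak."""
    hits = set()
    for name, (chrom, tss, strand) in genes.items():
        pc, ps, pe = promoter_window(chrom, tss, strand, upstream, downstream)
        for c, s, e in peaks:
            if c == pc and s < pe and ps < e:  # half-open intervals overlap
                hits.add(name)
                break
    return sorted(hits)


# Toy peaks and genes (hypothetical)
peaks = [("chr1", 10000, 15000), ("chr2", 500, 800)]
genes = {"geneA": ("chr1", 16000, "+"), "geneB": ("chr1", 9800, "-"),
         "geneC": ("chr2", 5000, "+"), "geneD": ("chr1", 20000, "+")}
print(genes_with_peak_at_promoter(peaks, genes))  # ['geneA', 'geneB']
```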

FYI, a bedGraph is (in most cases) a genome-wide intensity track that tells you how many reads map to each location across the genome. It is usually not used for peak calling, but rather for visualization in a genome browser or for extracting signal to make profile plots and similar.
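As an illustration of the profile-plot use, a bedGraph can be treated as a step function and sampled at fixed offsets around TSSs, then averaged across genes. This is a minimal sketch assuming sorted, non-overlapping intervals per chromosome; BedGraphTrack and tss_profile are hypothetical names, and sampling single bases instead of averaging over bins is a simplification.

```python
import bisect
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int, float]  # (chrom, start, end, value)


class BedGraphTrack:
    """bedGraph as a per-chromosome step function (sorted, non-overlapping intervals)."""

    def __init__(self, records: List[Interval]):
        self.starts: Dict[str, List[int]] = {}
        self.ends: Dict[str, List[int]] = {}
        self.values: Dict[str, List[float]] = {}
        for chrom, start, end, value in records:
            self.starts.setdefault(chrom, []).append(start)
            self.ends.setdefault(chrom, []).append(end)
            self.values.setdefault(chrom, []).append(value)

    def value_at(self, chrom: str, pos: int) -> float:
        """Signal at a 0-based position; 0.0 where no interval covers it."""
        i = bisect.bisect_right(self.starts.get(chrom, []), pos) - 1
        if i >= 0 and pos < self.ends[chrom][i]:
            return self.values[chrom][i]
        return 0.0


def tss_profile(track: BedGraphTrack, tss_list: List[Tuple[str, int, str]],
                flank: int = 1000, step: int = 250) -> Tuple[List[int], List[float]]:
    """Mean signal at offsets -flank..+flank (every `step` bp) around each TSS,
    with offsets oriented so that positive means downstream on the gene's strand."""
    offsets = list(range(-flank, flank + 1, step))
    profile = []
    for off in offsets:
        vals = [track.value_at(chrom, pos + off if strand == "+" else pos - off)
                for chrom, pos, strand in tss_list]
        profile.append(sum(vals) / len(vals))
    return offsets, profile


track = BedGraphTrack([("chr1", 0, 1000, 1.0), ("chr1", 1000, 2000, 3.0)])
offsets, profile = tss_profile(track, [("chr1", 1000, "+"), ("chr1", 1000, "-")],
                               flank=500, step=500)
print(offsets, profile)  # [-500, 0, 500] [2.0, 3.0, 2.0]
```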

(reply by ATpoint, 6 months ago)

Thanks for your help ATpoint!

Yes, I agree with you about working with the raw data in these cases; I guess that is the best option. Thanks also for the explanation of the bedGraph format. I did not know this before this task, but realized it once I started working with the file.

Regards,

Maria

(reply by mmaqueda, 6 months ago)