Question: ChIP-seq Peak Calling/File Format
4 weeks ago by
lkalesin0 wrote:

Hi all! I am trying to get ChIP-seq peaks from ENCODE ChIP-seq data. The particular experiment I am interested in is GSM613815. When I download the .bed files from GEO, however, I get lines that look like this:

chr1 9859 10058 SOLEXA5_123:3:23:15452:1914

Unfortunately, this does not have the scores, names, strands, etc. that I expected from the .bed file format, like so:

chr1 91852645 91853203 SRX005383.05_peak_1 612 . 17.40168 67.74557 61.27857 379

How would I use the information in the first file to get peaks I can use (like the second line)? Is it just a conversion, or do I have to do something else?

4 weeks ago by
ATpoint16k wrote:

I think what you have there is simply the sequencing reads in BED format. Note, though, that this is not standard BED, because the strand would need to be in column 6 instead of column 5. To make a proper BED file, do something like:

awk 'BEGIN {OFS="\t"} {print $1, $2, $3, $4, ".", $5}' in.bed > out.bed

You could then use this file to call peaks, e.g. with macs2 callpeak -t out.bed -f BED.
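
A slightly more defensive version of the conversion above could look like the sketch below. The function name tagalign_to_bed6 is made up for illustration. It assumes each read line has exactly five whitespace-separated fields with "+" or "-" in the fifth, which the single line quoted in the question does not confirm, so anything else is reported on stderr rather than silently reordered.

```shell
# Hypothetical helper (not part of any tool): reads 5-column lines
# (chrom start end name strand) on stdin and writes 6-column BED
# (chrom start end name score strand) on stdout, with "." as a
# placeholder score.
tagalign_to_bed6() {
    awk 'BEGIN { OFS = "\t" }
         # Only convert lines that match the assumed layout.
         NF == 5 && ($5 == "+" || $5 == "-") {
             print $1, $2, $3, $4, ".", $5
             next
         }
         # Everything else is skipped and reported, not guessed at.
         {
             printf "skipping line %d: expected 5 fields with strand in column 5\n", NR > "/dev/stderr"
         }'
}
```

It would be used as tagalign_to_bed6 < in.bed > out.bed, after which out.bed can go to the peak caller. If every line is skipped, the file probably does not carry strand information in column 5 and the layout should be checked by hand first.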

4 weeks ago by
Friederike4.1k wrote:

I don't think you downloaded the peaks. As ATpoint mentioned, these are probably BED files of reads ("TagAlign"). The peaks from ENCODE are usually supplied in .narrowPeak files. Maybe try the Roadmap website for downloading the peaks (they are under the subheader "C. peak calling"; make sure to scroll down).
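
For orientation, a .narrowPeak line has ten columns: chrom, start, end, name, score, strand, signalValue, pValue, qValue and the summit offset from start. The second example line in the question has that layout. A rough sketch of pulling out 1-bp summit positions is shown below. The function name narrowpeak_summits is hypothetical, and the sketch assumes column 10 is a 0-based offset that is set to -1 when no summit was called.

```shell
# Hypothetical helper: reads narrowPeak lines on stdin and prints one
# 1-bp interval per peak at its summit (chrom, summit, summit+1, name, score).
narrowpeak_summits() {
    awk 'BEGIN { OFS = "\t" }
         # Skip lines without a called summit (column 10 < 0) or with too few columns.
         NF >= 10 && $10 >= 0 {
             s = $2 + $10          # summit = start + offset
             print $1, s, s + 1, $4, $5
         }'
}
```

Usage would be along the lines of narrowpeak_summits < peaks.narrowPeak > summits.bed, where peaks.narrowPeak stands in for whichever peak file was downloaded.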
