Question: ChIP-seq Peak Calling/File Format
11 months ago by
lkalesin0 wrote:

Hi all! I am trying to get ChIP-seq peaks from ENCODE ChIP-seq data. The particular experiment I am interested in is GSM613815. When I download the .bed files from GEO, however, I get lines that look like this:

chr1 9859 10058 SOLEXA5_123:3:23:15452:1914

Unfortunately, this does not have the scores, names, strands, etc. that the .bed file format calls for, like so:

chr1 91852645 91853203 SRX005383.05_peak_1 612 . 17.40168 67.74557 61.27857 379

How would I use the information in the first file to get peaks I can use (second line)? Is it a conversion or do I have to do anything else?

11 months ago by
ATpoint32k wrote:

I think what you have there is simply the sequencing reads in BED format. Note, though, that this is not standard BED, because the strand needs to be in column 6 rather than column 5. To make a proper BED file, do something like:

awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $4, ".", $5 }' in.bed > out.bed

You could then use this file to call peaks, e.g. with macs2 callpeak -t out.bed -f BED.
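
Before peak calling, it may be worth checking that the converted file really has six columns with a valid strand. Here is a minimal sketch, assuming the downloaded file has five tab-separated columns with the strand in column 5 (check your own file first, since the example line in the question shows only four). The file reads.bed and its two records are made-up sample data.

```shell
# Made-up sample input: 5 columns, strand in column 5 (an assumption).
printf 'chr1\t9859\t10058\tread_a\t+\nchr1\t9900\t10099\tread_b\t-\n' > reads.bed

# Move the strand to column 6 and put a placeholder score in column 5.
awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $4, ".", $5 }' reads.bed > out.bed

# Count lines that are not 6-column BED with a +/- strand.
bad=$(awk -F '\t' 'NF != 6 || ($6 != "+" && $6 != "-") { n++ } END { print n + 0 }' out.bed)
echo "malformed lines: $bad"
```

If the count is 0, out.bed should have the layout a BED reader expects. A non-zero count usually means the input did not have the assumed five columns.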

11 months ago by
United States
Friederike5.4k wrote:

I don't think you downloaded the peaks. As ATpoint mentioned, these are probably BED files of reads ("TagAlign"). The peaks from ENCODE are usually supplied as .narrowPeak files. Maybe try the Roadmap website for downloading the peaks (look under the subheader "C. peak calling", and make sure to scroll down).
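
For reference, the second line in the question follows the narrowPeak layout: the six standard BED columns plus four extra ones. Below is a small sketch that labels its fields, using the line from the question as input. The file name peak.narrowPeak is just a placeholder, and the line is assumed to be tab-separated.

```shell
# The peak line quoted in the question, written tab-separated.
printf 'chr1\t91852645\t91853203\tSRX005383.05_peak_1\t612\t.\t17.40168\t67.74557\t61.27857\t379\n' > peak.narrowPeak

# Print each narrowPeak column next to its name.
awk -F '\t' 'BEGIN {
    n = split("chrom chromStart chromEnd name score strand signalValue pValue qValue peak", label, " ")
}
{
    for (i = 1; i <= n; i++) print label[i] ": " $i
}' peak.narrowPeak
```

The pValue and qValue columns are stored as -log10 values, and the last column is the summit position as an offset from chromStart. None of these can be derived by reformatting a reads file; they come out of a peak caller.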
