Question: Bedgraph To Bed
4
gravatar for Ying W
8.2 years ago by
Ying W3.9k
South San Francisco, CA
Ying W3.9k wrote:

Hi,

I was wondering if it was possible to convert/collapse bedgraph format files to BED format. I wish to create venn diagrams for chip-seq data.

For my 1st dataset after running MACS, I get both BED and Bedgraph format data (i changed option to get bedgraph instead of wig). However, for my 2nd dataset I only have the data in BED format.

The bedgraph files look nice for visualization on ucsc genome browser because it shows peaks but the R package ChIPpeakAnno requires BED files.

Thanks.

chip-seq conversion • 15k views
ADD COMMENTlink written 8.2 years ago by Ying W3.9k
3
gravatar for seidel
8.2 years ago by
seidel6.8k
United States
seidel6.8k wrote:

BedGraph has 4 fields (chr start stop score), and the simplest BED file has 3 fields (chr start stop), so one easy way to convert from BED to bedGraph would be to use awk:

cat file.bedGraph | awk '{print $1 "\t" $2 "\t" $3}' > file.bed

If your bedGraph file contains a header line, you can get rid of it with an inverse grep:

grep -v track file.bedGraph | awk '{print $1 "\t" $2 "\t" $3}' > file.bed

Or you can use tail to return everything but the first line (if your track line is the first line). If you want to add more fields to the bed file, you can insert them in the print statement (I don't remember the minimum requirement for ChIPpeakAnno).

ADD COMMENTlink written 8.2 years ago by seidel6.8k
3

cut -f 1-3 is your friend too.

ADD REPLYlink modified 6.9 years ago by Sukhdeep Singh9.8k • written 8.2 years ago by Pierre Lindenbaum122k
3
gravatar for Ryan Dale
8.2 years ago by
Ryan Dale4.8k
Bethesda, MD
Ryan Dale4.8k wrote:

It's not clear that a simple format conversion is what you want . . .

In your question, you ask about "collapsing" a bedGraph file. This isn't trivial, and this problem is exactly what peak-callers like MACS solve.

The bedGraph format is typically used to visualize the genome-wide "signal". If it's output from a MACS run, the bedGraph will not be just the peaks, rather, it will show the coverage of all your reads shifted by d/2. It will have many, many more lines than a BED-format "peaks file" (try running wc -l on your files to confirm).

If your goal is to get a Venn diagram to compare peaks of two treatments, then comparing a BED file with a converted bedGraph file will not result in a meaningful diagram. Instead, you'll need to re-run a peak caller to get a BED file of peaks for each treatment.

ADD COMMENTlink written 8.2 years ago by Ryan Dale4.8k

what do you mean shifted by d/2? I know it will have many more lines than the corresponding 'peaks' file but i thought it was odd that MACS would generate a bedgraph file when I could have made a bedgraph file from the bam file that I inputed into MACS

ADD REPLYlink written 8.2 years ago by Ying W3.9k

Check out Fig 1 in the MACS paper [1] and Fig 1 in the spp paper [2] -- for ChIP-seq reads, you end up with a pileup of reads on either side of a putative binding site. If you were to generate a bedGraph out of the BAM, you'd have these double peaks everywhere. MACS computes the average distance between peaks and shifts them toward each other so that the final bedGraph has peaks right where the binding sites should be.

[1] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2592715/ [2] http://www.nature.com/nbt/journal/v26/n12/abs/nbt.1508.html

ADD REPLYlink written 8.2 years ago by Ryan Dale4.8k

I have encountered a strange problem now. I am using bedgraph files to look at things on UCSC and when I go from sorted_bam -> bedgraph (900k lines) but when i go sorted_bam -> MACS -> bedgraph (1000k lines) the file that MACS gives me is bigger and also outside of the coordinates (UCSC gives out of bounds errors) even though if I look at beginning of the SAM file the chr_size is correct (@SQ lines), maybe macs mailing list is a better place for this question...

ADD REPLYlink written 8.2 years ago by Ying W3.9k

MACS also extends by expected fragment size; not sure if it pays attention to BAM @SQ lengths when doing so. UCSC will generally tell you what line the problem is on -- so you can open the file and delete from that line to the end of the chrom and try again. MACS mailing list is pretty responsive for these sorts of questions.

ADD REPLYlink written 8.2 years ago by Ryan Dale4.8k

Thanks for the help. I just posted on their google groups page, I will probably write a script check coordinates and trim as necessary.

ADD REPLYlink written 8.2 years ago by Ying W3.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2174 users visited in the last hour