Overlapping regions in a BedGraph file
2
1
Entering edit mode
5.8 years ago
Sam ▴ 90

I downloaded a GRO-Seq dataset fromĀ http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE38140 and got a bedGraph file that's 4.2gb large.

I wanted to generate heatmaps of sense and antisense strands using deeptools, which requires a bigWig file format. I figured it'd be pretty simple using a bdg2bw script found hereĀ  but I recieved this error:

$ /Users/Carlos/Dropbox/bdg2bw Gro-Seq.bdg /Users/Carlos/Desktop/Projects/GenomicDataPublication_Project/Gencode/hg19.chrom.sizes 

Overlapping regions in bedGraph line 44 of Gro-Seq.bdg.clip

Not quite sure what to do to fix this. I know it's pretty simple in concept, just remove overlapping regions. But i'm not sure how exactly to go about doing that or if it would bias the results or something just as terrible, of the heatmap if I did.

Any ideas?

bedgraph GRO-Seq ChIP-Seq deeptools • 5.6k views
ADD COMMENT
0
Entering edit mode
5.8 years ago

I haven't merged it into the master branch yet, but you can make bigWig files with pyBigWig in the WriterIntegration branch. That'd allow you to script something to fix the problematic line(s).

The fix would be something along the lines of:

import pyBigWig
lastLine = None
bw = pyBigWig.open("something.bw", "w")
bw.addHeader(some stuff)
for line in bedGraph.split():
    if lastLine:
        if line[0] == lastLine[0] and line[1] < lastLine[2]:
            #Do something appropriate
    lastLine = line
    bw.addEntries([line[0]], [line[1]], ends=[line[2]], values=[line[3]])
bw.close()

That'd give you a bigWig file and allow you to fix overlapping entries however you'd like.

ADD COMMENT
0
Entering edit mode

This looks promising. I'm not very (or at all) versed at scripting, but i'll give it a whirl and see if I can figure it out.

ADD REPLY
0
Entering edit mode

Go ahead and post again with what you have if you get stuck.

ADD REPLY
0
Entering edit mode

Not surprisingly I got stuck quite quickly. Here's the edited script: (if you can call it that)

import pyBigWig
lastLine = None
bw = pyBigWig.open("Gro-Seq.bdg", "w")
bw.addHeader([("chr1", 249250621), ("chr2", 243199373), ("chr3", 198022430), ("chr4", 191154276), ("chr5", 180915260), ("chr6", 171115067), ("chr7", 159138663), ("chr8", 146364022), ("chr9", 141213431), ("chr10", 135534747), ("chr11", 135006516), ("chr12", 133851895), ("chr13", 115169878), ("chr14", 107349540), ("chr15", 102531392), ("chr16", 90354753), ("chr17", 81195210), ("chr18", 78077248), ("chr19", 59128983), ("chr20", 63025520), ("chr21", 48129895), ("chr22", 51304566), ("chrX", 155270560), ("chrY", 59373566), ("chrM", 16571)], maxZooms=0)
for line in bedGraph.split():
    if lastLine:
        if line[0] == lastLine[0] and line[1] < lastLine[2]:
            fout.write(line)
    lastLine = line
    bw.addEntries([line[0]], [line[1]], ends=[line[2]], values=[line[3]])
bw.close()

The error I'm receiving:

$ python bdgintobw.py

Traceback (most recent call last):

  File "bdgintobw.py", line 4, in <module>

    bw.addHeader([("chr1", 249250621), ("chr2", 243199373), ("chr3", 198022430), ("chr4", 191154276), ("chr5", 180915260), ("chr6", 171115067), ("chr7", 159138663), ("chr8", 146364022), ("chr9", 141213431), ("chr10", 135534747), ("chr11", 135006516), ("chr12", 133851895), ("chr13", 115169878), ("chr14", 107349540), ("chr15", 102531392), ("chr16", 90354753), ("chr17", 81195210), ("chr18", 78077248), ("chr19", 59128983), ("chr20", 63025520), ("chr21", 48129895), ("chr22", 51304566), ("chrX", 155270560), ("chrY", 59373566), ("chrM", 16571)], maxZooms=0)

AttributeError: 'NoneType' object has no attribute 'addHeader'
ADD REPLY
0
Entering edit mode

For whatever reason you couldn't open "Gro-Seq.bdg" for opening (are you using the "WriterIntegration" branch?). Also, I apparently introduced a bug recently where "maxZooms" doesn't work! I'll try to get that fixed today.

ADD REPLY
0
Entering edit mode

maxZooms should now work as either a keyword or optional argument, though obviously if open() returns None then you're having a different issue.

ADD REPLY
0
Entering edit mode

Hi, Is there any update about this post? Thanks. T.

ADD REPLY
0
Entering edit mode
2.4 years ago

This solution only works for non-Bookended bedgraph file
Sort and then merge overlapping regions(mean of score or max, sum, etc)

LC_COLLATE=C sort -k1,1 -k2,2n -k3,3n -s in.bdg > out.bdg.tmp
bedtools merge -i out.bdg.tmp -c 4 -d 0 -o max > out.bdg
LC_COLLATE=C sort -k1,1 -k2,2n -k3,3n -s out.bdg > out.bdg.sorted
bedGraphToBigWig out.bdg.sorted hg38.genome out.bigwig
ADD COMMENT

Login before adding your answer.

Traffic: 2777 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6