Question: Overlapping regions in a BedGraph file
1
gravatar for Sam
4.8 years ago by
Sam90
United States
Sam90 wrote:

I downloaded a GRO-Seq dataset from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE38140 and got a bedGraph file that's 4.2gb large.

I wanted to generate heatmaps of sense and antisense strands using deeptools, which requires a bigWig file format. I figured it'd be pretty simple using a bdg2bw script found here  but I recieved this error:

$ /Users/Carlos/Dropbox/bdg2bw Gro-Seq.bdg /Users/Carlos/Desktop/Projects/GenomicDataPublication_Project/Gencode/hg19.chrom.sizes 

Overlapping regions in bedGraph line 44 of Gro-Seq.bdg.clip

Not quite sure what to do to fix this. I know it's pretty simple in concept, just remove overlapping regions. But i'm not sure how exactly to go about doing that or if it would bias the results or something just as terrible, of the heatmap if I did.

Any ideas?

ADD COMMENTlink modified 16 months ago by Simply Bioinformatics170 • written 4.8 years ago by Sam90
0
gravatar for Devon Ryan
4.8 years ago by
Devon Ryan97k
Freiburg, Germany
Devon Ryan97k wrote:

I haven't merged it into the master branch yet, but you can make bigWig files with pyBigWig in the WriterIntegration branch. That'd allow you to script something to fix the problematic line(s).

The fix would be something along the lines of:

import pyBigWig
lastLine = None
bw = pyBigWig.open("something.bw", "w")
bw.addHeader(some stuff)
for line in bedGraph.split():
    if lastLine:
        if line[0] == lastLine[0] and line[1] < lastLine[2]:
            #Do something appropriate
    lastLine = line
    bw.addEntries([line[0]], [line[1]], ends=[line[2]], values=[line[3]])
bw.close()

That'd give you a bigWig file and allow you to fix overlapping entries however you'd like.

ADD COMMENTlink modified 10 months ago by RamRS30k • written 4.8 years ago by Devon Ryan97k

This looks promising. I'm not very (or at all) versed at scripting, but i'll give it a whirl and see if I can figure it out.

ADD REPLYlink written 4.8 years ago by Sam90

Go ahead and post again with what you have if you get stuck.

ADD REPLYlink modified 10 months ago by RamRS30k • written 4.8 years ago by Devon Ryan97k

Not surprisingly I got stuck quite quickly. Here's the edited script: (if you can call it that)

import pyBigWig
lastLine = None
bw = pyBigWig.open("Gro-Seq.bdg", "w")
bw.addHeader([("chr1", 249250621), ("chr2", 243199373), ("chr3", 198022430), ("chr4", 191154276), ("chr5", 180915260), ("chr6", 171115067), ("chr7", 159138663), ("chr8", 146364022), ("chr9", 141213431), ("chr10", 135534747), ("chr11", 135006516), ("chr12", 133851895), ("chr13", 115169878), ("chr14", 107349540), ("chr15", 102531392), ("chr16", 90354753), ("chr17", 81195210), ("chr18", 78077248), ("chr19", 59128983), ("chr20", 63025520), ("chr21", 48129895), ("chr22", 51304566), ("chrX", 155270560), ("chrY", 59373566), ("chrM", 16571)], maxZooms=0)
for line in bedGraph.split():
    if lastLine:
        if line[0] == lastLine[0] and line[1] < lastLine[2]:
            fout.write(line)
    lastLine = line
    bw.addEntries([line[0]], [line[1]], ends=[line[2]], values=[line[3]])
bw.close()

The error I'm receiving:

$ python bdgintobw.py

Traceback (most recent call last):

  File "bdgintobw.py", line 4, in <module>

    bw.addHeader([("chr1", 249250621), ("chr2", 243199373), ("chr3", 198022430), ("chr4", 191154276), ("chr5", 180915260), ("chr6", 171115067), ("chr7", 159138663), ("chr8", 146364022), ("chr9", 141213431), ("chr10", 135534747), ("chr11", 135006516), ("chr12", 133851895), ("chr13", 115169878), ("chr14", 107349540), ("chr15", 102531392), ("chr16", 90354753), ("chr17", 81195210), ("chr18", 78077248), ("chr19", 59128983), ("chr20", 63025520), ("chr21", 48129895), ("chr22", 51304566), ("chrX", 155270560), ("chrY", 59373566), ("chrM", 16571)], maxZooms=0)

AttributeError: 'NoneType' object has no attribute 'addHeader'
ADD REPLYlink modified 10 months ago by RamRS30k • written 4.8 years ago by Sam90

For whatever reason you couldn't open "Gro-Seq.bdg" for opening (are you using the "WriterIntegration" branch?). Also, I apparently introduced a bug recently where "maxZooms" doesn't work! I'll try to get that fixed today.

ADD REPLYlink written 4.8 years ago by Devon Ryan97k

maxZooms should now work as either a keyword or optional argument, though obviously if open() returns None then you're having a different issue.

ADD REPLYlink modified 10 months ago by RamRS30k • written 4.8 years ago by Devon Ryan97k

Hi, Is there any update about this post? Thanks. T.

ADD REPLYlink written 3.5 years ago by morovatunc460
0
gravatar for Simply Bioinformatics
16 months ago by
WashingtonDC
Simply Bioinformatics170 wrote:

This solution only works for non-Bookended bedgraph file
Sort and then merge overlapping regions(mean of score or max, sum, etc)

LC_COLLATE=C sort -k1,1 -k2,2n -k3,3n -s in.bdg > out.bdg.tmp
bedtools merge -i out.bdg.tmp -c 4 -d 0 -o max > out.bdg
LC_COLLATE=C sort -k1,1 -k2,2n -k3,3n -s out.bdg > out.bdg.sorted
bedGraphToBigWig out.bdg.sorted hg38.genome out.bigwig
ADD COMMENTlink modified 16 months ago • written 16 months ago by Simply Bioinformatics170
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2292 users visited in the last hour