Question: How to tell if a BigWig file is 1-based or 0-based?
i.sudbery (Sheffield, UK) wrote, 15 months ago:

The BigWig documentation on the UCSC website says the following:

BigWig files created from bedGraph format use "0-start, half-open" coordinates, but bigWigs that represent variableStep and fixedStep data are generated from wiggle files that use "1-start, fully-closed" coordinates. For example, for a chromosome of length N, the first position is 1 and the last position is N.
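The difference between the two conventions is easiest to see in a small example. The Python sketch below uses hypothetical function names that are not part of any UCSC tool or library. Only the start coordinate shifts; the end coordinate has the same numeric value in both systems. A chromosome of length N is therefore [0, N) in 0-start, half-open terms and [1, N] in 1-start, fully-closed terms.

```python
def half_open_to_closed(start0, end):
    """Convert a 0-start, half-open interval [start0, end) to 1-start, fully-closed [start1, end]."""
    return start0 + 1, end


def closed_to_half_open(start1, end):
    """Convert a 1-start, fully-closed interval [start1, end] to 0-start, half-open [start0, end)."""
    return start1 - 1, end


# A bedGraph record "chr1  0  10  5.0" covers the first ten bases of chr1,
# which a wiggle file would describe as positions 1 through 10.
print(half_open_to_closed(0, 10))  # (1, 10)
print(closed_to_half_open(1, 10))  # (0, 10)
```

Because the two conversions are exact inverses, a round trip always returns the original interval.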

But if I download a file, I might not know which co-ordinate system it is encoded with. I'm using bx-python to access the data in the bigWig files, but I can't work out whether it's returning 1-based or 0-based coordinates. Is there a way to tell? Genome browsers must be able to tell the difference.

Alex Reynolds (Seattle, WA USA) wrote, 15 months ago:

Some years ago I asked UCSC a similar question. Their answer suggests looking at the header information, or tracing the provenance of the data that went into creating the original bigWig:

The original data used to generate a bigWig can come from different formats: bedGraph, which is zero-relative, and wiggle, which is 1-relative. In summary, if a bedGraph was used, the output of bigWigToWig will be in the bedGraph's zero-relative coordinates. The output will also include a comment such as "#bedGraph section chr1:10451-568419", which appears at the head of the wgEncodeSydhTfbsK562Pol3StdSig file mentioned. In other words, bigWigToWig does not re-index the data. If you use bigWigToBedGraph instead, the data will always be returned as 0-based bedGraph.
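If you have already dumped a file with bigWigToWig, you can get a rough picture of its contents by scanning the text for section markers. The sketch below counts the "#bedGraph section" comments mentioned above alongside wiggle variableStep/fixedStep declaration lines. It assumes those line prefixes are how bigWigToWig marks each section type, and the script name classify_wig_sections.py is hypothetical.

```python
import sys
from collections import Counter


def classify_sections(lines):
    """Count section types in bigWigToWig text output.

    Assumption: bedGraph-derived sections (0-start, half-open) are introduced by a
    "#bedGraph section" comment, and wiggle-derived sections (1-start) by
    "variableStep" or "fixedStep" declaration lines.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#bedGraph section"):
            counts["bedGraph"] += 1
        elif line.startswith("variableStep"):
            counts["variableStep"] += 1
        elif line.startswith("fixedStep"):
            counts["fixedStep"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python classify_wig_sections.py my.wig
    with open(sys.argv[1]) as fh:
        for kind, n in classify_sections(fh).items():
            print(f"{kind}\t{n}")
```

For example, run `bigWigToWig my.bigWig my.wig` and then `python classify_wig_sections.py my.wig`. The sketch counts sections instead of returning a single verdict, because a single file could in principle contain more than one section type.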

Most ENCODE data, such as the file you were looking at, originated from a BAM that was processed through steps like these:

bamToBed file.bam -> file.bedGraph
bedGraphToBigWig -> file.bw

So there is no problem with this file. It is what you should expect from most BAM-derived bigWig files from the ENCODE project.

As to your last question, it is best not to assume that all bigWigs are indexed the same way. Some will come from bedGraphs and some from wigs, depending on their originating files. Most ENCODE data, however, will probably come out of bigWigToWig as bedGraph, since it was most likely encoded as bedGraph from BAMs.

Here is further background information. There are two bigWig encoders, bedGraphToBigWig and wigToBigWig, that can take bedGraph or the two wiggle types, variableStep and fixedStep. Then there are two ways back: bigWigToBedGraph and bigWigToWig. If you wish to explore with these formats, please see these pages, the last being the location for obtaining precompiled binaries:
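The following sketch illustrates what normalizing everything to 0-based bedGraph involves. It converts wiggle text, such as bigWigToWig output, into 0-start, half-open (chrom, start, end, value) records. This is a hypothetical pure-Python helper, not how bigWigToBedGraph is implemented, and it relies on the usual wiggle conventions:

- variableStep lines give a 1-based position followed by a value.
- fixedStep lines give only values, starting at start= and advancing by step=.
- span= (default 1) sets the width of each record.
- Lines under a "#bedGraph section" comment are assumed to be 0-based already and are passed through unchanged.

```python
def wig_to_bedgraph(lines):
    """Yield 0-start, half-open (chrom, start, end, value) tuples from wiggle text."""
    mode, chrom, span, step, pos = None, None, 1, 1, None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#bedGraph section"):
            mode = "bedGraph"  # assumed: following lines are already 0-based bedGraph
            continue
        if line.startswith(("#", "track", "browser")):
            continue  # other comments and track/browser lines carry no data
        fields = line.split()
        if fields[0] in ("variableStep", "fixedStep"):
            mode = fields[0]
            params = dict(f.split("=", 1) for f in fields[1:])
            chrom = params["chrom"]
            span = int(params.get("span", 1))
            if mode == "fixedStep":
                pos = int(params["start"])  # 1-based position of the first value
                step = int(params.get("step", 1))
            continue
        if mode == "variableStep":
            p = int(fields[0])  # 1-based position
            yield chrom, p - 1, p - 1 + span, float(fields[1])
        elif mode == "fixedStep":
            yield chrom, pos - 1, pos - 1 + span, float(fields[0])
            pos += step
        elif mode == "bedGraph":
            yield fields[0], int(fields[1]), int(fields[2]), float(fields[3])


print(list(wig_to_bedgraph(["variableStep chrom=chr1 span=5", "101 1.0"])))
# [('chr1', 100, 105, 1.0)]
```

Only the start of each record changes when moving from wiggle to bedGraph, which matches the conversion rule described in the UCSC documentation quoted in the question.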


Papyrus replied, 3 months ago:

Thank you for this information; I have exactly the same problem. Do you know of a tool to access the header of a bigWig file?

Alex Reynolds replied, 12 weeks ago:

Devon Ryan's Python library (https://github.com/deeptools/pyBigWig) may help. Once it is installed:

$ python
>>> import pyBigWig
>>> bw = pyBigWig.open("my.bigWig")
>>> print(bw.header())

Papyrus replied, 12 weeks ago:

Thank you! Does this require loading the whole file in the step bw = pyBigWig.open("my.bigWig")? Sorry for the basic question; I have no experience with Python. I was looking for something like samtools view piped to head, but for bigWig, so that I can avoid loading the whole file.

Alex Reynolds replied, 12 weeks ago:

I'm not sure about your first question, but the second seems straightforward. Create a text file called readBigWigHeader.py and add the following code, or something similar:

#!/usr/bin/env python
# Print the header of the bigWig file named on the command line.
import sys
import pyBigWig

fn = sys.argv[1]
bw = pyBigWig.open(fn)
sys.stdout.write("{}\n".format(bw.header()))
bw.close()

Make the script executable (chmod +x ./readBigWigHeader.py), then run it like so to get the header sent to the standard output stream:

$ ./readBigWigHeader.py my.bigWig
...

Papyrus replied, 12 weeks ago:

OK, thank you for the comprehensive help.

Devon Ryan replied, 12 weeks ago:

It doesn't read the whole file in. Like samtools, it reads only the parts it needs. Please note, however, that nothing in the header indicates whether the underlying data are 1-based or 0-based. The convention can actually change from chunk to chunk within a bigWig file, so there is really nothing you can inspect to find out. As a general rule of thumb, it's best to assume that bigWig files are 0-based, since 1-based bigWig files are a terrible idea that should never have been allowed.