How To Read A Bigwig File Using Python
12.1 years ago
Fidel ★ 2.0k

Hi everyone,

I am using bx-python to read a bigwig file. However, the current implementation is very slow: in my benchmarks a single query takes about 0.02 seconds on a 2.5 GHz server, and I need to run thousands of queries. With parallelization and other tricks I can read the bigwig faster, but I wonder whether anyone knows another library or way to query a bigwig file from Python. The Perl library for reading bigwig files is very fast, but I would prefer a Python solution.

Update to clarify the problem:

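"""
usage: %prog bigwig_file.bw < bed_file.bed

Prints the average time (in seconds) of one BigWigFile.query() call,
averaged over all intervals in the BED file read from stdin.
"""
import sys
import time

import numpy as np
from bx.bbi.bigwig_file import BigWigFile
from bx.intervals.io import GenomicIntervalReader

# bigWig is a binary format, so open the file in binary mode.
bw = BigWigFile(open(sys.argv[1], "rb"))
timings = []
for interval in GenomicIntervalReader(sys.stdin):
    start = time.time()
    # Summarize the interval into 20 bins; only the call time is recorded.
    bw.query(interval.chrom, interval.start, interval.end, 20)
    timings.append(time.time() - start)

print(np.mean(timings))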

This Python script prints the average time per query() call. For a BED file containing thousands of lines it can take half an hour, while a Perl counterpart using the Bio-BigFile library takes only a few seconds.

python bigwig • 12k views

Which perl library are you using?


Hi Marcin, I profiled the Python code, and each call to the BigWigFile query() method makes 10,000 calls to the read_and_unpack() method. This seems to be causing the slowdown. I reported the issue to the developers because I could not solve the problem by analysing the code (see https://bitbucket.org/james_taylor/bx-python/issue/38/read-a-bigwig-file-is-slow).
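As a hedged illustration of why thousands of small unpack calls per query are expensive, the sketch below contrasts decoding fixed-size records one at a time with decoding a whole block in a single NumPy call. It does not use bx-python. The 12-byte record layout (uint32 start, uint32 end, float32 value, little-endian) is my assumption based on the bigWig bedGraph section format, and decode_per_record/decode_bulk are hypothetical helper names.

```python
import struct

import numpy as np

# ASSUMED record layout of a bigWig bedGraph-style data item:
# chromStart (uint32), chromEnd (uint32), value (float32), little-endian.
RECORD = struct.Struct("<IIf")
RECORD_DTYPE = np.dtype([("start", "<u4"), ("end", "<u4"), ("value", "<f4")])


def decode_per_record(buf):
    """One unpack call per record: the pattern that yields thousands of calls."""
    out = []
    for offset in range(0, len(buf), RECORD.size):
        out.append(RECORD.unpack_from(buf, offset))
    return out


def decode_bulk(buf):
    """Decode every record in one call, with no Python-level loop per record."""
    return np.frombuffer(buf, dtype=RECORD_DTYPE)


if __name__ == "__main__":
    # Synthetic block of 10,000 made-up records, for illustration only.
    n = 10_000
    records = np.zeros(n, dtype=RECORD_DTYPE)
    records["start"] = np.arange(n, dtype=np.uint32) * 10
    records["end"] = records["start"] + 10
    records["value"] = np.linspace(0.0, 1.0, n, dtype=np.float32)
    buf = records.tobytes()

    # Both decoders agree; the bulk version avoids 10,000 interpreter-level calls.
    assert decode_per_record(buf) == decode_bulk(buf).tolist()
```

Timing the two functions with timeit on your own machine is the fairest way to see the difference; moving per-record work into C (Cython or NumPy) is the general idea.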


Looks like the loop in query() is not cythonized; that might help: https://bitbucket.org/james_taylor/bx-python/src/38dc8eb987fb/lib/bx/bbi/bbi_file.pyx#cl-215


Just to be sure: you are saying that random access of a BigWig in bx-python is considerably slower than in Perl? Maybe profile the Python code and see where the time goes.
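A minimal sketch of profiling with the standard-library cProfile and pstats modules, counting how often each function is called. query_stub and read_and_unpack_stub are hypothetical stand-ins so the example runs without bx-python; for the real code you would pass a function that calls bw.query(...) instead. Reading pstats.Stats.stats is an assumption of convenience: it is a long-standing but not formally documented attribute.

```python
import cProfile
import pstats


def read_and_unpack_stub(i):
    # Hypothetical stand-in for a small per-record helper (not bx-python code).
    return i * 2


def query_stub(n_records):
    # Hypothetical stand-in for one query() that calls the helper once per record.
    return sum(read_and_unpack_stub(i) for i in range(n_records))


def count_calls(func, *args):
    """Profile func(*args) and return {function name: total number of calls}."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    counts = {}
    # Keys are (filename, lineno, funcname); values start with (cc, nc, ...),
    # where nc is the total call count including recursive calls.
    for (_filename, _lineno, name), (_cc, nc, _tt, _ct, _callers) in stats.stats.items():
        counts[name] = counts.get(name, 0) + nc
    return counts


if __name__ == "__main__":
    counts = count_calls(query_stub, 10_000)
    print(counts["read_and_unpack_stub"])  # one call per record: 10000

    # A readable report of the five most expensive entries by cumulative time.
    profiler = cProfile.Profile()
    profiler.runcall(query_stub, 10_000)
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

From the shell, `python -m cProfile -s cumulative script.py bigwig_file.bw < bed_file.bed` gives a similar report without changing the script.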

10.5 years ago
tszn1984 ▴ 100

I wrote a BigWigFile class by wrapping Kent's library. It is very convenient to use and very fast. The manual is here: http://tsznxx.appspot.com/BigWigFile

The source code can be downloaded from GitHub: git clone git://github.com/tsznxx/wLib.git wLib. You just need to (1) go into external/Kentlib and run make, and (2) go into wWigIO, change the Python.h path to your Python.h location, and run make. wWigIO.so and BigWigFile.py are what you want.

Please contact me if you have any questions.

12.1 years ago
brentp 24k

If you're doing random access, try the Cython version. Examples here

I think you can do:

bw.query("chr1", 10000, 20000, 1)

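To get a summary of the data on chromosome 1 between 10000 and 20000 (the last argument is the number of summary bins, so 1 returns a single summary for the whole region).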
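Since the fourth argument of query() sets the number of summary bins, it can help to see how a region might be split into such bins. This is a minimal sketch assuming equal-width bins with floor rounding; summary_bins is a hypothetical helper, not part of bx-python, and bx-python's internal rounding may differ.

```python
def summary_bins(start, end, summary_size):
    """Split [start, end) into summary_size equal-width (start, end) bins.

    ASSUMPTION: a float step of (end - start) / summary_size, rounded down
    to integer coordinates; the last bin always ends exactly at `end`.
    """
    if summary_size <= 0 or end <= start:
        raise ValueError("need summary_size > 0 and end > start")
    step = (end - start) / summary_size
    edges = [start + int(i * step) for i in range(summary_size)] + [end]
    return list(zip(edges[:-1], edges[1:]))


if __name__ == "__main__":
    print(summary_bins(10000, 20000, 1))  # [(10000, 20000)]
    print(summary_bins(10000, 20000, 4))
```

With summary_size=1 the whole region is one bin; the benchmark script's value of 20 asks for twenty bins per BED interval.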


This is exactly what I am doing but each call to the query method is rather slow.

12.0 years ago
Ryan Dale 5.0k

Reading the source, it appears that query() is taking the results from summarize(), which returns an object containing NumPy arrays. query() then iterates over those results and sort of re-packages them into a dictionary.

Is it fast enough for your purposes if you use the summarize() method directly? I just did some quick benchmarks and, while it greatly depends on the underlying data and the query, summarize() can be up to 3x faster than query().
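If the per-bin dictionaries are not needed, the statistics can be computed for all bins at once with NumPy instead of in a Python loop. The sketch assumes you already have per-bin arrays of value counts, sums, sums of squares, minima and maxima (bx-python's summarize() result holds arrays of this kind, but the argument names here are illustrative). summaries_from_arrays is a hypothetical helper, not the bx-python implementation, and its formulas (sample standard deviation, NaN for empty bins) are my assumptions.

```python
import numpy as np


def summaries_from_arrays(valid_count, sum_data, sum_squares, min_val, max_val):
    """Vectorised per-bin mean, std_dev, min and max from summary arrays.

    ASSUMPTIONS: inputs are equal-length per-bin arrays; bins with a count
    of 0 get NaN; std_dev is the sample standard deviation (0 for one value).
    """
    n = np.asarray(valid_count, dtype=float)
    s = np.asarray(sum_data, dtype=float)
    ss = np.asarray(sum_squares, dtype=float)
    has_data = n > 0
    # np.where evaluates both branches, so silence 0/0 warnings for empty bins.
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = np.where(has_data, s / n, np.nan)
        var = np.where(n > 1, (ss - s * s / n) / (n - 1), 0.0)
    # Clip tiny negative variances caused by floating-point round-off.
    std = np.where(has_data, np.sqrt(np.clip(var, 0.0, None)), np.nan)
    return {
        "mean": mean,
        "std_dev": std,
        "min": np.where(has_data, min_val, np.nan),
        "max": np.where(has_data, max_val, np.nan),
    }


if __name__ == "__main__":
    # Three made-up bins: values {2, 4}, no data, and {5}.
    stats = summaries_from_arrays([2, 0, 1], [6.0, 0.0, 5.0], [20.0, 0.0, 25.0],
                                  [2.0, 0.0, 5.0], [4.0, 0.0, 5.0])
    print(stats["mean"])  # [ 3. nan  5.]
```

Returning whole arrays keeps the per-bin work in C, which is the same idea as using summarize() directly.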
