Identifying The 'Summit' Coordinate Within Many Coordinates In A Wig File?
13.0 years ago
Ian 6.0k

I have a WIG file containing read coverage data from ChIP-seq peak calling analysis.

For a given set of coordinates, is there a simple, fast way of determining the 'summit' coordinate, i.e. the coordinate with the highest read pileup?

My own method would be slow, as I would run through the (big) WIG file, extract the region of interest and find the coordinate with the highest score. Surely there must be a quicker way of doing this? If not, fair enough.
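For reference, the slow approach described above might look roughly like the following. This is only a minimal sketch: `wig_summit` is a hypothetical helper name, it assumes the file uses fixedStep or variableStep declarations with 1-based positions, and it ignores any `span` setting, so only the first base of each data line is considered.

```python
def wig_summit(lines, chrom, start, end):
    """Scan WIG lines and return (position, value) of the highest score
    in chrom:start-end (1-based, inclusive), or None if nothing is found."""
    best = None
    cur_chrom = None
    mode = None
    pos = step = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and track/browser/comment lines
        if not line or line.startswith(("track", "browser", "#")):
            continue
        # A declaration line starts a new block of data
        if line.startswith(("fixedStep", "variableStep")):
            fields = line.split()
            mode = fields[0]
            params = dict(f.split("=", 1) for f in fields[1:])
            cur_chrom = params["chrom"]
            if mode == "fixedStep":
                pos = int(params["start"])
                step = int(params.get("step", 1))
            continue
        if mode == "fixedStep":
            # fixedStep data lines hold only a value; position is implied
            here, value = pos, float(line)
            pos += step
        else:
            # variableStep data lines hold "position value"
            p, v = line.split()
            here, value = int(p), float(v)
        if cur_chrom == chrom and start <= here <= end:
            if best is None or value > best[1]:
                best = (here, value)
    return best
```

It could be called as `wig_summit(open("your_file.wig"), "chr3", 1, 1000)`, but every query reads the whole file, which is exactly the cost I would like to avoid.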

The peak caller does report the summit from its own data, but I want to check out different regions.

Thanks.

wiggle • 5.5k views
13.0 years ago

Convert your Wig file to BigWig with the UCSC executable wigToBigWig. This provides an indexed Wig file you can query to return regions of interest, without having to loop over the entire file each time.

bx-python provides a Python BigWig interface that will help with finding your summit coordinates:

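from bx.bbi.bigwig_file import BigWigFile

# BigWig is a binary format, so open the file in binary mode
f = open("your_file.bw", "rb")
bw = BigWigFile(file=f)
# vals is a list of (start, end, value) tuples for the region
vals = bw.get("chr3", 1, 1000)
# pick the interval with the highest value
max_start, max_end, max_val = max(vals, key=lambda x: x[-1])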

You'll want your logic to be smarter to handle multiple summits of the same height, but hopefully that helps you get started.
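As a rough idea of what "smarter" could mean, here is a minimal sketch that collects every interval tied for the highest value and merges ones that touch or overlap. `find_summits` is a hypothetical helper, not part of bx-python, and it assumes the intervals are half-open (start, end, value) tuples like the ones in `vals`.

```python
def find_summits(intervals):
    """Given (start, end, value) tuples, return (max_value, summits), where
    summits is a list of merged (start, end) ranges that reach max_value."""
    if not intervals:
        return None, []
    top = max(value for _, _, value in intervals)
    summits = []
    for start, end, value in sorted(intervals):
        if value != top:
            continue
        # Merge with the previous summit if the two ranges touch or overlap
        if summits and start <= summits[-1][1]:
            prev_start, prev_end = summits[-1]
            summits[-1] = (prev_start, max(prev_end, end))
        else:
            summits.append((start, end))
    return top, summits
```

With this, `max_val, summits = find_summits(vals)` gives one range per plateau, and you can decide afterwards whether to report all of them, the widest, or the midpoint of each.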


Looks interesting! Shame it's in Python, though, as I am a Perl'ist :)


You probably want Lincoln Stein's Bio-BigFile Perl module then.

13.0 years ago

Convert your WIG file to bigWig using wigToBigWig, then use bigWigSummary to extract the information for a given region.

   bigWigSummary file.bigWig chrom start end dataPoints

Get summary data from bigWig for indicated region, broken into
dataPoints equal parts.  (Use dataPoints=1 for simple summary.)

options:
   -type=X where X is one of:
         mean - average value in region (default)
         min - minimum value in region
         max - maximum value in region
         std - standard deviation in region
         coverage - % of region that is covered

Thanks. I was aware of this, but it only returns the value, not the coordinate.
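One possible workaround, offered as an untested idea rather than a documented feature: ask for as many dataPoints as there are bases in the region with -type=max, then locate the largest value in the output yourself. The sketch below only parses the text that bigWigSummary prints. `summit_from_summary` is a hypothetical helper, and it assumes the output is whitespace-separated numbers with `n/a` for bins without data, and that the bins split the region evenly.

```python
def summit_from_summary(output, start, end):
    """Parse bigWigSummary output for the region start-end and return
    (bin_start, bin_end, value) for the highest bin, or None if no data."""
    fields = output.split()
    if not fields:
        return None
    # Assumed: the region is divided into len(fields) equal-width bins
    bin_size = (end - start) / len(fields)
    best = None
    for i, field in enumerate(fields):
        if field == "n/a":  # assumed marker for bins with no data
            continue
        value = float(field)
        if best is None or value > best[1]:
            best = (i, value)
    if best is None:
        return None
    i, value = best
    return start + int(i * bin_size), start + int((i + 1) * bin_size), value
```

The `output` string would be whatever a command like `bigWigSummary -type=max file.bigWig chrom start end dataPoints` prints, with dataPoints set to end - start so that each bin is a single base. The returned coordinates use the same convention as the start and end passed in.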
