Question: Get The Percentile From A Wig/Bed File
7.0 years ago by
Eric Ho10
Hong Kong
Eric Ho10 wrote:


I have a set of signal values from different regions in a BED/WIG file.

However, I cannot easily tell whether a given signal is significant relative to the rest of the file.
One solution would be to calculate the percentile of a given signal value.

Is there a tool that can calculate the percentile of a given signal value in the file?
If not, how can I assess the significance of a signal across the file?


bed wiggle statistics • 2.4k views
modified 7.0 years ago by Alex Reynolds29k • written 7.0 years ago by Eric Ho10
7.0 years ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

You could use a tool like BEDOPS bedmap with its --mean, --median, --max, --min, --stdev and other statistical operators to calculate a statistic over categories of regions.

Depending on the characteristics of your signal, you might ask how likely it is to get, for instance, a median score above or below a certain value when sampling random regions over a genome. That sampling gives you an expected median signal, which you then compare against the observed median signal over your regions of interest.

Essentially you would write a "sampler" program that generates a UCSC-formatted BED file that contains randomly-sampled regions from your sensibly-chosen background (you might avoid sampling from repeat regions, for instance). Let's say this file is called background_regions.bed.
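As a rough illustration of such a sampler, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function names `sample_regions` and `to_bed`, the toy chromosome sizes, and the choice of fixed-length regions drawn uniformly. It does not exclude repeats or any other regions; a real background would need that filtering added.

```python
import random


def sample_regions(chrom_sizes, n_regions, region_length, seed=None):
    """Return n_regions random (chrom, start, end) tuples in BED coordinates.

    chrom_sizes is a dict of chromosome name -> length (hypothetical input).
    Regions are fixed-length and drawn uniformly; nothing is excluded.
    """
    rng = random.Random(seed)
    # Only chromosomes long enough to hold one region are eligible.
    eligible = {c: s for c, s in chrom_sizes.items() if s >= region_length}
    if not eligible:
        raise ValueError("no chromosome is long enough for region_length")
    chroms = list(eligible)
    # Weight each chromosome by its number of valid start positions,
    # so that every possible region is equally likely.
    weights = [eligible[c] - region_length + 1 for c in chroms]
    regions = []
    for _ in range(n_regions):
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
        # BED starts are 0-based and ends are exclusive.
        start = rng.randint(0, eligible[chrom] - region_length)
        regions.append((chrom, start, start + region_length))
    return regions


def to_bed(regions):
    """Format regions as tab-delimited three-column BED text."""
    return "".join(f"{c}\t{s}\t{e}\n" for c, s, e in regions)


if __name__ == "__main__":
    toy_sizes = {"chr1": 10_000, "chr2": 5_000}  # made-up sizes, not a real genome
    print(to_bed(sample_regions(toy_sizes, 5, 200, seed=1)), end="")
```

Redirecting the output to a file such as unsorted_background_regions.bed would give an unsorted background set to feed into the sort-bed step.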

You also put your regions-of-interest into a second UCSC-formatted BED file, say regions_of_interest.bed.

Make sure these two files are sorted:

$ sort-bed unsorted_background_regions.bed > background_regions.bed

$ sort-bed unsorted_regions_of_interest.bed > regions_of_interest.bed

You can then use bedmap to calculate statistics over each of these two reference BED files. Use BEDOPS wig2bed and BEDOPS sort-bed to convert the Wiggle-formatted signal file into sorted BED data, piping this stream into the bedmap statement as the map data.

For example, to get expected and observed medians, pipe sorted signal into bedmap and use the --median operator over the two BED files:

$ wig2bed mySignal.wig \
    | sort-bed - \
    | bedmap --median background_regions.bed - \
    > expectedMedians.txt

$ wig2bed mySignal.wig \
    | sort-bed - \
    | bedmap --median regions_of_interest.bed - \
    > observedMedians.txt

(You could use other operators alone or in combination to calculate a statistic or score of your choice.)

From these expected and observed results you should be able to calculate a z-score and a p-value for that class of regions-of-interest. If the p-value meets some threshold, then you could argue that the signal over your regions-of-interest differs significantly from the background.
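One possible reading of that last step, sketched in Python: score a single observed value against the background medians. The function names `parse_bedmap_values` and `score_against_background` are hypothetical, and the sketch assumes that each line of the bedmap output holds one number, with NAN for regions that overlap no signal. The z-score p-value relies on a normal approximation, which may not suit every signal; the empirical p-value and the percentile make no such assumption.

```python
import math
import statistics


def parse_bedmap_values(lines):
    """Turn lines of one-value-per-line output into floats, skipping blanks and NAN."""
    values = []
    for line in lines:
        token = line.strip()
        if not token or token.upper() == "NAN":
            continue
        values.append(float(token))
    return values


def score_against_background(value, background):
    """Compare one observed value with a list of background values."""
    if len(background) < 2:
        raise ValueError("need at least two background values")
    mean = statistics.mean(background)
    sd = statistics.stdev(background)  # sample standard deviation
    if sd == 0:
        raise ValueError("background has zero variance")
    z = (value - mean) / sd
    # Two-sided p-value under a normal approximation (an assumption).
    p_normal = math.erfc(abs(z) / math.sqrt(2))
    # Percentile: share of background values at or below the observed value.
    percentile = 100.0 * sum(v <= value for v in background) / len(background)
    # Upper-tail empirical p-value with a +1 correction so it is never zero.
    n_ge = sum(v >= value for v in background)
    p_empirical = (n_ge + 1) / (len(background) + 1)
    return {"z": z, "p_normal": p_normal,
            "percentile": percentile, "p_empirical": p_empirical}


if __name__ == "__main__":
    # Toy lines standing in for the contents of expectedMedians.txt.
    toy_expected = parse_bedmap_values(["1.0\n", "2.0\n", "NAN\n", "3.0\n", "4.0\n", "5.0\n"])
    print(score_against_background(4.5, toy_expected))
```

With real data, the lines of expectedMedians.txt would supply the background, and each value in observedMedians.txt (or a summary of them, such as their median) could be scored in turn.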

modified 7.0 years ago • written 7.0 years ago by Alex Reynolds29k

I just love how transparent and clear the BEDOPS suite is :)

written 7.0 years ago by Sukhdeep Singh10.0k