Question: Binning -log10 P signal tracks
Asked 4.3 years ago by Saulius Lukauskas (London, UK)

I am looking at the recently published signal tracks from the Roadmap Epigenomics project, namely section d of http://egg2.wustl.edu/roadmap/web_portal/processed_data.html

They suggest using their -log10 p-value track for analysis. The way the track is generated is explained here as well as in their Nature paper. The MACS2 function bdgcmp seems to have been used to do the heavy lifting (see line 217 of the script). Semantically, as described in the paper, it is supposed to give you a p-value, under a Poisson distribution, for the signal/noise ratio being higher than expected by chance. I am interested in this p-value.
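As a rough illustration of what such a value means, here is a minimal sketch of a -log10 Poisson upper-tail p-value. It is not taken from the consortium script or from MACS2: the function name `neg_log10_poisson_pvalue` is hypothetical, the counts are assumed to be integers, and the tail convention P(X >= observed) is an assumption that may differ from what bdgcmp actually uses.

```python
import math

def neg_log10_poisson_pvalue(observed, expected):
    """-log10 of P(X >= observed) for X ~ Poisson(expected).

    Assumptions (not from MACS2): integer `observed`, `expected` > 0,
    and an inclusive upper tail.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed <= 0:
        return 0.0  # P(X >= 0) = 1, so -log10(p) = 0

    # First term of the tail, P(X = observed), computed in log space.
    term = math.exp(-expected + observed * math.log(expected)
                    - math.lgamma(observed + 1))
    total = 0.0
    i = observed
    # Sum the tail directly; keep going past the mode and until
    # the remaining terms are negligible.
    while i <= expected or term > total * 1e-17:
        total += term
        i += 1
        term *= expected / i
    return -math.log10(total)

# Example: 5 reads observed where 1 is expected from the control.
score = neg_log10_poisson_pvalue(5, 1.0)
```

Larger scores correspond to smaller p-values, i.e. stronger enrichment of signal over the expected background.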

However, I want to look at non-overlapping windows in the genome (i.e. bins) of some fixed size, say 100 bp.

The output produced by MACS (and thus by the consortium) has the form:

chr1    0    9853    0.01005
chr1    9853    9927    0.05026
chr1    9927    9971    0.13816

Here the first three columns describe the genomic interval and the last column is the -log10 p-value.

Since I am interested in fixed-length bins, I wonder how one would define the p-value for a fixed-length bin using this data, keeping the underlying Poisson model in mind. For instance, what is the p-value associated with the bin chr1:9800-9900?

A quick and dirty way is, of course, to use the Ernst and Kellis approach and "take a base-level average of signal overlapping each [25bp] bin". I wonder whether this is the best approach in the statistical sense, or whether there is a better one.

I don't think the choice would make much difference in the downstream analysis, but I am still wondering. More out of curiosity than practicality, I guess.
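To make the base-level averaging mentioned above concrete, here is a minimal sketch applied to the three intervals shown earlier and the bin chr1:9800-9900. The function name `bin_average` and the choice to count bases not covered by any interval as 0 are assumptions for illustration, not something taken from the Ernst and Kellis code.

```python
def bin_average(intervals, chrom, start, end):
    """Base-level (length-weighted) mean of the signal over [start, end).

    `intervals` is a list of (chrom, start, end, value) tuples in
    bedGraph-style half-open coordinates. Bases not covered by any
    interval contribute 0 (an assumption of this sketch).
    """
    total = 0.0
    for c, s, e, value in intervals:
        if c != chrom:
            continue
        overlap = min(e, end) - max(s, start)
        if overlap > 0:
            total += overlap * value  # each overlapping base counts once
    return total / (end - start)

# The three intervals from the example output above.
track = [
    ("chr1", 0, 9853, 0.01005),
    ("chr1", 9853, 9927, 0.05026),
    ("chr1", 9927, 9971, 0.13816),
]

# 53 bp at 0.01005 plus 47 bp at 0.05026, divided by 100 bp.
avg = bin_average(track, "chr1", 9800, 9900)
print(round(avg, 5))  # 0.02895
```

Note that this averages the -log10 values themselves, so the result corresponds to the geometric mean of the per-base p-values rather than to a p-value recomputed for the whole bin under the Poisson model.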
