Tool: Comb-P: Combining Spatially Correlated P-Values
11.6 years ago
brentp 24k

comb-p: https://github.com/brentp/combined-pvalues/

comb-p comes as a command-line application and a Python library. It is useful for genome-level data that has local correlations, e.g. methylation data, where adjacent measurements are correlated because of the regional nature of methylation.

The site: https://github.com/brentp/combined-pvalues/tree/master/examples has 4 example uses.

  1. CHARM tiled methylation array
  2. DAMID (similar to ChIP-Seq)
  3. BS-Seq
  4. 450k methylation array

For BS-Seq and 450k data, this differs from existing methods that find single sites (or single genes) with differential methylation, because it finds regions of difference. The regions are not static; they extend according to a specified cutoff.

The input is always a simple, sorted BED file with additional column(s) of p-values from the statistical test of your choosing. Each row is a probe or a CpG. The output is a list of differentially methylated regions, each assigned an aggregated, corrected p-value.
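As a rough illustration of the input described above, here is a minimal Python sketch that checks a tab-delimited BED-like file before it is handed to comb-p. The function name `check_sorted_bed` and the header names `chrom`, `start`, and `end` are assumptions made for this example, not part of comb-p; only the "P.Value" column name comes from the post.

```python
def check_sorted_bed(lines, pcol="P.Value"):
    """Check a headered, tab-delimited BED-like table (hypothetical helper).

    Assumes the first line is a header, columns 1 and 2 are chromosome and
    start, and `pcol` names the p-value column. Returns the number of rows.
    """
    header = lines[0].rstrip("\n").split("\t")
    p_idx = header.index(pcol)  # raises ValueError if the column is missing
    seen_chroms = set()
    prev_chrom, prev_start = None, -1
    n_rows = 0
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        chrom, start = fields[0], int(fields[1])
        pval = float(fields[p_idx])
        if not 0.0 <= pval <= 1.0:
            raise ValueError(f"p-value out of range: {pval}")
        if chrom != prev_chrom:
            # Each chromosome must appear as one contiguous block.
            if chrom in seen_chroms:
                raise ValueError(f"chromosome {chrom} is not contiguous")
            seen_chroms.add(chrom)
        elif start < prev_start:
            raise ValueError(f"{chrom}:{start} is out of order")
        prev_chrom, prev_start = chrom, start
        n_rows += 1
    return n_rows
```

For a real file, something like `check_sorted_bed(open("sorted.input.bed").readlines())` would run the same checks.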

P-values are corrected using the Stouffer-Liptak method of combining p-values with weights for the correlation.
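To give a feel for how correlation enters a Stouffer-type combination, here is a small Python sketch using only the standard library. It is not comb-p's implementation: it uses one common adjustment, dividing the summed z-scores by the square root of the sum of all entries of an assumed correlation matrix. The function name `stouffer_correlated` and the `corr` argument are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_correlated(pvals, corr):
    """Combine one-sided p-values with a correlation-adjusted Stouffer Z.

    `corr[i][j]` is the assumed correlation between z-scores i and j
    (1.0 on the diagonal). This is an illustrative sketch, not comb-p's code.
    """
    nd = NormalDist()
    # Convert each p-value to an upper-tail z-score.
    z = [-nd.inv_cdf(p) for p in pvals]
    # Variance of a sum of correlated standard normals is the sum of
    # all entries of their correlation matrix.
    var_sum = sum(sum(row) for row in corr)
    z_comb = sum(z) / sqrt(var_sum)
    # Upper-tail probability of the combined z-score.
    return nd.cdf(-z_comb)


independent = stouffer_correlated([0.05, 0.05], [[1.0, 0.0], [0.0, 1.0]])
correlated = stouffer_correlated([0.05, 0.05], [[1.0, 1.0], [1.0, 1.0]])
```

With independent tests the two p-values reinforce each other and the combined value is smaller than either input; with perfectly correlated tests the second p-value adds no information and the combined value stays at 0.05. This is why adjacent, correlated probes need the correlation weighting.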

An example use would look like:

comb-p pipeline \
    -c P.Value --dist 80 \
    --seed 1e-4 \
    -p results/output.prefix \
    sorted.input.bed

This will find regions from the p-values in the column with header "P.Value" (the column number can also be used) in sorted.input.bed and output regions to results/output.prefix.regions-p.bed, with other files also saved to files starting with results/output.prefix...

This will perform the Stouffer-Liptak correction on each probe by looking at its neighbors within 80 bases, weighted by their observed correlation. It will then find peaks by seeding on single probes with a corrected p-value less than 1e-4 and extending as long as it finds another probe with a p-value less than 1e-4 within 80 bases. Those are the putative regions. It will then assign a single, corrected p-value to each region and report it in the output file for further filtering.
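The seed-and-extend step described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not comb-p's code: the function name `find_regions` is hypothetical, probes are assumed to be sorted `(chrom, start, end, p)` tuples, and distance is measured from the end of the current region to the start of the next qualifying probe.

```python
def find_regions(probes, seed=1e-4, dist=80):
    """Group probes with p < seed into regions (illustrative sketch).

    A region starts at a probe below `seed` and keeps extending while the
    next probe below `seed` on the same chromosome starts within `dist`
    bases of the region's current end.
    Returns a list of (chrom, start, end, n_seed_probes) tuples.
    """
    regions = []
    current = None  # [chrom, start, end, n_seed_probes]
    for chrom, start, end, pval in probes:
        if pval >= seed:
            continue  # only probes below the seed threshold seed or extend
        if current and current[0] == chrom and start - current[2] <= dist:
            current[2] = max(current[2], end)
            current[3] += 1
        else:
            if current:
                regions.append(tuple(current))
            current = [chrom, start, end, 1]
    if current:
        regions.append(tuple(current))
    return regions


example = [
    ("chr1", 100, 101, 1e-5),
    ("chr1", 150, 151, 0.5),   # above the seed, but spanned by the region
    ("chr1", 170, 171, 1e-6),
    ("chr1", 400, 401, 1e-7),  # too far from the first region
    ("chr2", 410, 411, 1e-5),  # new chromosome starts a new region
]
regions = find_regions(example)
```

In this toy input the first and third probes merge into one region because they are 69 bases apart, while the probe at position 400 is too far away and forms its own single-probe region.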

The reference is available here: http://bioinformatics.oxfordjournals.org/content/early/2012/09/05/bioinformatics.bts545.short and an earlier, freely available revision is here: http://dl.dropbox.com/u/88537/cpv-submission.pdf


I have just started using this tool for analysis, and it seems to do the job - combines geographically-related sites into a single score. I've had great help and feedback from Brent, as I am having to use a Windows machine due to restrictions at my institute. Thanks for your help Brent, and the useful tool.
