Calculate P-Values For Genomic Regions Using Sliding Window Method
9.9 years ago
Sandeep ▴ 260

Hi All,

I am working on a few genomic regions of interest and have the quantile-normalized M values of all the probes lying in each region. The data were obtained from an Agilent 244K CpG island array. The average distance between probes is 100 bases.

I am trying to extract statistically significant regions in a way that takes into account the M values of neighboring probes. I have pasted a sample dataset below:

ProbeName    sample1    sample2    sample3    sample4    sample5    sample6    sample7    sample8    sample9    sample10    chr    start    end
probe1    -0.532    -0.923    0.402    0.503    -0.322    0.315    0.250    -0.498    -0.178    -0.667    chr1    884379    884423
probe2    0.808    -0.550    -0.315    -1.159    -0.659    -0.255    -0.100    -1.198    -0.991    -0.686    chr1    886633    886677
probe3    0.593    0.783    0.741    0.113    0.428    0.540    0.689    1.119    0.184    0.268    chr1    886707    886751
probe4    1.378    0.695    0.312    1.710    1.284    -0.619    1.331    1.121    1.502    1.517    chr1    887101    887145
probe5    -0.089    0.559    0.636    0.165    1.225    0.416    0.426    -0.453    1.260    0.205    chr1    887255    887299
probe6    0.786    0.620    -0.267    0.214    -0.320    -0.419    0.290    -0.375    -0.419    -0.390    chr1    887342    887386
probe7    -0.533    -0.085    -0.118    -0.042    1.008    -0.171    -0.015    -0.567    -0.497    0.093    chr1    887488    887532
probe8    0.551    1.018    1.793    -0.094    0.407    1.319    1.840    0.429    2.430    0.585    chr1    887598    887642
probe9    0.064    0.772    -0.348    -0.602    0.544    -0.841    -0.082    -1.362    -1.147    -0.627    chr1    887830    887874
probe10    -0.334    0.258    0.128    0.674    0.848    0.142    0.402    0.517    0.522    0.629    chr1    888033    888077


Is there a pre-existing tool or script that uses a sliding window approach to calculate the significance of individual probes, as well as of whole regions, for custom regions like these?

Thank you

array • 3.1k views

A test for significance? What are the null and alternative hypotheses?


The probes on the Agilent 244K CpG island array can be binned by CpG island. The data come from a MeDIP experiment, in which fragment sizes range from 100 to 800 bp, so a fragment covers roughly 4 to 5 probes. Hence, I was wondering whether, given a fixed window size, it would be possible to find statistically significant enriched regions, i.e. regions whose M values are positive for a given group or set of arrays. The null hypothesis I had in mind was that the extent of methylation is equal to the M values of the particular probe in question. However, I was not sure which statistic to apply or how.

It looks like the les package, as suggested by ff.cc.cc, can do what is needed.
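
For illustration only, here is a minimal Python sketch of one way the windowed question could be posed. It is not what les or any other tool mentioned in this thread does. It assumes a window of 3 consecutive probes and a simple null ("per-sample window means are symmetric around zero"), and tests it with an exact one-sided sign-flip test. The function name and the window size are hypothetical choices; the M values are probes 2 to 6 from the sample table in the question.

```python
from itertools import product

# M values for probe2..probe6 (rows) across sample1..sample10 (columns),
# copied from the sample table in the question.
M_VALUES = [
    [0.808, -0.550, -0.315, -1.159, -0.659, -0.255, -0.100, -1.198, -0.991, -0.686],
    [0.593, 0.783, 0.741, 0.113, 0.428, 0.540, 0.689, 1.119, 0.184, 0.268],
    [1.378, 0.695, 0.312, 1.710, 1.284, -0.619, 1.331, 1.121, 1.502, 1.517],
    [-0.089, 0.559, 0.636, 0.165, 1.225, 0.416, 0.426, -0.453, 1.260, 0.205],
    [0.786, 0.620, -0.267, 0.214, -0.320, -0.419, 0.290, -0.375, -0.419, -0.390],
]


def window_sign_flip_pvalue(rows, start, size):
    """Exact one-sided sign-flip p-value for 'window mean M value > 0'.

    rows  : list of probes, each a list of M values (one per sample)
    start : index of the first probe in the window
    size  : number of consecutive probes in the window (an assumed choice)
    """
    window = rows[start:start + size]
    n_samples = len(window[0])
    # Average the probes in the window separately for each sample.
    per_sample = [sum(probe[j] for probe in window) / len(window)
                  for j in range(n_samples)]
    observed = sum(per_sample)
    hits = 0
    total = 0
    # Enumerate all 2**n_samples sign assignments (1024 for 10 samples).
    for signs in product((1, -1), repeat=n_samples):
        total += 1
        stat = sum(s * v for s, v in zip(signs, per_sample))
        if stat >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / total


WINDOW_SIZE = 3  # assumption: roughly one MeDIP fragment's worth of probes
p_values = [window_sign_flip_pvalue(M_VALUES, i, WINDOW_SIZE)
            for i in range(len(M_VALUES) - WINDOW_SIZE + 1)]
for i, p in enumerate(p_values):
    print(f"window starting at probe{i + 2}: p = {p:.4f}")
```

With 10 samples the smallest attainable p-value is 1/1024, and the p-values of overlapping windows are not independent, so this sketch says nothing yet about multiple-testing correction.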


Sliding windows are not independent; this causes a serious problem.
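
As a rough illustration of the size of that dependence, the Python sketch below simulates independent noise (an assumption; real probe data will behave differently) and measures the correlation between the means of adjacent windows. For a window of k values moved one step at a time, adjacent windows share k - 1 values, so under independent, equal-variance noise their means have correlation (k - 1) / k. The helper names and the window size of 5 are hypothetical.

```python
import random


def window_means(values, size):
    """Mean of every run of `size` consecutive values (step of 1)."""
    return [sum(values[i:i + size]) / size
            for i in range(len(values) - size + 1)]


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
noise = [rng.gauss(0, 1) for _ in range(20000)]  # independent "probe" values

size = 5  # assumed window size
means = window_means(noise, size)

# Correlation between each window mean and the next (shifted by one probe).
adjacent_corr = pearson(means[:-1], means[1:])
expected_corr = (size - 1) / size  # theoretical value for independent noise

print(f"simulated: {adjacent_corr:.3f}, expected: {expected_corr:.3f}")
```

Even with no real signal, neighboring windows give strongly correlated statistics, so their p-values cannot be treated as independent tests.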

9.9 years ago

There are a number of tools available for finding differentially methylated regions from MeDIP-chip data, which appears to be what you want to do. These include Ringo, Batman, CHARM, and QDMR. There are probably others, and I don't recall whether all of these are appropriate for your array, but they are enough to get you started.


Thank you for the suggestion. I had looked into Ringo and Batman. QDMR looks interesting and I will explore it further. As of now, the R package les seems to do what is necessary in this case.

9.9 years ago
ff.cc.cc ★ 1.3k

Hi,

I'm not a fan of sliding window techniques, but here is a good example of their use (in an analysis of CpG probes): "Coordinated changes in AHRR methylation in lymphoblasts and pulmonary macrophages from smokers".

If you like the R framework, les is a package that can do the work you need (it was created with tiling arrays in mind).


Thank you very much. The les package looks promising. I will try it now.

9.9 years ago
Sandeep ▴ 260

Thank you all for the wonderful suggestions. In the end I found BioTile, a Perl-based tool for identifying differentially enriched regions in tiling microarray data, to be the most useful for our data. It was simple to use and fast as well.