10.6 years ago by

University Park, USA

Hi,

this is a problem that I am actively investigating. I have come up with a potential (homegrown) approach, but it has not been fully vetted, so keep that in mind. It builds on the following assumptions:

- We assume that the position of each peak is defined independently of the rest
- Within one peak the distribution of the reads is governed by a reasonably normal distribution

Thus, if we can detect each peak, find the corresponding peak in the other dataset, and extract only the reads that belong to both of these peaks, we can run a statistical test to detect differences between the two read distributions.
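To make the idea concrete, here is a minimal sketch of the comparison step only. It is not the tool mentioned in this post. It assumes the peak window is already known and matched between the two datasets, that reads are reduced to single positions, and it uses a permutation test as a stand-in because the post does not fix a particular statistical test. All function names and the simulated data are hypothetical.

```python
import random
import statistics


def extract_reads(positions, start, end):
    """Keep only read positions inside the peak window [start, end)."""
    return [p for p in positions if start <= p < end]


def mean_shift(a, b):
    """Difference in mean position (b minus a), i.e. the peak shift."""
    return statistics.fmean(b) - statistics.fmean(a)


def spread_change(a, b):
    """Difference in standard deviation (b minus a)."""
    return statistics.stdev(b) - statistics.stdev(a)


def permutation_pvalue(a, b, stat, n_perm=1000, seed=0):
    """Two-sided permutation p-value for stat(a, b) under label shuffling."""
    rng = random.Random(seed)
    observed = abs(stat(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(stat(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def compare_peak(reads_a, reads_b, start, end):
    """Compare the read distributions of one matched peak in two datasets."""
    a = extract_reads(reads_a, start, end)
    b = extract_reads(reads_b, start, end)
    # Center each sample before testing the spread, so that a shift in the
    # mean does not leak into the variance comparison.
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    a_centered = [x - mean_a for x in a]
    b_centered = [x - mean_b for x in b]
    return {
        "shift": mean_shift(a, b),
        "p_shift": permutation_pvalue(a, b, mean_shift),
        "p_spread": permutation_pvalue(a_centered, b_centered, spread_change),
    }


# Simulated example (made-up data): the same peak, shifted by about 20 bp.
rng = random.Random(1)
exp_a = [rng.gauss(100, 15) for _ in range(200)]
exp_b = [rng.gauss(120, 15) for _ in range(200)]
result = compare_peak(exp_a, exp_b, 0, 250)
print(result)
```

If the normality assumption holds reasonably well, a Welch t-test for the means and an F-type or Levene test for the variances would be natural alternatives; the permutation test is used here only because it needs no extra libraries and makes fewer distributional assumptions.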

The results will characterize each peak individually rather than the entire shape. Differences may manifest themselves as a change in the mean or in the variance of a peak: the first indicates a shift of the peak, the second a change in occupancy. For example, below are the results from a script that I wrote that compares peaks around the TSS for two experiments:

The upper panel shows the original peaks and the lower panel shows the underlying read distributions. The little boxes below show the shift and the p-value for each peak.
The interpretation is that the last 2 peaks show a statistically significant shift in the mean value of 10 bp and +20 bp respectively.

I do have a tool that does this pretty much automatically, but since I am not yet convinced of the correctness of the approach as a whole, it is not yet publicly available.

Not so long ago I was advised that this problem can also be thought of as a time series analysis, but I have not yet looked into this possibility. That is something to investigate as well.