Question: Chip-Seq, Get Highly Enriched Regions For A Certain Mark Within One Sample
6.7 years ago, Sirus wrote:

Hi guys, when reading through papers, I see many people mentioning features like H3K4me1+ (or high), H3K4me3-, etc. Generally they don't say how they found these regions automatically.

To make the picture clear, my situation is as follows:

  • I have some ChIP-seq data (one sample per mark for a certain cell line)
  • Let's say that for the H3K4me1 sample I have already done the peak calling, and I have the list of peaks and the signal intensity in each peak
  • I want to find which peaks are highly enriched (H3K4me1+) and which peaks are lowly enriched (H3K4me1-)
  • A threshold needs to be defined to do that

I found this paper (Combinatorial patterns of histone acetylations and methylations in the human genome) that describes a method, but I don't know if I understood it correctly. I hope to get your kind clarification through this post :) The authors say:

The modification on a promoter under consideration was deemed significant when the tag count was higher than a threshold, which was determined by a P value taken from a background model of Poisson distribution parameterized by the genome-wide tag density

Does that mean simply computing:

lambda = mean(tags)

Then, if a region has k tags, computing:

pval = P(X >= k), where X ~ Poisson(lambda)

and considering the regions with significant p-values as enriched?

Or is that not what they mean?
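For what it's worth, the test described in the question can be sketched in plain Python as below. This is only a sketch under assumptions: `poisson_sf` is a hypothetical helper name, the tag counts are made-up numbers, and lambda is taken as the mean over the listed regions as in the question, whereas the quoted paper parameterizes the background by the genome-wide tag density.

```python
import math

def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), summing the upper tail directly."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    total = 0.0
    i = k
    while True:
        # P(X = i), evaluated in log space to avoid overflow in lam**i and i!.
        term = math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        total += term
        # Past the mode the terms only shrink, so stop once they are negligible.
        if i > lam and term <= 1e-16 * total:
            break
        i += 1
    return min(1.0, total)

# Hypothetical tag counts, one per region (not real data).
tags = [3, 5, 4, 40, 6, 2]
lam = sum(tags) / len(tags)          # lambda = mean(tags), as in the question
pvals = [poisson_sf(k, lam) for k in tags]

for k, p in zip(tags, pvals):
    print(f"k={k:3d}  p={p:.3g}")
```

Regions with a small p-value would be the candidates for the "+" label. Note that when lambda is estimated from the peaks themselves, strongly enriched regions pull the mean up, which makes the test more conservative than a genome-wide background would.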

Edit: An additional question in the same context. For the tag enrichment plot, is it preferable to calculate it from the final BED file or from a BigWig file? With regions called for histone marks, you get large regions with a single value each, for example:

      seqnames             ranges strand |     score
         <Rle>          <IRanges>  <Rle> | <numeric>
  [1]    chr18 [8118742, 8128768]      * |       7.1

So when I take the mean of the tag counts, it seems overestimated.

Thanks in advance.


Shouldn't we take the length of the region into account? I guess the p-value should be calculated as follows (with lambda taken as the tag density per base):

Pval = Poisson(X >= k; lambda * region_length)
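The length scaling suggested here can also be turned into a per-region count threshold, which is closer to how the quoted paper words it ("tag count higher than a threshold determined by a P value"). A minimal sketch, assuming a hypothetical per-base tag density and made-up region lengths; `poisson_threshold` is an illustrative name, not a function from any package:

```python
import math

def poisson_threshold(expected, alpha):
    """Smallest k such that P(X >= k) <= alpha for X ~ Poisson(expected)."""
    if not 0 < expected < 700:
        # exp(-expected) underflows for very large means; this sketch does not handle that.
        raise ValueError("expected must be in (0, 700)")
    if not 1e-12 <= alpha < 1:
        raise ValueError("alpha must be in [1e-12, 1)")
    pmf = math.exp(-expected)   # P(X = 0)
    cdf = 0.0                   # P(X <= k - 1), starting at k = 0
    k = 0
    while 1.0 - cdf > alpha:    # 1 - cdf is P(X >= k)
        cdf += pmf
        k += 1
        pmf *= expected / k     # advance to P(X = k)
    return k

density = 0.001                   # assumed genome-wide tags per base (hypothetical)
region_lengths = [1000, 5000]     # made-up region lengths in bp
for length in region_lengths:
    expected = density * length   # lambda * region_length
    print(length, poisson_threshold(expected, alpha=0.05))
```

With this formulation a longer region needs more tags to be called enriched, because its expected background count grows with its length.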

written 4.9 years ago by chjiao345640


A more elaborate model is implemented in jMOSAiCS.

modified 11 months ago by _r_am31k • written 4.9 years ago by Sirus790
6.7 years ago, Istvan Albert (University Park, USA) wrote:

There are many different ways to compute significance thresholds and, as is almost always the case, the methods all agree where the effects are pronounced and disagree where the effects are weak. In the latter cases the error rates are also much higher.

Here is a more recent review/summary:

Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data, PLoS Comp. Biology 2013


Hi Albert, thanks for the paper :). However, they talk about differential binding and comparing peaks between different samples. In my case I have just one sample in one condition, and I want to find which peaks within my sample have significantly high enrichment, so that I can classify them into highly and lowly enriched regions.

written 6.7 years ago by Sirus790

Well, you have to have some type of background estimate to detect minimally significant enrichment. But after that, among the peaks that do pass the threshold, the division into high and low enrichment is a bit arbitrary.

An easier approach would be to do it the other way around: identify peaks by a different attribute and see if their enrichment is different from that of another group. If you first select by enrichment, then the dimensionality of the problem is much larger, as there are more causes for the same effect.

written 6.7 years ago by Istvan Albert ♦♦ 86k

In my case I want the threshold to be dynamic for each mark, so that I can annotate which regions are at the same time H3K4me1+, H3K4me3-, etc. So I thought I could do the classification for each mark separately, then select any combination and just take the intersection. For the moment I am using the Poisson approach :), with p-value correction.
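The per-mark classification followed by an intersection could look roughly like the sketch below. The comment does not say which p-value correction was used, so Benjamini-Hochberg is an assumption here, as are the 0.05 cutoff, the made-up p-values, and the simplification that every mark is scored over the same set of regions.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

ALPHA = 0.05  # assumed significance cutoff

# Hypothetical raw Poisson p-values for the same three regions under two marks.
raw = {
    "H3K4me1": [0.001, 0.20, 0.004],
    "H3K4me3": [0.30, 0.002, 0.001],
}

# Indices of the regions called enriched ("+") for each mark.
enriched = {
    mark: {i for i, q in enumerate(benjamini_hochberg(p)) if q <= ALPHA}
    for mark, p in raw.items()
}

# Regions that are H3K4me1+ and H3K4me3-: in the first set but not the second.
me1_plus_me3_minus = enriched["H3K4me1"] - enriched["H3K4me3"]
print(sorted(me1_plus_me3_minus))
```

Keeping one set of "+" regions per mark means any combination (for example H3K4me1+ and H3K4me3+) is just a set intersection or difference on those sets.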

written 6.7 years ago by Sirus790