Question: ChIP-seq analysis - More peaks in IP samples without activating the TF
3.0 years ago, avital1005 wrote:

Hi, we are looking at the binding sites of a TF that is not in the nucleus prior to its activation. One hour after hormone treatment, the TF is in the nucleus and can bind DNA. We did ChIP-seq both before and after the hormone treatment, and our input is from the hormone-treated cells. When I use MACS2 to call peaks, I get far more peaks in the untreated cells (40,000, compared to only 500 in the treated cells), which doesn't make sense because the TF is not supposed to bind DNA in the untreated cells. In addition, we know the factor's motif, and only 4% of the peaks in the untreated cells contain it, while 70% of the 500 peaks from the treated cells do. My guess is that the untreated peaks are phantom peaks or noise, but why does MACS2 still call them, and how can I remove such peaks, or "clean" my peak set, so that I can trust the peaks I get from the treated cells?

If you have any other suggestions as to why I'm getting these odd results from the untreated cells, I would also like to hear them. I'll just mention that I have 15M reads in the treated cells and in the input, and 8M reads in the untreated cells.


sequencing chip-seq next-gen • 1.1k views

The only real way to get to the bottom of this is to look at the area around a few of the peaks in your untreated samples in IGV or something similar. My guess is that you had more PCR cycles in your untreated samples so you have "blocky" alignments due to low sequencing complexity. That would then correspond to excessive peaks in the untreated sample.
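
Alongside looking in IGV, the degree of duplication can be put into numbers. The following is a minimal sketch, assuming the aligned reads have already been reduced to (chromosome, 5' start, strand) tuples; the function name `library_complexity` and the toy data are made up for illustration. It reports the non-redundant fraction (distinct positions divided by total reads) and the largest pileup of identical reads.

```python
from collections import Counter


def library_complexity(read_positions):
    """Summarise duplication from an iterable of (chrom, start, strand) tuples."""
    counts = Counter(read_positions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads given")
    distinct = len(counts)
    return {
        "total_reads": total,
        "distinct_positions": distinct,
        # Non-redundant fraction: share of reads left after collapsing duplicates.
        "nrf": distinct / total,
        "duplicate_fraction": 1 - distinct / total,
        # Height of the tallest stack of identical reads ("blocky" alignments).
        "max_pileup": max(counts.values()),
    }


# Toy data (hypothetical): one heavily over-amplified position plus a few others.
toy_reads = (
    [("chr1", 100, "+")] * 6
    + [("chr1", 250, "-")] * 2
    + [("chr2", 40, "+"), ("chr2", 900, "-")]
)
stats = library_complexity(toy_reads)
print(stats)
```

With real data the tuples would come from the aligned reads of each sample; that parsing step is left out here. A low non-redundant fraction in one sample but not the other would be consistent with the over-amplification explanation.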

written 3.0 years ago by Devon Ryan

Thank you, you are right. When I look in IGV, I can see many reads piled up in the same place rather than being distributed smoothly, and many of them are identical reads (only 40% of reads are not duplicates). I have a question though: doesn't MACS2 remove duplicate reads before peak calling? If so, why are these overamplified regions, made up of the same reads, still called as peaks?

written 3.0 years ago by avital1005

Even with duplicates removed, if you have large areas with little/no coverage then any area with even a bit of coverage will look like a peak.
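
A rough way to see this effect is a simplified Poisson test. The sketch below is not MACS2's actual model (MACS2 estimates a local background rate from the control); it assumes a uniform background, a 200 bp window, and a genome size of 2.7e9 bp (the human value, which is an assumption since the thread does not name the organism). The read counts are also illustrative: 3.2M stands for 40% of 8M reads surviving duplicate removal.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly from the upper tail."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # First tail term, computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    # Add successive terms until they no longer change the sum.
    while term > 0.0 and term > total * 1e-17:
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)


def expected_reads(n_reads, window_bp, genome_bp):
    """Expected reads per window if reads were spread uniformly over the genome."""
    return n_reads * window_bp / genome_bp


GENOME_BP = 2.7e9   # assumption: human-sized genome
WINDOW_BP = 200     # assumption: illustrative window size
PILEUP = 6          # a small stack of reads in one window

lam_sparse = expected_reads(3.2e6, WINDOW_BP, GENOME_BP)  # sparse library
lam_deep = expected_reads(15e6, WINDOW_BP, GENOME_BP)     # deeper library

p_sparse = poisson_sf(PILEUP, lam_sparse)
p_deep = poisson_sf(PILEUP, lam_deep)
print(f"lambda={lam_sparse:.3f}: p={p_sparse:.2e}")
print(f"lambda={lam_deep:.3f}: p={p_deep:.2e}")
```

The same six-read pileup gets a much smaller p-value against the sparse background than against the deeper one, which is the sense in which a bit of coverage "looks like a peak" when most of the genome has none.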

written 3.0 years ago by Devon Ryan

I know that MACS1, if the IP and input have different numbers of reads, will downsample the larger sample to the size of the smaller one.

If you only have 8M reads in the untreated cells, the input (15 million) will be downsampled to 8 million as well. This might affect your peak calling results. And if you expect not to see binding in the untreated sample, the low number of reads may be inherent. Just my 2c.
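
For MACS2, which the question actually uses, the relevant `callpeak` options are worth checking against the documentation of the installed version. As far as that documentation describes it, the larger dataset is scaled linearly toward the smaller one by default, `--to-large` reverses the direction, `--down-sample` switches to random sampling, and `--keep-dup` (default 1) keeps one read per position. The sketch below only assembles such a command and computes the implied scaling ratio; the file names and helper names are hypothetical.

```python
import shlex


def linear_scale_factor(treatment_reads, control_reads):
    """Ratio applied to the larger library when scaling it to the smaller one."""
    small, large = sorted((treatment_reads, control_reads))
    return small / large


def macs2_callpeak_args(treatment, control, name, genome="hs",
                        keep_dup=1, qvalue=0.05,
                        to_large=False, down_sample=False):
    """Build (but do not run) an argument list for `macs2 callpeak`."""
    args = [
        "macs2", "callpeak",
        "-t", treatment,          # IP sample
        "-c", control,            # input/control sample
        "-f", "BAM",
        "-g", genome,             # assumption: human; adjust for the organism
        "-n", name,
        "--keep-dup", str(keep_dup),
        "-q", str(qvalue),
    ]
    if to_large:
        args.append("--to-large")
    if down_sample:
        args.append("--down-sample")
    return args


# Read counts from the thread: 8M untreated IP reads vs 15M input reads.
factor = linear_scale_factor(8_000_000, 15_000_000)
args = macs2_callpeak_args("untreated_ip.bam", "input.bam", "untreated")
print(f"scale factor for the larger library: {factor:.3f}")
print(shlex.join(args))
```

Printing the command rather than running it makes it easy to compare the default behaviour with `to_large=True` or `down_sample=True` before committing to one setting.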

written 3.0 years ago by Ming Tang