Question

Comparing numbers of ChIP-seq peaks between different sample types

9

8.0 years ago

nash.claire ▴ 490

Hi everyone,

I have a bit of an odd question which admittedly has come more from my PI than me.

We have done some ChIP-seq analysis for a TF on some tissue and cell samples. I have analysed all of these samples with the exact same pipeline, and for each sample I have run MACS2 to identify peaks against that sample's input control. What I want to know is: if more peaks are called in one sample than in another, is it fair to say that the sample with the higher peak number has more TF binding to DNA? For example, I have sample A with 3900 peaks, sample B with 11000 peaks and sample C with 19000 peaks. In order to say that sample C has more TF binding to the DNA than the other samples, is there any sort of normalisation I should do? Are these numbers comparable to each other as is? I've read a lot about normalising peak heights between samples (using library size etc.), but this is a different question. Can the numbers of peaks themselves be compared between samples?

ChIP-Seq • 7.2k views

score 7 · Answer 1 · 2016-04-07

This is an interesting question that I don't have the complete answer to, and I would be interested in hearing from someone with more experience than I have. My initial answer would be no, but only because peak calling is extremely biased and depends heavily on sequencing depth, the input sample, the quality of technical replication, and other things that we can't always account for.

In order to get you started on the right track though, here are a couple of considerations.

Are your samples sequenced to the same depth across all your tissue and cell replicates?
Is the number of reads similar between all tissue and cell replicates?
Is the number of peaks always higher in sample C than in sample A in all your replicates?
Have you tried a peak caller other than MACS2 and compared the number of peaks it finds?

I think this type of question relies heavily on more knowledge of the biology than I have, both of DNA binding in general and of your TF of interest in particular (some TFs may be more enriched in some cell types for a variety of reasons), and on having a good number of biological replicates to account for simple chance.

score 7 · Answer 2 · 2016-04-08

I think that is a very relevant (and common) question to ask - not really odd at all. There is in my opinion too much focus on the absolute number of peaks, and the questioning you do is healthy.

I agree with Sinji that, formally, you can't conclude that. As mentioned, the number of peaks varies with sequencing depth. We actually did a systematic test of that here in figure 3e, where we randomly downsampled H3K4me3 and input from 26M reads to 2M reads in steps of 2M reads and did peak finding for all combinations in MACS1.4, MACS 2.0 and EaSeq. As expected, the number of peaks obtained varies a lot depending on the number of reads: more reads generally improve your signal-to-noise level and allow more peaks to be identified with higher reliability. Interestingly, the number of peaks found scaled quite differently with the number of reads in MACS and EaSeq, and this will likely also be the case if you test other algorithms. The variation in the number of peaks found with the different algorithms is also quite evident in our Figure 3a and in this paper.
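A depth titration like the one described above can be sketched roughly as follows. This is only a toy illustration, not the code behind the figure: reads are stood in for by a plain Python list, `depth_series` and `downsample` are hypothetical helper names, and in a real analysis you would subsample the alignment files with a dedicated tool and run the peak caller on each pair.

```python
import random


def depth_series(total, step):
    """Target depths from `step` up to `total`, e.g. 2M, 4M, ... 26M."""
    return list(range(step, total + 1, step))


def downsample(reads, n, seed=0):
    """Return n reads drawn at random without replacement."""
    if n > len(reads):
        raise ValueError("cannot draw more reads than the library contains")
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(reads, n)


# Toy stand-in for the libraries: 26 read IDs instead of 26M reads.
chip_reads = [f"chip_{i}" for i in range(26)]
input_reads = [f"input_{i}" for i in range(26)]

depths = depth_series(total=26, step=2)

# Every ChIP depth paired with every input depth ("all combinations").
combinations = [
    (len(downsample(chip_reads, c)), len(downsample(input_reads, i)))
    for c in depths
    for i in depths
]
```

Each ChIP/input pair would then go through the peak caller, and the resulting peak counts can be plotted against depth to see how strongly the count depends on the number of reads.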

Finally, even if you took the dataset sizes into account and matched them perfectly, I agree with Devon that many parts of the experimental conditions can vary a lot, and that the variation between ChIP-seq replicates can be quite high, so it would still be impossible to make that conclusion formally. Nonetheless, you do see people conclude that a difference in peak numbers indicates different binding, and in many cases it is probably also true. But there are a lot of unknowns whose extent we don't even know, and we cannot reliably measure how much they will affect your conclusion.

If the question is central to your work then I would:

Make biological replicates - triplicates preferably, but your PI might not agree on that :-)
Ideally scale the datasets to the same size before peak finding, and see if the difference in peak numbers is reproduced. That still does not rule out the effects that Devon mentions of cell line genomic sequence identity and copy numbers. However, if your samples are differently treated cells of the same origin, then this should not pose an issue and the variation boils down to 'only' being biology, IP efficiency, and library creation, which your independent biological triplicates will give you an idea of.
Use one of the replicates for each condition for peak calling, and the other(s) for quantifying signal strengths at the peaks in the different samples. Then you might get an indication of how much of the signal is rediscovered in the different samples for each peak set. Using the same samples for identifying peaks and quantifying signal will lead to a bias in the quantitation.
Make sure that the central conclusions are supported by independent methods as well. ChIP-seq is absolutely not flawless.
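The second and third suggestions in the list above can be sketched in a few lines. This is a toy example under stated assumptions: reads are reduced to single positions on one chromosome, peaks are non-overlapping half-open `(start, end)` intervals, all coordinates are made up, and `downsample_to_common_depth` and `fraction_of_reads_in_peaks` are hypothetical names rather than functions from MACS2 or any other tool.

```python
import bisect
import random


def downsample_to_common_depth(libraries, seed=0):
    """Subsample every library to the size of the smallest one."""
    rng = random.Random(seed)
    target = min(len(reads) for reads in libraries.values())
    return {name: rng.sample(reads, target) for name, reads in libraries.items()}


def fraction_of_reads_in_peaks(read_positions, peaks):
    """Fraction of reads falling inside any peak.

    Peaks are half-open (start, end) intervals assumed not to overlap.
    """
    if not read_positions:
        return 0.0
    positions = sorted(read_positions)
    in_peaks = 0
    for start, end in peaks:
        # Count sorted positions p with start <= p < end.
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_left(positions, end)
        in_peaks += hi - lo
    return in_peaks / len(positions)


# Peaks called on replicate 1 (hypothetical coordinates).
peaks_rep1 = [(100, 200), (500, 600)]

# Held-out replicate 2 reads for two samples (hypothetical positions).
held_out = {
    "A": [110, 150, 300, 550, 700, 900],
    "C": [120, 130, 140, 510, 520, 530, 800, 950, 990],
}

# Equalise depth first, then quantify signal in the replicate-1 peaks.
equalised = downsample_to_common_depth(held_out)
signal = {
    name: fraction_of_reads_in_peaks(reads, peaks_rep1)
    for name, reads in equalised.items()
}
```

Because the peaks come from one replicate and the reads from another, the fraction is not inflated by the peak caller having already selected regions where that same library happened to be high.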

score 2 · Answer 3 · 2016-04-08

Wow thank you very much for your informative answers. I'm going to feed all this back to my PI and see how much he'll let me do!

Since we are really wet lab biologists, my preference is to probably try some experiments to prove this. Just like you mentioned above, there are probably too many unknowns within the whole ChIP-seq process before we even get to analysing the sequencing data to form any really reliable conclusions. Time to plan experiments I think!

I will however go through the process of normalizing the data as usual for sequencing depth etc and I'll try a couple more peak finding algorithms to see if the "trend" is still there (although I'm going to predict not haha)

Thanks again everyone.