Does it make sense to consider a genomic locus when there is no peak but high signal?
2.5 years ago
venu 6.8k

Hi all,

My question is about ChIP-seq: should we (at least in my case) consider a genomic locus with high IP signal over background even though no peak was called there?

Here is the situation in one of my ChIP-seq analyses. In the following figure, panel A shows peaks co-bound by proteins P1, P2, and P3 before and after treatment. Panel B shows another class of peaks, identified as bound only by P3. But as you can see, P2 still has a good amount of signal at these regions. The y-axis is the log2 ratio of IP over background.

It is difficult to explain. For example, for P2, a log2 ratio of 1 or 1.5 corresponds to a called peak in panel A but not in panel B. Yet the biological evidence is that P3 is almost always co-bound with P2. I worry that if I proceed with the current peak set, I will miss a lot of important genes. On the other hand, there is (to my knowledge) no established way to address this issue.

At this point, I am considering the following:

1. Calculate the average fold change (or log2 ratio over background) at which a real peak is identified for each protein.
2. If sites from panel B show a fold change/log2 ratio for protein P2 above the value from step 1, merge them into panel A.
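
To make the two steps concrete, here is a rough sketch of what I have in mind. The function name `rescue_sites`, the site IDs, and all values are made up for illustration; the real inputs would be the per-site P2 log2 ratios from bamCompare.

```python
from statistics import mean

def rescue_sites(panel_a_log2, panel_b_log2):
    """Return (threshold, rescued panel B site IDs).

    panel_a_log2: site ID -> P2 log2(IP/background) at sites where P2 has a called peak
    panel_b_log2: site ID -> P2 log2(IP/background) at P3-only sites
    """
    # Step 1: average log2 ratio at sites where P2 was called as a peak
    threshold = mean(panel_a_log2.values())
    # Step 2: P3-only sites whose P2 signal exceeds that average
    rescued = sorted(site for site, value in panel_b_log2.items() if value > threshold)
    return threshold, rescued

# Hypothetical values, for illustration only
panel_a = {"a1": 1.0, "a2": 1.5, "a3": 2.0}
panel_b = {"b1": 0.4, "b2": 2.5, "b3": 1.6}

threshold, rescued = rescue_sites(panel_a, panel_b)
print(threshold, rescued)
```

Using the mean as the cutoff is only one possible choice; a lower quantile of the panel A values would rescue more sites.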

I used MACS2 to identify peaks with a genomic background and bamCompare to calculate log2 ratios over background.

Any suggestions or ideas would be greatly appreciated.

Thanks!

ChIP-Seq
2.5 years ago

Peak calling tends to vary so much from sample to sample that for cases like this, I usually end up deriving a consensus peakset for all samples and quantifying signal in that peakset across all samples before comparing them. DiffBind is one package that makes this a lot easier than doing it manually. Alternatively, you can forgo peak calling altogether and try something like csaw to identify regions with different signal across your samples. csaw can also be used with peaks, and some people say it's easier to use than DiffBind, though I've personally found the opposite to be true.

You can also try a more sensitive peak caller such as SPAN, which lets you visually mark a few peaks in tracks; these are then used to build the peak-calling model for that sample. It's particularly helpful for samples with a low signal-to-noise ratio.


Hi Jared, thanks for the answer.

One more thing I didn't mention is

• We do not have replicates (I know :( )
• The figures shown are actually from the differential results of MAnorm

MAnorm reported these regions as not significantly different before and after treatment (which is what we are interested in). However, within these peaks I observed this strange behaviour: some regions have high signal but are not called as peaks. I will check out csaw and SPAN.


Ah, MAnorm isn't bad either. For single replicate comparisons, it's as good a tool as I've seen. Sounds like peak calling may be your issue. What parameters are you using with MACS? You can try lowering the q-value threshold and decreasing the lower bound on the mfold setting, which should increase sensitivity but also increase false positives. Since you're running MAnorm afterwards, I wouldn't be quite as worried about the false positives given that you're really only interested in those that differ between your conditions.
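
As a minimal sketch of what a more permissive MACS2 call could look like, the snippet below only builds the command line and does not run it. The file names, the output prefix, the `-g hs` genome size, and the specific values 0.05 and `2 50` are assumptions for illustration, not recommendations. Note that a more permissive q-value cutoff is a numerically larger one.

```python
import shlex

def macs2_callpeak_cmd(chip_bam, input_bam, name, qvalue=0.05, mfold=(2, 50)):
    """Build (but do not run) a MACS2 callpeak command with relaxed thresholds."""
    return [
        "macs2", "callpeak",
        "-t", chip_bam,      # ChIP (treatment) BAM
        "-c", input_bam,     # input/background BAM
        "-f", "BAM",
        "-g", "hs",          # assumption: human genome; change for your organism
        "-n", name,          # prefix for output files
        "-q", str(qvalue),   # q-value cutoff; a larger value is more permissive
        "-m", str(mfold[0]), str(mfold[1]),  # mfold range (MACS2 default is 5 50)
    ]

# Hypothetical file names
cmd = macs2_callpeak_cmd("P2_chip.bam", "P2_input.bam", "P2_relaxed")
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to pass to `subprocess.run` once the paths point at real files.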


I was also thinking of lowering the q-value cutoff (I've used 0.01). Co-occupancy is one of the strongest points of the whole work, which is why I am trying not to lose any of the sites/genes that show actual signal. Do you have any comments on the points in my main question about merging based on log2 ratio over background?


And by lowering, I guess I really meant raising the q-value threshold (maybe try 0.05?). I'm not sure what you mean by "merge them to panel A" in your main question. You mean remove them from the P3 specific list? There could be a lot of reasons you're seeing what you're seeing.

A few things to consider:

• What percentage of the P2/P3 peaks do co-bind? Is it a significant portion or not even close to what you'd expect biologically?
• What are the chances of acquiring more data? A few replicates would make your analysis a lot easier and obviously more robust.
• Probably the most important point - ratios can be misleading, particularly for regions with low signal. I expect this is the case in your panel B especially. Just because a ratio is "high" doesn't mean it's a peak in this case, just that the ChIP sample has more reads than the input. Maybe the input has 3 reads and the ChIP has 9. Would you consider that a peak? Probably not, despite it having a decent ratio. I'd compare input-subtracted RPM normalized read counts for the P3-specific peaks between the samples, which would show more absolutely whether your co-occupancy analysis is performing as expected.
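
To illustrate the last point, here is a small sketch comparing a weak site (9 vs 3 reads) with a strong one (900 vs 300 reads). The library size of 10 million reads and the function names are assumptions for illustration. Both sites give the same log2 ratio, but the input-subtracted RPM differs by two orders of magnitude.

```python
import math

def rpm(reads, library_size):
    """Reads per million mapped reads."""
    return reads * 1_000_000 / library_size

def compare(chip_reads, input_reads, chip_lib, input_lib):
    """Return the log2 ratio and the input-subtracted RPM for one site.

    Real data would need a pseudocount to handle sites with zero input reads.
    """
    chip_rpm = rpm(chip_reads, chip_lib)
    input_rpm = rpm(input_reads, input_lib)
    return {
        "log2_ratio": math.log2(chip_rpm / input_rpm),
        "rpm_diff": chip_rpm - input_rpm,
    }

LIB = 10_000_000  # assumption: both libraries contain 10 million reads

weak = compare(9, 3, LIB, LIB)
strong = compare(900, 300, LIB, LIB)

for label, site in (("weak", weak), ("strong", strong)):
    print(label, round(site["log2_ratio"], 2), round(site["rpm_diff"], 2))
```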

Yes, by merging, what I meant is basically adding some peaks from panel B to panel A.

• From standard analysis, that is already significant (and also tested experimentally).
• This is a good question. We can't produce biological replicates (because of the lack of different models with the same genetic background). We can do some technical replicates, but I don't know how much additional information they would add.
• Actually, these ratios are based on normalized counts: background and ChIP are independently normalized for sequencing depth, and ratios are calculated for each binding site. I guess this would be a good measure of whether we see real signal/more enrichment in the ChIP, no? (Am I missing something about RPM normalization in this case?) I also downsampled all BAM files to contain an equal number of reads prior to all analyses.

We have no doubt there is significant co-occupancy, as expected. The difficulty comes when looking at the P3-specific peaks and saying that only P3 is bound there: comparing panels A and B, a P2 region is called as a peak in panel A at a value of 1.5 but not in panel B even at 2.5. Of course, it is not as simple as it looks, but to a biologist it would make much more sense to merge some of the sites from B into A. And I'm really not able to convince him that these are not confident/due to noise/some other reason :)


Right, the ratios might be based on those counts, but ratios are often misleading. I'd try making the same panels with the log2 normalized counts - I think this will be more convincing than the ratios. Merging sites from B into A selectively could be done, but you need to be careful to set strict rules you stick to.


Thanks. I will definitely check the log2 normalized counts.