Question

Correlation of two ChIP-seq signals?

1

4.4 years ago

biostart ▴ 370

Hi all,

I am looking at two continuous ChIP-seq signals (think of histone modifications), and for several randomly selected example regions I can see by eye that these signals are anti-correlated (peaks of one signal sit at the dips of the other). But when I calculate a proper correlation genome-wide, they are slightly positively correlated.

I tried calculating the correlation both based on bins and based on specified intervals (genes only), and in both cases I get positive correlation, but I see that the browser tracks for examples of my genes of interest are anti-correlated. Any idea how to check genome-wide correlation in some other way?

Thanks!

ChIP-Seq • 1.8k views

ADD COMMENT • link updated 4.4 years ago by Prakash ★ 2.2k • written 4.4 years ago by biostart ▴ 370

0

Can you add some representative code and screenshots? With just words it is difficult to reproduce or understand what you see, or think you see. By-eye interpretation is problematic. How many regions did you check by eye, and how many regions were included in the calculation? In other words, is your by-eye analysis representative?

ADD REPLY • link 4.4 years ago by ATpoint 81k

0

The code is quite standard: I calculated correlations between BigWig files using deepTools (both bin-based and region-based), and I also calculated them with custom code, with the same result (positive correlation). The by-eye analysis is definitely not representative, but it still matters, because I looked at the three genes that change their expression the most. The lines in the by-eye analysis are smoothed (20-bp running window), so I see two smooth anti-correlated profiles with peaks and dips, and the typical width of a peak is 1-4 nucleosomes.

ADD REPLY • link 4.4 years ago by biostart ▴ 370

0

I suspect what is happening here is that you are looking at areas of the genome where there is some signal in both tracks, and seeing that within those areas there is a negative correlation.

However, this will be completely swamped if the same areas have signal in both samples. For example, H3K36me (I think) is associated with transcription. If you compare two genes, you might find the pattern within each gene anti-correlated, but because the genes are in the same place in both files, the tracks as a whole will come out positively correlated.
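To make that concrete, here is a minimal toy simulation of the effect (not your data). All numbers are made up: each hypothetical "gene" gets a shared baseline level in both tracks, and within the gene the two tracks move in opposite directions around that baseline. The `pearson` helper is written out by hand so the sketch needs nothing beyond the standard library.

```python
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

random.seed(0)
n_genes, bins_per_gene = 200, 50  # hypothetical sizes, chosen only for illustration

pooled_a, pooled_b, within_gene_r = [], [], []
for gene in range(n_genes):
    # Baseline shared by both marks, e.g. both follow how active the gene is.
    level = random.uniform(0, 10)
    a, b = [], []
    for bin_index in range(bins_per_gene):
        d = random.gauss(0, 1)
        a.append(level + d)                         # mark A goes up by d
        b.append(level - d + random.gauss(0, 0.3))  # mark B goes down by d, plus noise
    within_gene_r.append(pearson(a, b))
    pooled_a += a
    pooled_b += b

pooled_r = pearson(pooled_a, pooled_b)           # all bins of all genes together
mean_within_r = sum(within_gene_r) / n_genes     # average of the per-gene correlations

print(f"pooled r = {pooled_r:.2f}, mean within-gene r = {mean_within_r:.2f}")
```

In this toy setup the pooled correlation is positive while the per-gene correlations are negative, so computing one correlation per gene and looking at the distribution of those values may be closer to what the browser tracks show.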

ADD REPLY • link 4.4 years ago by i.sudbery 19k

0

I was also thinking that this may be the case, so I restricted the correlation analysis to gene bodies only. As a result I get an even stronger positive correlation genome-wide inside gene bodies, which still does not explain the negative correlation that I see by eye in the example genes.

ADD REPLY • link 4.4 years ago by biostart ▴ 370

0

What about exons/introns - is some correlation coming from the signal being stronger in exons?

ADD REPLY • link 4.4 years ago by i.sudbery 19k

0

I agree with @i.sudbery that the anti-correlation you saw could be swamped because the area (or gene) over which the correlation is computed is large.

What bin size are you using? You could try reducing the bin size so that two negatively correlated peaks do not fall into the same bin. Or you can do both: narrow the analysis down to gene bodies and chop them into small bins to see what happens.
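A small toy example of the bin-size effect, using only the standard library. Everything here is an assumption for illustration: two synthetic tracks sampled every 10 bp share a slow trend (20 kb period) and carry opposite ripples with a 400 bp period, and the coarse bin is set to exactly one ripple period so that each bin averages a peak and a dip together.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def bin_means(values, size):
    """Average consecutive, non-overlapping windows of `size` points."""
    return [sum(values[i:i + size]) / size
            for i in range(0, len(values) - size + 1, size)]

step_bp = 10      # hypothetical resolution of the raw tracks
n_points = 4000   # 40 kb of synthetic signal
positions = [i * step_bp for i in range(n_points)]

# Slow trend shared by both tracks, plus a fast ripple with opposite sign.
trend = [0.5 * math.sin(2 * math.pi * p / 20000) for p in positions]
ripple = [math.sin(2 * math.pi * p / 400) for p in positions]
track_a = [2 + t + r for t, r in zip(trend, ripple)]
track_b = [2 + t - r for t, r in zip(trend, ripple)]

r_fine = pearson(track_a, track_b)  # correlation at 10 bp resolution

bin_bp = 400                        # coarse bin spanning one full ripple period
points_per_bin = bin_bp // step_bp
r_coarse = pearson(bin_means(track_a, points_per_bin),
                   bin_means(track_b, points_per_bin))

print(f"r at {step_bp} bp: {r_fine:.2f}, r at {bin_bp} bp bins: {r_coarse:.2f}")
```

In this sketch the fine-resolution correlation is negative, while the binned correlation is strongly positive because only the shared trend survives the averaging. Whether real data behaves this way depends on how the peak spacing compares with the bin size.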

ADD REPLY • link 4.4 years ago by Jianyu ▴ 580

0

The bin size is 200 bp. I previously tried narrowing down to gene bodies only (see above), which did not help. I have now also tried narrowing the analysis down to promoters only, but the result is the same as before.

ADD REPLY • link 4.4 years ago by biostart ▴ 370

0

Have you drawn a scatter plot to look at the correlation (each axis representing one sample)? And which correlation method are you using? A method like Pearson correlation is easily driven by outliers.
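As a rough illustration of the outlier point, here is a toy comparison of Pearson and rank-based (Spearman) correlation written with the standard library only. The data are invented: 500 bins where the two signals are anti-correlated, plus five hypothetical artefact bins with very high signal in both tracks. The `ranks` helper assumes there are no ties, which holds for this continuous toy data but not necessarily for real count data.

```python
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def ranks(values):
    """Ranks 1..n by increasing value; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

random.seed(1)

# 500 ordinary bins where signal y goes down as signal x goes up.
x = [random.gauss(5, 1) for _ in range(500)]
y = [10 - v + random.gauss(0, 0.3) for v in x]

# Five hypothetical artefact bins with extreme signal in both tracks.
x += [100 + i for i in range(5)]
y += [100 + i for i in range(5)]

pearson_r = pearson(x, y)
spearman_r = spearman(x, y)

print(f"Pearson r = {pearson_r:.2f}, Spearman rho = {spearman_r:.2f}")
```

Here the handful of extreme bins is enough to make Pearson positive while the rank-based value stays negative. If your two methods disagree in a similar way, the scatter plot should show which bins are responsible.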

ADD REPLY • link 4.4 years ago by Jianyu ▴ 580

score 3 · Answer 1 · 2019-12-18

The regions you describe as anti-correlated may make up only a small proportion compared with the positively correlated regions. The genome-wide correlation value only indicates how the two ChIP-seq samples are correlated overall. What correlation value are you getting? If it is not more than 0.5 or 0.6, there is a good chance that the two samples are not really correlated, and you may find differential sites. In that case you could look for differentially bound sites.