Question: Correlation of two ChIP-seq signals?
10 months ago by
biostart350 wrote:

Hi all,

I am looking at two continuous ChIP-seq signals (think of histone modifications). For several randomly selected example regions I see by eye that the signals are anti-correlated: peaks of one signal sit at the dips of the other. But when I calculate the correlation genome-wide, they come out slightly positively correlated.

I tried calculating the correlation both on bins and on specified intervals (genes only). In both cases I get a positive correlation, yet the browser tracks for my example genes of interest look anti-correlated. Any idea how to check the genome-wide correlation in some other way?


chip-seq • 382 views
ADD COMMENTlink modified 10 months ago by Prakash2.0k • written 10 months ago by biostart350

Can you add some representative code and screenshots? With just words it is difficult to reproduce or understand what you see, or think you see. By-eye interpretation is problematic. How many regions did you check by eye, and how many regions were included in the calculation? In other words, is your by-eye analysis representative?

ADD REPLYlink modified 10 months ago • written 10 months ago by ATpoint40k

The code is quite standard: I calculated correlations between BigWig files using deepTools (both bin-based and region-based), and I also calculated them with custom code, with the same result (positive correlation). The by-eye analysis is definitely not representative, but it is still important, because I looked at the three genes that change their expression the most. The lines in the by-eye analysis are smoothed (20-bp running window), so I see two smooth anti-correlated profiles with peaks and dips, and the typical width of a peak is 1-4 nucleosomes.

ADD REPLYlink modified 10 months ago • written 10 months ago by biostart350

I suspect what is happening here is that you are looking at areas of the genome where there is some signal in both tracks, and seeing that within those areas there is a negative correlation.

However, this will be completely swamped if the same areas carry signal in both samples. For example, H3K36me (I think) is associated with transcription. If you compare two genes, you might find the pattern within each gene anti-correlated, but since the genes are in the same place in both files, the tracks will come out positively correlated overall.

ADD REPLYlink written 10 months ago by i.sudbery9.4k

I was also thinking that this may be the case, so I narrowed down the correlation analysis only to gene bodies. As a result I get even stronger positive correlation genome-wide inside gene bodies, which still does not explain the negative correlation that I see by eye in the example genes.

ADD REPLYlink written 10 months ago by biostart350

What about exons/introns - is some correlation coming from the signal being stronger in exons?

ADD REPLYlink written 10 months ago by i.sudbery9.4k

I agree with @i.sudbery that the anti-correlation you saw could be swamped because the chosen area (or gene) is large.

What bin size are you using? You could try reducing the bin size so that two negatively correlated peaks do not fall in the same bin. Or you could do both: narrow the analysis down to gene bodies and chop them into bins to see what happens.

ADD REPLYlink written 10 months ago by yztxwd380

The bin size is 200 bp. I previously tried narrowing down to gene bodies only (see above), which did not help. I have now also tried narrowing the analysis down to promoters only, but the result is the same as before.

ADD REPLYlink modified 10 months ago • written 10 months ago by biostart350

Have you drawn a scatter plot to look at the correlation (each axis representing one sample)? And which correlation method are you using? A method like Pearson correlation is easily driven by outliers.

ADD REPLYlink written 10 months ago by yztxwd380
10 months ago by
Prakash2.0k wrote:

The proportion of regions that you describe as anti-correlated might be very small compared to the positively correlated regions. The correlation value only indicates how the two ChIP-seq samples are correlated overall. What correlation value are you getting? If it is not more than 0.5 or 0.6, there is a high chance that the two samples are not really correlated, and in that case you may want to look for differentially bound sites.

ADD COMMENTlink modified 10 months ago • written 10 months ago by Prakash2.0k