Question: Correlation of two ChIP-seq signals?
6 weeks ago by
biostart340
Germany
biostart340 wrote:

Hi all,

I am looking at two continuous ChIP-seq signals (think of histone modifications). In several randomly selected example regions I can see by eye that these signals are anti-correlated (peaks of one signal coincide with dips of the other). But when I calculate the correlation genome-wide, they are slightly positively correlated.

I tried calculating the correlation both on bins and on specified intervals (genes only). In both cases I get a positive correlation, yet the browser tracks for my genes of interest look anti-correlated. Any idea how to check the genome-wide correlation in some other way?

Thanks!

chip-seq • 187 views
ADD COMMENTlink modified 4 weeks ago by Prakash1.7k • written 6 weeks ago by biostart340

Can you add some representative code and screenshots? With words alone it is difficult to reproduce or understand what you see, or think you see. By-eye interpretation is problematic. How many regions did you check by eye, and how many regions were included in the calculation? In other words, is your by-eye analysis representative?

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by ATpoint28k

The code is quite standard: I calculated correlations between BigWig files using deepTools (both bin-based and region-based), and I also calculated them with custom code, with the same result (positive correlation). The by-eye analysis is definitely not representative, but it still matters, because I looked at the three genes that change their expression the most. The lines in the by-eye analysis are smoothed (20-bp running window), so I see two smooth anti-correlated profiles with peaks and dips, and the typical width of a peak is 1-4 nucleosomes.

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by biostart340

I suspect what is happening here is that you are looking at areas of the genome where there is some signal in both tracks, and seeing that within those areas there is a negative correlation.

However, this will be completely swamped if the same areas carry signal in both samples. For example, H3K36me (I think) is associated with transcription. If you compare two genes, you might find the pattern within each gene anti-correlated, but since the genes are in the same places in both files, the two tracks come out positively correlated overall.
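A minimal synthetic sketch of this effect, in plain Python. Everything here is an assumption made for illustration: the number of regions, the bin counts, the noise levels, and the helper names `pearson` and `simulate` are hypothetical and not taken from any real data set or from deepTools. Each simulated region gets a shared level in both tracks (standing in for "both marks sit on the same genes"), plus a local fluctuation that has opposite sign in the two tracks.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_regions=200, bins_per_region=50, seed=0):
    """Return a list of (track_a_bins, track_b_bins), one pair per region."""
    rng = random.Random(seed)
    regions = []
    for _ in range(n_regions):
        # Shared regional level: high in both tracks or low in both tracks.
        level = rng.uniform(0, 10)
        a, b = [], []
        for _ in range(bins_per_region):
            # Local fluctuation with opposite sign in the two tracks.
            local = rng.gauss(0, 1)
            a.append(level + local + rng.gauss(0, 0.2))
            b.append(level - local + rng.gauss(0, 0.2))
        regions.append((a, b))
    return regions


regions = simulate()

# Correlation over all bins pooled together (the "genome-wide" number).
pooled_a = [v for a, _ in regions for v in a]
pooled_b = [v for _, b in regions for v in b]
pooled_r = pearson(pooled_a, pooled_b)

# Correlation computed separately inside each region, then averaged.
within_r = [pearson(a, b) for a, b in regions]
mean_within_r = sum(within_r) / len(within_r)

print(f"pooled r = {pooled_r:.2f}")
print(f"mean within-region r = {mean_within_r:.2f}")
```

With these made-up parameters the pooled correlation is clearly positive while the average within-region correlation is strongly negative. Looking at the distribution of per-region correlations, rather than one pooled number, is one way to check whether the by-eye pattern holds genome-wide.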

ADD REPLYlink written 6 weeks ago by i.sudbery6.6k

I was also thinking that this may be the case, so I restricted the correlation analysis to gene bodies only. As a result I get an even stronger positive correlation genome-wide inside gene bodies, which still does not explain the negative correlation that I see by eye in the example genes.

ADD REPLYlink written 6 weeks ago by biostart340

What about exons/introns - is some correlation coming from the signal being stronger in exons?

ADD REPLYlink written 6 weeks ago by i.sudbery6.6k

I agree with @i.sudbery that the anti-correlation you saw could be swamped because the chosen area (or gene) is large.

What bin size are you using? You could try reducing the bin size, so that two negatively correlated peaks do not end up in the same bin. Or you can do both: narrow the analysis down to gene bodies and chop them into bins to see what happens.
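A small sketch of why bin size can matter, again on synthetic data in plain Python. The assumptions are mine and purely illustrative: a 10 bp track resolution, a local oscillation with a 200 bp period that is opposite in the two tracks, and a slow shared trend over 20 kb. The helpers `pearson` and `rebin` are hypothetical names. Whether real peaks are narrow enough for this to apply depends on the data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rebin(values, factor):
    """Average consecutive groups of `factor` values into one bin each."""
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values) - factor + 1, factor)]


STEP_BP = 10        # assumed resolution of the raw track
N_POINTS = 4000     # 40 kb of synthetic signal
LOCAL_BP = 200      # assumed period of the local peak/dip pattern
DOMAIN_BP = 20000   # assumed scale of the shared slow trend

track_a, track_b = [], []
for i in range(N_POINTS):
    pos = i * STEP_BP
    shared = 3 + math.sin(2 * math.pi * pos / DOMAIN_BP)     # same in both tracks
    local = 2 * math.sin(2 * math.pi * pos / LOCAL_BP)       # opposite in the two tracks
    track_a.append(shared + local)
    track_b.append(shared - local)

# 20 bp bins keep the local pattern; 200 bp bins average it away.
r_fine = pearson(rebin(track_a, 2), rebin(track_b, 2))
r_coarse = pearson(rebin(track_a, 20), rebin(track_b, 20))

print(f"20 bp bins:  r = {r_fine:.2f}")
print(f"200 bp bins: r = {r_coarse:.2f}")
```

In this toy case each 200 bp bin contains one full peak and one full dip, so the local pattern cancels out and only the shared trend is left. The fine bins give a negative correlation and the coarse bins a positive one, from the same underlying signal.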

ADD REPLYlink written 5 weeks ago by yztxwd290

The bin size is 200 bp. I previously tried narrowing the analysis down to gene bodies only (see above), which did not help. I have now also tried restricting it to promoters only, but the result is the same as before.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by biostart340

Have you made a scatter plot to look at the correlation (each axis representing one sample)? And which correlation method are you using? A method like Pearson correlation is easily driven by outliers.
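To illustrate the outlier point, here is a hedged sketch in plain Python comparing Pearson with a rank-based (Spearman) correlation. The data are invented: 100 bins that are perfectly anti-correlated plus one extreme bin that is very high in both tracks (the kind of thing a repeat or artifact region might produce). The helper names `pearson`, `ranks` and `spearman` are hypothetical, not a library API.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# 100 anti-correlated bins (invented values).
signal_a = [float(i) for i in range(100)]
signal_b = [float(99 - i) for i in range(100)]

# One extreme bin that is very high in both tracks.
signal_a.append(1000.0)
signal_b.append(1000.0)

pearson_r = pearson(signal_a, signal_b)
spearman_rho = spearman(signal_a, signal_b)

print(f"Pearson r    = {pearson_r:.2f}")
print(f"Spearman rho = {spearman_rho:.2f}")
```

Here a single bin flips the sign of the Pearson coefficient, while the rank-based coefficient stays strongly negative. A scatter plot of the two samples would show the same thing directly: one point far from a downward-sloping cloud.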

ADD REPLYlink written 5 weeks ago by yztxwd290
4 weeks ago by
Prakash1.7k
India
Prakash1.7k wrote:

The proportion of regions that you describe as anti-correlated might be very small compared to the positively correlated regions. The correlation value only indicates how the two ChIP-seq samples are correlated overall. What correlation value are you getting? If it is not more than 0.5 or 0.6, there is a good chance that the two samples are not correlated and that you will find differential sites. In that case you could look for differentially bound sites.

ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by Prakash1.7k