Question: How to do correlation analyses between RNA-seq and ChIP-seq libraries?
Zee_S wrote (23 months ago):

Hello Biostars community,

I would be grateful for your ideas on a pipeline to correlate the reads of an RNA-seq dataset with those of a ChIP-seq dataset. I have read many posts suggesting that one calculate counts per million (CPM) per genomic bin for each dataset and then correlate the two. I have two questions about this approach:

1) How do you account for the large number of zero-read bins in the RNA-seq sample? They arise because RNA-seq signal is restricted to the expressed regions of the genome, and all of these zero bins are included in the correlation calculation.

2) An RNA-seq profile is purely an enrichment profile (i.e., the coverage is always >= 0), whereas a ChIP-seq profile shows both enrichment and depletion. How do you take this difference into account when comparing the two and computing the correlation?

Many thanks in advance for your help and suggestions!

Question written 23 months ago by Zee_S; modified 23 months ago by Hussain Ather.
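A minimal sketch of the bin-level approach described in the question, written in plain Python so it runs without extra packages. It computes CPM per bin for each library and reports Pearson and Spearman correlations, with and without the RNA-seq zero bins. The function names and the per-bin counts are hypothetical; in practice the count matrix would come from a binning tool (for example deepTools multiBamSummary in bins mode).

```python
import math

def cpm(counts):
    """Counts per million: scale each bin by the library's total read count."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values (e.g. many zero bins) share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def correlate_bins(rna_counts, chip_counts, drop_rna_zeros=False):
    """Return (pearson, spearman) of per-bin CPM values, optionally
    discarding bins that have no RNA-seq reads."""
    rna, chip = cpm(rna_counts), cpm(chip_counts)
    if drop_rna_zeros:
        kept = [(r, c) for r, c in zip(rna, chip) if r > 0]
        rna = [r for r, _ in kept]
        chip = [c for _, c in kept]
    return pearson(rna, chip), spearman(rna, chip)

# Hypothetical per-bin read counts for one RNA-seq and one ChIP-seq library.
rna_counts = [0, 0, 0, 0, 5, 20, 40, 80]
chip_counts = [3, 2, 4, 1, 10, 15, 30, 50]

print("all bins:     ", correlate_bins(rna_counts, chip_counts))
print("RNA zeros out:", correlate_bins(rna_counts, chip_counts, drop_rna_zeros=True))
```

Note that CPM multiplies every bin of a library by the same constant, so on its own it does not change the Pearson or Spearman value for a single pair of libraries; it starts to matter once a pseudocount and log transform are applied or several libraries are compared.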

Good point. Tools that build the bin matrix and then compute the correlation often let you optionally exclude zeros, e.g. the --skipZeros option of deepTools plotCorrelation.

Comment written 23 months ago by colindaven.
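How a zero filter is defined matters. As I understand the deepTools documentation, --skipZeros drops only bins that are zero (or missing) in all samples, so a bin with ChIP-seq signal but no RNA-seq reads is kept. The sketch below contrasts that rule with a stricter "zero in any sample" filter. Both function names are hypothetical, and this is a plain-Python illustration, not deepTools code.

```python
import math

def is_empty(v):
    """A bin counts as empty if it is zero or missing (NaN)."""
    return v == 0 or (isinstance(v, float) and math.isnan(v))

def skip_zeros_all(rna, chip):
    """Drop a bin only if it is empty in ALL samples
    (the rule deepTools describes for plotCorrelation --skipZeros)."""
    return [(r, c) for r, c in zip(rna, chip) if not (is_empty(r) and is_empty(c))]

def skip_zeros_any(rna, chip):
    """Hypothetical stricter filter: drop a bin if it is empty in ANY sample."""
    return [(r, c) for r, c in zip(rna, chip) if not (is_empty(r) or is_empty(c))]

# Hypothetical per-bin values for two samples.
rna = [0, 0, 3, 5, 0]
chip = [0, 2, 0, 4, float("nan")]

print(skip_zeros_all(rna, chip))  # [(0, 2), (3, 0), (5, 4)]
print(skip_zeros_any(rna, chip))  # [(5, 4)]
```

With the all-samples rule, the bin (0, 2), which has ChIP signal but no RNA-seq reads, survives the filter; only the stricter any-sample rule removes it.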

Yes, and if you get rid of those zero-read bins, you are essentially getting rid of any potential negative correlation with your ChIP-seq data, which is what you were looking for in the first place. Or am I wrong on this point?

Reply written 23 months ago by Zee_S (modified 23 months ago).
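The concern about losing a negative correlation can be checked on a small synthetic example (made-up numbers, not real data). Here the ChIP-seq signal is high exactly in the bins with no RNA-seq reads, as one might expect for a repressive mark. Across all bins the correlation is negative; once the RNA-zero bins are removed, only expressed bins remain and the sign flips.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Synthetic toy bins: ChIP signal is high where RNA-seq bins are empty.
rna = [0, 0, 0, 0, 10, 20, 30, 40]
chip = [40, 35, 45, 50, 5, 7, 6, 8]

r_all = pearson(rna, chip)

# Keep only bins with RNA-seq reads.
kept = [(r, c) for r, c in zip(rna, chip) if r > 0]
r_expressed = pearson([r for r, _ in kept], [c for _, c in kept])

print(f"all bins:          r = {r_all:.2f}")        # r = -0.81
print(f"RNA-zero bins out: r = {r_expressed:.2f}")  # r = 0.80
```

In this toy case the negative relationship lives entirely in the zero-RNA bins, so filtering them out changes the conclusion, not just its strength. Computing the correlation both ways makes this visible.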
Hussain Ather (National Institutes of Health, Bethesda, MD) wrote (23 months ago):

1) You could try normalizing the bins with respect to some larger value.

2) You could try taking the absolute value of the ChIP-seq profile to account for both enrichment and depletion.

Answer written 23 months ago by Hussain Ather.
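One possible reading of suggestion (2), under assumptions the thread does not state: the ChIP-seq enrichment/depletion profile is taken to be a per-bin log2 ratio of ChIP over an input control (with a pseudocount of 1 CPM), and the RNA-seq side is log2(CPM + 1). The function names, the pseudocount, and the counts are all hypothetical; this is a sketch, not the answerer's method.

```python
import math

def cpm(counts):
    """Counts per million: scale each bin by the library's total read count."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

def log2_ratio(chip_counts, input_counts, pseudo=1.0):
    """Signed log2(ChIP CPM / input CPM) per bin: > 0 is enrichment,
    < 0 is depletion. The pseudocount keeps empty bins finite."""
    return [math.log2((c + pseudo) / (i + pseudo))
            for c, i in zip(cpm(chip_counts), cpm(input_counts))]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-bin counts (equal library sizes keep the numbers simple).
chip_counts = [40, 10, 25, 25]
input_counts = [10, 40, 25, 25]
rna_counts = [30, 30, 5, 35]

signed = log2_ratio(chip_counts, input_counts)   # enrichment vs depletion
magnitude = [abs(v) for v in signed]             # treats both the same way
rna_log = [math.log2(v + 1.0) for v in cpm(rna_counts)]

print("signed log2 ratio:", [round(v, 2) for v in signed])
print("r(signed, RNA):   ", round(pearson(signed, rna_log), 2))
print("r(|ratio|, RNA):  ", round(pearson(magnitude, rna_log), 2))
```

Taking the absolute value treats a 4-fold enrichment and a 4-fold depletion identically, so the second correlation asks whether expression tracks how far the ChIP signal departs from input, not in which direction. In this toy example the signed correlation is essentially zero while the magnitude correlation is positive.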

Hi Hussain,

Thanks a lot for your reply. Could you kindly elaborate on the two approaches you mention above?

(1) A larger value such as what? And how does this solve the issue of zero-read bins in the RNA-seq dataset being included in the correlation? (2) What do you mean by absolute value?

Thank you

Reply written 23 months ago by Zee_S.