Question: How to do correlation analyses between RNA-seq and ChIP-seq libraries?
Zee_S wrote, 12 months ago:

Hello Biostars community,

I would be grateful for your ideas on a pipeline to correlate the reads of an RNA-seq dataset with a ChIP-seq dataset. I have read many posts where one calculates the counts per million (CPM) per genomic bin and correlates the two. But I have two questions regarding this approach:

1) How do you account for the large number of bins with zero reads in the RNA-seq sample? These arise because RNA-seq signal is restricted to the expressed regions of the genome, yet the zero bins are still included in the correlation calculation.

2) An RNA-seq profile is only an enrichment profile (i.e. the coverage is always >= 0), whereas a ChIP-seq profile can show both enrichment and depletion. How do you take this into account when comparing the two and computing the correlation?
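A minimal Python sketch of the CPM-per-bin approach, assuming you already have read counts for the same genomic bins from each library, plus an input (control) library for the ChIP-seq. To make the ChIP-seq side signed (point 2), it assumes the ChIP profile is expressed as log2(ChIP CPM / input CPM), so negative values mean depletion relative to input; RNA-seq is taken as log2(CPM + 1). The input library, the pseudocount of 1 and all function names are assumptions for illustration, not part of the question.

```python
import numpy as np


def cpm(counts):
    """Counts per million: scale each bin by the library's total binned reads."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum() * 1e6


def rna_profile(rna_counts, pseudocount=1.0):
    """log2(CPM + pseudocount); bins with zero reads map to exactly 0."""
    return np.log2(cpm(rna_counts) + pseudocount)


def signed_chip_profile(chip_counts, input_counts, pseudocount=1.0):
    """log2 ratio of ChIP CPM over input CPM: > 0 enriched, < 0 depleted."""
    chip = cpm(chip_counts)
    control = cpm(input_counts)
    return np.log2((chip + pseudocount) / (control + pseudocount))


def pearson(x, y):
    """Pearson correlation between two per-bin profiles of equal length."""
    return float(np.corrcoef(x, y)[0, 1])


if __name__ == "__main__":
    # Hypothetical counts for six bins; not real data.
    rna = [0, 0, 50, 120, 0, 30]
    chip = [40, 35, 5, 2, 60, 10]
    control = [10, 10, 10, 10, 10, 10]
    print(pearson(rna_profile(rna), signed_chip_profile(chip, control)))
```

Note that the pseudocount only keeps the zero RNA-seq bins finite (they all become 0); it does not remove them from the correlation, which is still a separate decision.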

Many thanks in advance for your help and suggestions!

(last modified 12 months ago by Hussain Ather)

Good point. Zeros are often (optionally) excluded by the tool that creates the matrix and performs the correlation, e.g. deepTools.
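For reference, deepTools' `plotCorrelation` exposes this as a `--skipZeros` flag; my understanding is that it drops regions that are zero (or missing) in all samples, not in any single sample. Below is a small numpy sketch of that all-samples filter on a bins x samples matrix. It is an illustration under that assumption, not deepTools code, and it ignores missing values; the function names are hypothetical.

```python
import numpy as np


def drop_all_zero_bins(matrix):
    """Keep bins (rows) that have a nonzero value in at least one sample (column)."""
    m = np.asarray(matrix, dtype=float)
    keep = (m != 0).any(axis=1)
    return m[keep]


def sample_correlations(matrix, skip_zeros=True):
    """Pearson correlation between samples (columns), optionally skipping all-zero bins."""
    m = drop_all_zero_bins(matrix) if skip_zeros else np.asarray(matrix, dtype=float)
    return np.corrcoef(m, rowvar=False)
```

Because a bin is only dropped when it is empty in every sample, an RNA-seq zero sitting next to a nonzero ChIP-seq value survives this kind of filter.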

(reply by colindaven, 12 months ago)

Yes, and if you get rid of those zero-read bins, you are essentially getting rid of any potential negative correlation with your ChIP-seq data, which is what you were looking for in the first place. Or am I wrong on this point?
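Whether that happens depends on which zeros you drop. A toy check, with made-up numbers that mimic a repressive mark (strong ChIP-seq signal in bins with no RNA-seq reads): dropping a bin when either library is zero removes exactly those bins, while dropping it only when both libraries are zero keeps them. The function and its "any"/"all" modes are hypothetical.

```python
import numpy as np


def masked_pearson(rna, chip, mode):
    """Pearson r after removing zero bins.

    mode="any": drop a bin if EITHER library has zero reads there.
    mode="all": drop a bin only if BOTH libraries have zero reads there.
    """
    rna = np.asarray(rna, dtype=float)
    chip = np.asarray(chip, dtype=float)
    if mode == "any":
        keep = (rna != 0) & (chip != 0)
    elif mode == "all":
        keep = (rna != 0) | (chip != 0)
    else:
        raise ValueError("mode must be 'any' or 'all'")
    return float(np.corrcoef(rna[keep], chip[keep])[0, 1])


# Hypothetical per-bin values: the three bins with RNA = 0 and high ChIP
# stand in for silent, marked regions; the first bin is empty in both.
rna = [0, 0, 0, 0, 10, 20, 30]
chip = [0, 30, 25, 20, 2, 3, 2]

r_all = masked_pearson(rna, chip, "all")  # keeps the silent, marked bins
r_any = masked_pearson(rna, chip, "any")  # keeps only the expressed bins
print(f"drop if both zero: r = {r_all:.2f}")
print(f"drop if either zero: r = {r_any:.2f}")
```

With these numbers the first correlation is strongly negative (about -0.84) and the second is essentially zero, so the concern holds for an "either library is zero" filter but not for an "all samples are zero" filter.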

(reply by Zee_S, 12 months ago)
Hussain Ather (National Institutes of Health, Bethesda, MD) wrote, 12 months ago:

1) You could try normalizing the bins with respect to some larger value.

2) You could try taking the absolute value of the ChIP-seq profile to account for both enrichment and depletion.
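One possible reading of the absolute-value suggestion, assuming the ChIP-seq profile has already been expressed as a signed log2 ratio over input (an assumption; the answer does not specify the scale): take |log2 ratio| so that enrichment and depletion both count as deviation, then correlate that magnitude with the RNA-seq signal. The function name and toy values are hypothetical.

```python
import numpy as np


def abs_signal_correlation(rna_log_cpm, chip_log2_ratio):
    """Pearson r between RNA-seq signal and the MAGNITUDE of a signed ChIP profile.

    Enrichment (+) and depletion (-) of the same size are treated identically,
    so the direction of the ChIP-seq change is discarded.
    """
    rna = np.asarray(rna_log_cpm, dtype=float)
    magnitude = np.abs(np.asarray(chip_log2_ratio, dtype=float))
    return float(np.corrcoef(rna, magnitude)[0, 1])


# Hypothetical example: expression tracks how far the mark deviates from
# input, in either direction.
chip_ratio = [-2.0, -1.0, 0.0, 1.0, 2.0]
rna = [4.0, 2.0, 0.0, 2.0, 4.0]
print(abs_signal_correlation(rna, chip_ratio))    # magnitude-based
print(float(np.corrcoef(rna, chip_ratio)[0, 1]))  # signed, for comparison
```

In this toy case the magnitude correlation is 1 while the signed correlation is 0, which shows both what the transformation captures and what it hides: a positive result says nothing about whether the mark is enriched or depleted at expressed bins.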


Hi Hussain,

Thanks a lot for your reply. Could you kindly elaborate on the two approaches you mention above?

(1) A larger value such as what? And how does this solve the issue of zero-read bins in the RNA-seq dataset being included in the correlation?

(2) What do you mean by the absolute value?

Thank you

(reply by Zee_S, 12 months ago)