I am quite new to bioinformatics and epigenetics, and right now I am stuck with my analysis. I want to test whether changes in DNA methylation at certain loci (promoters, exons, introns, ...) correlate with the expression of the corresponding gene between two cell types. Unfortunately, I don't have the raw expression data, only log2 fold changes.
My first question: should I work with peaks (I already have peak sets for both MeDIP-seq samples, generated with SICER), or with the full read counts of the Input and IP samples for a given gene region (for example, -5 kb from the TSS)?
If I go with the full read counts, I have to calculate some kind of methylation level. I tried to calculate such a value with this formula:
(unique_reads / total_readcount)*region_length
My idea was to apply this formula to both the IP and the Input sample and subtract the Input value from the IP value. Unfortunately, I got a lot of negative values, so it is not possible to take a log2 of them to correlate with the gene expression data. Is there another way to remove the background noise from my IP sample using the Input control?
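To make the problem concrete, here is a minimal sketch in Python. It first applies the formula above to an IP and an Input sample and subtracts them, then shows one possible alternative: a log2 ratio of depth-scaled IP over Input with a pseudocount, which stays defined when Input exceeds IP. The function names, the pseudocount of 1, and all read counts are assumptions made up for illustration; they do not come from real data or from any particular tool.

```python
import math

def normalized_signal(region_reads, total_reads, region_length):
    """Value from the formula above: (unique_reads / total_readcount) * region_length."""
    return (region_reads / total_reads) * region_length

def log2_enrichment(ip_reads, ip_total, input_reads, input_total, pseudocount=1.0):
    """log2 of depth-scaled IP over Input for the same region.

    The pseudocount (assumed to be 1 here) avoids division by zero and log2(0).
    Because IP and Input cover the same region, region length cancels in the ratio.
    """
    ip_cpm = (ip_reads + pseudocount) / ip_total * 1e6
    input_cpm = (input_reads + pseudocount) / input_total * 1e6
    return math.log2(ip_cpm / input_cpm)

# Made-up counts for one 5 kb region (illustration only).
ip_reads, ip_total = 40, 20_000_000
input_reads, input_total = 90, 30_000_000
region_length = 5000

# Subtraction: negative whenever Input is relatively higher than IP,
# so log2 of the difference is undefined.
difference = (normalized_signal(ip_reads, ip_total, region_length)
              - normalized_signal(input_reads, input_total, region_length))
print("IP - Input:", difference)

# Ratio: a negative result simply means depletion relative to Input.
print("log2(IP/Input):", log2_enrichment(ip_reads, ip_total, input_reads, input_total))
```

With a ratio like this, the difference between the two cell types (enrichment in cell type A minus enrichment in cell type B) is already on a log2 scale, so it could be compared with expression log2 fold changes without taking a further logarithm.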
Any other ideas on how I could correlate these two datasets with each other?
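For reference, the kind of comparison I have in mind could look like the following sketch: a Spearman rank correlation between a per-gene methylation change and the expression log2 fold change. Spearman is an assumed choice here (it only uses ranks, so it does not require both values to be on the same scale), the helper names are hypothetical, and the five value pairs are toy numbers, not results.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy per-gene values (made up): methylation change at the promoter
# and expression log2 fold change between the two cell types.
methylation_delta = [1.2, 0.4, -0.3, -1.1, 0.0]
expression_log2fc = [-2.0, -0.5, 0.8, 1.5, 0.1]

print("Spearman rho:", spearman(methylation_delta, expression_log2fc))
```

The same calculation could be run separately for each region type (promoter, exon, intron) to see where, if anywhere, methylation changes track expression.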
Thanks for any kind of suggestion.