Dear all,

I am trying to perform methylation-gene expression correlation analysis in R.

After removing probes not mapped to any gene symbol, around 300,000 probes remained.

Given the large number of samples (280), the correlation analysis generates a huge adjacency matrix that cannot be opened in Excel.

Since several probes correspond to each gene in the methylation data, I have to compute all-against-all pairwise correlations and then filter the results.

I searched for a way to extract only the significant negative correlations, but the analysis is still running after a couple of hours. I have already increased R's memory limit using the memory.limit() function.

Is there any way to do this on my laptop (16 GB RAM)? I do not have access to a compute server right now.

I would appreciate any help.

Nazanin
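
One possible way to keep this within laptop memory is to skip the full all-against-all matrix and correlate each probe only with the gene it is annotated to, in chunks. The sketch below uses simulated stand-in data; the names `expr`, `beta`, `probe_gene`, `row_cor` and `chunk_size` are assumptions made for illustration and are not from the original analysis. It assumes both matrices have features in rows and the same samples, in the same order, in columns.

```r
set.seed(1)
n_samples <- 280
n_genes   <- 50    # small stand-in sizes; the real data would be much larger
n_probes  <- 200

# Simulated stand-ins (hypothetical): rows are features, columns are samples.
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
beta <- matrix(runif(n_probes * n_samples), nrow = n_probes,
               dimnames = list(paste0("cg", seq_len(n_probes)), NULL))

# Hypothetical annotation: the gene symbol each probe maps to.
probe_gene <- sample(rownames(expr), n_probes, replace = TRUE)

# Pearson correlation between matching rows of two equally sized matrices.
row_cor <- function(x, y) {
  x <- x - rowMeans(x)
  y <- y - rowMeans(y)
  rowSums(x * y) / sqrt(rowSums(x^2) * rowSums(y^2))
}

# Process probes in chunks so only a small slice is duplicated at a time.
chunk_size <- 64
r <- numeric(n_probes)
for (start in seq(1, n_probes, by = chunk_size)) {
  idx <- start:min(start + chunk_size - 1, n_probes)
  r[idx] <- row_cor(beta[idx, , drop = FALSE],
                    expr[probe_gene[idx], , drop = FALSE])
}

# Keep only probe-gene pairs at or below the chosen cutoff.
hits <- data.frame(probe = rownames(beta), gene = probe_gene, r = r)
hits <- hits[hits$r <= -0.5, ]
```

This yields one correlation per probe instead of one per probe-gene combination, so the result is a vector of about 300,000 values rather than a matrix. If p-values are also needed, `cor.test()` on the retained pairs followed by `p.adjust()` would be one option.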

Hi Kevin, I managed to solve the problem in R a few weeks ago, but I found only one significant negative correlation! It is strange, isn't it?

Yes, it would seem strange, as methylation is supposed to decrease gene expression (?)

I used log-transformed htseq-count data for gene expression and the methylation beta values of the corresponding genes, and computed Pearson correlations in R. I then used r <= -0.5 to select significant negative correlations. When I had previously compared differentially expressed genes with differentially methylated genes, I found a few genes with an inverse relationship (up-regulated and hypomethylated, or down-regulated and hypermethylated), but I could not find any of those genes in the correlation analysis!

Maybe encode the methylation as binary (`methylated` | `not methylated`) and then do binary logistic regression with each gene? You normalised the HTSeq data, correct?

I am not familiar with this, but I will try to find a way to do binary logistic regression.
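
For what it is worth, a minimal sketch of that idea for a single probe-gene pair in base R could look like the following. The data are simulated, and the 0.5 cutoff used to call a probe `methylated` is purely an assumption for illustration, as are the variable names `beta_probe`, `log_expr` and `methylated`.

```r
set.seed(42)
n_samples <- 280

# Simulated stand-ins (hypothetical): one probe's beta values and the
# log expression of its gene, built with a negative relationship.
beta_probe <- runif(n_samples)
log_expr   <- 5 - 2 * beta_probe + rnorm(n_samples)

# Encode methylation as binary; the 0.5 cutoff is an assumption.
methylated <- as.integer(beta_probe > 0.5)

# Binary logistic regression: methylation status explained by expression.
fit   <- glm(methylated ~ log_expr, family = binomial)
coefs <- summary(fit)$coefficients

# Slope estimate and its p-value for the expression term.
coefs["log_expr", c("Estimate", "Pr(>|z|)")]
```

A negative estimate for `log_expr` would mean that higher expression goes with lower odds of the probe being called methylated. Repeating this over many probe-gene pairs would again need multiple-testing correction, for example with `p.adjust()`.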