Question: Correlation between methylation's regression values and Gene expression's logFC
21 months ago, Bioinformatist Newbie (230) wrote:


I have a data frame with gene names, regression estimates, and logFC values. The regression estimates come from 5mC methylation data: a positive estimate indicates hypermethylation in the disease group, while a negative estimate indicates hypomethylation. I initially had these estimates for each CpG site and then averaged them at the gene level. The logFC values were computed by limma: a positive value means the gene is up-regulated in disease, and a negative value means it is down-regulated. This is what my data frame looks like:

> data[1:3,]
  Gene    Reg_Beta       logFC
1 A1BG 0.012759505 -0.01594659
2 A1CF 0.003407954  0.01044036
3  A2M 0.004816774  0.37067536

Can anybody tell me whether I can obtain a correlation between Reg_Beta (the average beta value for the methylation status of a gene) and logFC (the expression value of that gene) at the gene level? In the end I would like to identify the genes whose methylation is highly anti-correlated with their expression.

I am a newbie to methylation analysis, so any constructive suggestion or comment will be highly appreciated! Thanks.

modified 21 months ago • written 21 months ago by Bioinformatist Newbie (230)

For a correlation you need more than one data point per group. For each gene you have a single Reg_Beta value (group A) and a single logFC value (group B). A proper correlation needs a set of points for both A and B.
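To illustrate the point, here is a minimal sketch in R using only the three rows shown in the question. With one (Reg_Beta, logFC) pair per gene there is nothing to correlate within a gene. What the table does allow is a single correlation across all genes, plus a simple check of which genes change in opposite directions. The sign check and the variable names `across_genes` and `discordant` are assumptions added for illustration, not something proposed in the thread.

```r
# The three example rows from the question (the real table has many more genes)
data <- data.frame(
  Gene     = c("A1BG", "A1CF", "A2M"),
  Reg_Beta = c(0.012759505, 0.003407954, 0.004816774),
  logFC    = c(-0.01594659, 0.01044036, 0.37067536),
  stringsAsFactors = FALSE
)

# Each gene contributes exactly one point, so this yields ONE correlation
# across genes, not one correlation per gene.
across_genes <- cor(data$Reg_Beta, data$logFC, method = "spearman")

# Assumption for illustration: flag genes whose methylation estimate and
# expression change have opposite signs. This is a direction check only,
# not a per-gene correlation.
discordant <- data$Gene[sign(data$Reg_Beta) * sign(data$logFC) < 0]

print(across_genes)
print(discordant)
```

A correlation across three genes is of course not meaningful on its own; the same call on the full table would summarise the genome-wide trend.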

written 21 months ago by b.nota (6.3k)

Thank you for your comment. Suppose I take the original beta values (averaged at the gene level) for the disease group and for the healthy group, and add the log2-normalized expression values for the diseased and healthy samples (also at the gene level). How would I then get what I am looking for? For example, say col 1 is the gene name, cols 2:35 are the beta values of the diseased samples, cols 36:70 are the beta values of the healthy samples, cols 71:91 are the expression values of the diseased samples, and cols 92:102 are the expression values of the healthy samples. Can you guide me on how to design the comparisons in this case so that the results make sense and I get what I am looking for?

written 21 months ago by Bioinformatist Newbie (230)

For a correlation you need the same number of values in group A as in group B, and the values need to be paired. The pairing has to be meaningful (not random).

I am not sure what your Reg_Beta values are or how they link to your expression values. Are these paired?

If you make a plot with A on the x-axis and B on the y-axis, each pair of values becomes one point. The correlation then describes how well all these points fit a line.
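As a rough sketch of what paired data could look like in R: the matrices `meth` and `expr` below are hypothetical toy data (genes in rows, the same samples in the same column order), and the -0.5 cutoff is an arbitrary choice for illustration. Only samples measured for both methylation and expression can be paired this way.

```r
# Hypothetical toy data: rows are genes, columns are the SAME samples
samples <- paste0("S", 1:5)
meth <- rbind(
  geneA = c(0.10, 0.20, 0.30, 0.40, 0.50),   # gene-level beta values
  geneB = c(0.60, 0.55, 0.70, 0.65, 0.80)
)
expr <- rbind(
  geneA = c(9.0, 8.0, 7.0, 6.0, 5.0),        # log2 expression values
  geneB = c(4.0, 3.8, 4.9, 4.4, 5.5)
)
colnames(meth) <- colnames(expr) <- samples

# The pairing must be meaningful: same genes and same samples, same order
stopifnot(identical(rownames(meth), rownames(expr)),
          identical(colnames(meth), colnames(expr)))

# One Pearson correlation per gene, computed across the paired samples
per_gene_r <- sapply(rownames(meth), function(g) cor(meth[g, ], expr[g, ]))

# Arbitrary illustrative cutoff for "anti-correlated" genes
anti <- names(per_gene_r)[per_gene_r < -0.5]

print(per_gene_r)
print(anti)
```

Calling `plot(meth["geneA", ], expr["geneA", ])` would draw the scatter plot described above, with one point per sample.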

Hope this helps. If not, please read, e.g., the Wikipedia article on correlation.

modified 21 months ago • written 21 months ago by b.nota (6.3k)