Question

DNA methylation analysis for paired samples using random effects model

2.1 years ago

sswang25 ▴ 20

I am very new to bioinformatics. I am currently examining the differences in DNA methylation between blood and synovial tissue. The DNA methylation data were generated using the Illumina EPIC array. I have paired data for 50 patients, i.e. each patient has both blood and synovial tissue DNA methylation data. I want to do a paired analysis of the differences in DNA methylation between blood and synovium that takes the individual patient into account.

Some people have suggested I use lme4 to fit a random effects model, but unfortunately I have no idea how to go about this. Any help would be greatly appreciated.

paired effects methylation DNA analysis lme4 model random • 1.1k views

updated 8 months ago by hayleyw • 0 • written 2.1 years ago by sswang25 ▴ 20

To me, this looks like a common, basic design in methylation studies. I believe the methylation profile in blood is NOT dependent on the synovial methylation pattern, and indeed the sample groups are not related. So all you need is to follow a basic methylation analysis workflow like this. Also, Biostars hosts a couple of tutorials which might be helpful for someone without much experience in the methylation field (like this and this).

2.1 years ago by Hamid Ghaedi 3.2k

Thanks for the reply. The main concern with the above methylation analysis pipeline is that it does not take into account the fact that the samples are paired, i.e. blood and synovial tissue come from the same patient, and there are 50 individual patients with both blood and synovial tissue.

Therefore, if I follow the standard analysis, blood from all patients will form one group and synovial tissue from all patients another, as if the two groups were independent.

The aim of a paired analysis is to account for the fact that, in rheumatoid arthritis, inflammation in the synovial tissue may also be reflected in the blood. Individuals in the study have different levels of inflammation, and therefore their DNA methylation in both blood and synovium will differ from patient to patient.
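A minimal R sketch on simulated, hypothetical values for a single probe illustrates the point. Each patient gets a shift shared by both tissues (standing in for patient-level inflammation); that shift inflates the variance of each group in an unpaired test but cancels out in a paired test on within-patient differences. All effect sizes and standard deviations are invented for illustration.

```r
# Hypothetical simulated data for ONE probe: 50 patients, blood + synovium each
set.seed(1)
n_patients <- 50
patient_shift <- rnorm(n_patients, mean = 0, sd = 2)  # assumed patient-level effect, shared by both tissues
tissue_effect <- 0.3                                  # assumed true synovium-minus-blood difference
blood    <- patient_shift + rnorm(n_patients, sd = 0.2)
synovium <- patient_shift + tissue_effect + rnorm(n_patients, sd = 0.2)

# Unpaired (Welch) test: treats blood and synovium as two independent groups
unpaired <- t.test(synovium, blood)

# Paired test: works on synovium - blood within each patient, so the shift cancels
paired <- t.test(synovium, blood, paired = TRUE)

c(unpaired = unpaired$p.value, paired = paired$p.value)
```

With a large between-patient spread, the unpaired test typically misses the tissue difference that the paired test detects, which is the motivation for modelling the patient.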

2.1 years ago by sswang25 ▴ 20

score 0 · Answer 1 · 2022-03-10

2.1 years ago

fusion.slope ▴ 250

I understand your problem: each patient should be modelled as a random effect. In this script:

https://github.com/tAndreani/LMM/blob/master/Script.r

I addressed exactly this. I needed to determine which source of variability was affecting the quantifications produced by two tools. My sources of variability were:

1) The tool used
2) The random effect of the samples

In this way I was able to determine which source of variability better explained the quantification outcome.

I believe your problem is analogous.

Here is the part where the linear mixed model is fitted:

https://github.com/tAndreani/LMM/blob/master/Script.r#L50
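As a hedged sketch of the same idea applied to this thread's design (not code taken from the linked script), the snippet below fits a random intercept per patient with lme4 on simulated data for one probe, then splits the variance into a between-patient and a residual component. Tissue type is the comparison of interest, so it enters as a fixed effect. The data, effect sizes and the column name `m` are illustrative assumptions.

```r
library(lme4)

# Hypothetical long-format data for ONE probe: one row per sample
set.seed(2)
n_patients <- 50
df <- data.frame(
  patient_ID  = factor(rep(sprintf("P%02d", 1:n_patients), times = 2)),
  tissue_type = factor(rep(c("blood", "synovium"), each = n_patients))
)
patient_shift <- rnorm(n_patients, sd = 1)            # assumed between-patient spread
df$m <- patient_shift[as.integer(df$patient_ID)] +
  ifelse(df$tissue_type == "synovium", 0.5, 0) +      # assumed tissue difference
  rnorm(nrow(df), sd = 0.3)                           # assumed residual noise

# Fixed effect: tissue type; random intercept: patient
fit <- lmer(m ~ tissue_type + (1 | patient_ID), data = df)

# Variance components and the share of variance each one explains
vc <- as.data.frame(VarCorr(fit))
vc$proportion <- vc$vcov / sum(vc$vcov)
vc[, c("grp", "vcov", "proportion")]
```

A large `patient_ID` share means a lot of the variation is between patients, which is exactly the variation a paired or mixed model keeps out of the tissue comparison.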

2.1 years ago by fusion.slope ▴ 250

Hi, thank you very much! This is really useful. I have looked through the script, and as I am a complete beginner with R, I just wanted to ask a few things.

Tissue-type (blood/synovium) differences in methylation are what I am really interested in, and the individual patients are random effects that confound that difference.

I have a data frame of DNA methylation M-values for 600,000 different probes for each sample (called "m_values").

I also have a separate data frame ("phenotype_data") containing phenotype data for each sample, including tissue type, the ID of the patient the sample came from, and the patient's response to a specific medication.

For standard methylation analysis, the functions can handle the two data frames simultaneously.
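For comparison, a hedged sketch of how pairing can be handled in that matrix-plus-phenotype style: patient enters the design matrix as a blocking factor next to tissue type, so the tissue coefficient is estimated within patients. limma's `lmFit(m_values, design)` accepts this kind of design (the limma User's Guide describes it for paired samples); base R's `lm.fit()` is used here only so the sketch runs without Bioconductor. The data and probe names are hypothetical.

```r
# Hypothetical toy inputs: m_values is probes x samples,
# phenotype_data has one row per sample in the same order as the columns
set.seed(5)
n_patients <- 50
phenotype_data <- data.frame(
  patient_ID  = factor(rep(sprintf("P%02d", 1:n_patients), times = 2)),
  tissue_type = factor(rep(c("blood", "synovium"), each = n_patients))
)
m_values <- matrix(rnorm(4 * nrow(phenotype_data)), nrow = 4,
                   dimnames = list(sprintf("cg%04d", 1:4), NULL))

# Paired design: patient as a blocking factor, tissue type as the effect of interest
design <- model.matrix(~ patient_ID + tissue_type, data = phenotype_data)

# Fit all probes at once: lm.fit accepts a response matrix with one column per probe
fit <- lm.fit(design, t(m_values))
tissue_coef <- fit$coefficients["tissue_typesynovium", ]
tissue_coef
```

With exactly one blood and one synovium sample per patient, each probe's tissue coefficient equals the mean within-patient difference (synovium minus blood).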

Adapting your script to my situation, I initially thought this would work:

res <- lmer(m_values~(1|tissue_type)+(1|patient_ID),data=phenotype_data)

However, I think lmer only accepts a single data frame: the response "m_values" in the formula should be a column of that data frame, not an entire data frame on its own.

I don't think it's possible to merge the phenotype data and the M-values, so I am not sure how to proceed.
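One way around the one-data-frame limitation, sketched under assumptions: fit one probe at a time, taking that probe's row from `m_values` and adding it as a column of `phenotype_data`, matched by sample ID so the sample order cannot silently differ. The `sample_ID` column, the toy data and the probe names are hypothetical. The formula also treats `tissue_type` as a fixed effect rather than a random one, because the tissue difference is what is being estimated.

```r
library(lme4)

# Hypothetical stand-ins with the shapes described above:
#   m_values:       probes x samples, column names = sample IDs
#   phenotype_data: one row per sample; "sample_ID" is an ASSUMED column name
set.seed(3)
n_patients <- 50
phenotype_data <- data.frame(
  sample_ID   = sprintf("S%03d", 1:(2 * n_patients)),
  patient_ID  = factor(rep(sprintf("P%02d", 1:n_patients), times = 2)),
  tissue_type = factor(rep(c("blood", "synovium"), each = n_patients)),
  stringsAsFactors = FALSE
)
shift <- rnorm(n_patients)[as.integer(phenotype_data$patient_ID)]
m_values <- t(replicate(3, shift + rnorm(length(shift), sd = 0.3)))
dimnames(m_values) <- list(c("cg0001", "cg0002", "cg0003"), phenotype_data$sample_ID)

m_values <- as.matrix(m_values)  # needed if m_values is a data.frame; no-op for a matrix

# Take one probe's row and match it to phenotype_data by sample ID,
# so the methylation values become an ordinary column "m" of one data frame
probe <- "cg0001"
df <- phenotype_data
df$m <- m_values[probe, df$sample_ID]

# Tissue type as a fixed effect, patient as a random intercept
fit <- lmer(m ~ tissue_type + (1 | patient_ID), data = df)
coef(summary(fit))["tissue_typesynovium", ]
```

lme4's summary reports estimates, standard errors and t values but no p-values; the lmerTest package provides an `lmer()` that adds Satterthwaite-approximated p-values if those are needed.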

2.1 years ago by sswang25 ▴ 20

You have to use the "reshape" package to "melt" the two matrices and combine them into a single object to which the function can be applied. Start from:

https://github.com/tAndreani/LMM/blob/master/Script.r#L33

You can see that the first step is to combine the two matrices and then create the columns used to fit the model.
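A base R sketch of the melt-and-combine step, followed by a per-probe fit. It uses `rep()` and `merge()` instead of the reshape package's `melt()`, so it does not depend on that package's output column names; `sample_ID` and the toy data are hypothetical.

```r
library(lme4)

# Hypothetical toy inputs: m_values is probes x samples, phenotype_data has one
# row per sample; the column name "sample_ID" is an assumption
set.seed(4)
n_patients <- 50
phenotype_data <- data.frame(
  sample_ID   = sprintf("S%03d", 1:(2 * n_patients)),
  patient_ID  = factor(rep(sprintf("P%02d", 1:n_patients), times = 2)),
  tissue_type = factor(rep(c("blood", "synovium"), each = n_patients)),
  stringsAsFactors = FALSE
)
shift <- rnorm(n_patients)[as.integer(phenotype_data$patient_ID)]
m_values <- t(replicate(5, shift + rnorm(length(shift), sd = 0.3)))
dimnames(m_values) <- list(sprintf("cg%04d", 1:5), phenotype_data$sample_ID)

# "Melt" the wide matrix into long format: one row per (probe, sample) pair.
# as.vector() reads the matrix column by column, hence the rep() patterns.
long <- data.frame(
  probe     = rep(rownames(m_values), times = ncol(m_values)),
  sample_ID = rep(colnames(m_values), each = nrow(m_values)),
  m         = as.vector(m_values),
  stringsAsFactors = FALSE
)
# Attach patient ID and tissue type to every row
long <- merge(long, phenotype_data, by = "sample_ID")

# Fit the same mixed model separately for each probe
results <- do.call(rbind, lapply(split(long, long$probe), function(d) {
  fit <- lmer(m ~ tissue_type + (1 | patient_ID), data = d)
  est <- coef(summary(fit))["tissue_typesynovium", ]
  data.frame(probe = d$probe[1], estimate = est[["Estimate"]], t_value = est[["t value"]])
}))
results
```

Looping `lmer()` over ~600,000 probes will be slow. For array data, limma's `duplicateCorrelation()` with the patient ID as the `block` argument is a commonly used alternative that keeps the matrix-plus-design workflow.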

2.1 years ago by fusion.slope ▴ 250

Did you ever figure out the code for your question? I am currently facing the same issue.

8 months ago by hayleyw • 0