Question

Is it possible to control for variables in ChAMP or any other methylation package?



7.3 years ago

dacheampong26 ▴ 10

Hello all, I have EPIC array data from one study that I want to use as my control group, and data from another study that I want to use as my treatment group. I am using ChAMP to identify differentially methylated probes and would like to control for age and race, since my data come from two different studies. Is there a way to do this? Thanks

rna-seq • 2.4k views

updated 4.0 years ago by BiostarGuardianAngel ▴ 20 • written 7.3 years ago by dacheampong26 ▴ 10

score 1 · Answer 1 · 2019-11-26

Below I use the same limma-based method that champ.DMP uses to find DMPs, but with a design matrix that lets you adjust for covariates.

# load in packages
library(dplyr) 
library(tidyr) 
library(ChAMP) 
library(tibble)
library(ChAMPdata)
library(limma)
data(probe.features.epic)
cpg.info = probe.features %>% rownames_to_column("CpG")

# make sure dplyr functions aren't overwritten
select <- dplyr::select; rename <- dplyr::rename; mutate <- dplyr::mutate; 
summarize <- dplyr::summarize; arrange <- dplyr::arrange; filter <- dplyr::filter; slice <- dplyr::slice

# make fake data for demographics dataset
demographics_data = data.frame(Sample_Name = paste0("Sample",1:20), 
                           Sample_Group = sample(c("Normal","Tumor"),size=20,replace=TRUE),
                           age = round(rnorm(20,45,5),0),
                           race = sample(c("White","African American"),size=20,replace=TRUE))

# make fake data for beta matrix 
myNorm = matrix(sample(seq(0.01,1,by=0.001),20*20,replace=FALSE),ncol=20)
rownames(myNorm) = paste0("cg",sample(0:2000000,size=20,replace=FALSE))
colnames(myNorm) = demographics_data$Sample_Name


# HERE YOU SPECIFY PHENOTYPE AND ADJUSTED COVARIATES. 
# FIRST THING AFTER ~ IS PHENOTYPE TO BE COMPARED AND EVERYTHING AFTER ARE ADJUSTED COVARIATES
# in this example we compare Sample_Group (normal vs. tumor) adjusting for race and age
design=model.matrix(~ Sample_Group + race + age, demographics_data)

fit = lmFit(myNorm, design)
fit.e = eBayes(fit)
# second column of the design is the phenotype coefficient (here Sample_GroupTumor)
IV=colnames(fit$coefficients)[2]

# differentially methylated probes
DMP = topTable(fit.e, coef=IV, adjust.method="BH", sort.by="P", number=Inf) %>%
          rownames_to_column("CpG") %>%
          left_join(cpg.info, by="CpG") # add probe info; the fake data has no real CpG names, so these columns are NA
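
Since the code above picks the phenotype coefficient by position (column 2 of the design), it may help to check what model.matrix actually builds before fitting. The following is only a sketch in base R with a small made-up table (the `demo` data frame and its values are hypothetical, not from the original post). It sets the reference level of Sample_Group explicitly, so the coefficient reads as Tumor relative to Normal, and checks that the covariates are not collinear with the group.

```r
# Hypothetical demographics table; column names mirror the fake data above.
demo <- data.frame(
  Sample_Group = factor(rep(c("Normal", "Tumor"), each = 4),
                        levels = c("Normal", "Tumor")),  # first level is the reference
  race = factor(rep(c("White", "African American"), times = 4)),
  age  = c(41, 52, 47, 39, 44, 50, 48, 43)
)

# Same formula as above: phenotype first, then the adjusted covariates.
design <- model.matrix(~ Sample_Group + race + age, data = demo)

# Inspect the column names; column 2 is the group effect adjusted for race and age.
print(colnames(design))
group_coef <- colnames(design)[2]

# A design with fewer independent columns than columns overall means some
# covariate is a linear combination of the others and cannot be estimated.
full_rank <- qr(design)$rank == ncol(design)
print(full_rank)
```

If the group of interest is not the second column (for example, after reordering the formula), pass the column name itself as `coef` to topTable rather than relying on position.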