I really appreciate your time and valuable inputs!
My issues while using the ChAMP Bioconductor package:
champ.norm(): I use TCGA data. Every time I run champ.norm() on the same data, I get slightly different results. The fluctuation appears at the third decimal place in all the beta values. I noticed that champ.BMIQ.R sets a seed to handle stochasticity, yet the results still differ between runs. For my work, each decimal place is critical.
For example:
myNorm1 = champ.norm(beta=myLoad$beta, method='BMIQ', arraytype='450K', cores=3, plotBMIQ=FALSE)
myNorm2 = champ.norm(beta=myLoad$beta, method='BMIQ', arraytype='450K', cores=3, plotBMIQ=FALSE)
myNorm1[1:5,1:5]
TCGA_1 TCGA_2 TCGA_3 TCGA_4 TCGA_5
cg00000162 0.2643464 0.5335578 0.1599576 0.5421202 0.2332971
cg00000238 0.9594341 0.9419697 0.9427666 0.9645310 0.9741122
cg00000287 0.7669794 0.6908586 0.6977587 0.7625313 0.6476096
cg00000296 0.7634344 0.7214260 0.7810275 0.5577281 0.8955270
cg00000325 0.3659023 0.5190070 0.3785389 0.5751796 0.3450515
myNorm2[1:5,1:5]
TCGA_1 TCGA_2 TCGA_3 TCGA_4 TCGA_5
cg00000162 0.2660004 0.5389914 0.1662706 0.5479737 0.2283009
cg00000238 0.9588291 0.9421513 0.9434506 0.9641600 0.9721813
cg00000287 0.7656797 0.6941988 0.7018092 0.7685632 0.6427196
cg00000296 0.7621554 0.7243594 0.7847278 0.5635942 0.8907006
cg00000325 0.3669594 0.5246343 0.3839324 0.5810598 0.3400840
Would taking the mean of myNorm1 and myNorm2 be appropriate? I am not sure what the confidence intervals of the averaged beta values would be across multiple runs.
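To put a number on the run-to-run fluctuation, here is a minimal sketch in base R that compares two runs using the first two probes printed above. The helper `summarize_run_difference` and the `*_toy` matrices are hypothetical names introduced only for illustration and are not part of ChAMP. On the real data the same function could be applied to myNorm1 and myNorm2 directly.

```r
# Hypothetical helper (not part of ChAMP): summarize how far two runs disagree.
summarize_run_difference <- function(run_a, run_b) {
  stopifnot(identical(dim(run_a), dim(run_b)))
  abs_diff <- abs(run_a - run_b)
  list(
    max_abs_diff  = max(abs_diff),    # largest per-value discrepancy
    mean_abs_diff = mean(abs_diff),   # average per-value discrepancy
    all_equal     = isTRUE(all.equal(run_a, run_b))  # equal within numeric tolerance?
  )
}

# Toy matrices copied from the first two rows of the output shown above.
probes  <- c("cg00000162", "cg00000238")
samples <- paste0("TCGA_", 1:5)

myNorm1_toy <- matrix(
  c(0.2643464, 0.5335578, 0.1599576, 0.5421202, 0.2332971,
    0.9594341, 0.9419697, 0.9427666, 0.9645310, 0.9741122),
  nrow = 2, byrow = TRUE, dimnames = list(probes, samples)
)
myNorm2_toy <- matrix(
  c(0.2660004, 0.5389914, 0.1662706, 0.5479737, 0.2283009,
    0.9588291, 0.9421513, 0.9434506, 0.9641600, 0.9721813),
  nrow = 2, byrow = TRUE, dimnames = list(probes, samples)
)

run_diff <- summarize_run_difference(myNorm1_toy, myNorm2_toy)
print(run_diff)
```

Reporting the maximum and mean absolute difference per probe or per sample may make it easier to judge whether the variation matters for the downstream analysis.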
champ.runCombat(): After SVD, I identified gender and ethnicity as batch effects to be corrected. However, when I combine these factors and run ComBat from the ChAMP package, I get the following error:
Error in while (change > conv) { : missing value where TRUE/FALSE needed.
myCombat = champ.runCombat(beta=myNorm,
                           pd=myLoad$pd,
                           batchname=c('gender', 'ethnicity'),
                           logitTrans=TRUE,
                           variablename="DefLabel")
Correcting for the combined factors fails. However, I am able to correct for gender alone using the following:
mod_1 <- model.matrix(~DefLabel, data=myLoad$pd)
bat_2 <- ComBat(dat=myNorm, batch=myLoad$pd$gender, mod=mod_1, par.prior=TRUE, prior.plots=FALSE)
champ.SVD(beta=bat_2, pd=myLoad$pd)
I have checked for NAs in the metadata and in the normalized data, and found none. Hence, I am unable to resolve this issue either.
var.data.col <- apply(myNorm, 2, var)
var.data.row <- apply(myNorm, 1, var)
which(rowVars(myNorm) == 0)
which(is.na(myNorm))
length(which(is.na(metadata$ethnicity)))  # 0
length(which(is.na(metadata$gender)))     # 0
length(which(is.na(metadata$race)))       # 0
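Beyond the NA checks, two further checks might be worth running. This is only a sketch in base R on toy data, and I am not certain either condition applies here. It assumes that beta values of exactly 0 or 1 become non-finite under a logit transform, and that a batch level with very few samples could be a problem. The helper `count_logit_problems` and the `*_toy` objects are hypothetical names, not part of ChAMP or sva.

```r
# Hypothetical helper: count values that are non-finite after a logit transform.
count_logit_problems <- function(beta) {
  logit_beta <- log(beta / (1 - beta))  # beta == 0 gives -Inf, beta == 1 gives Inf
  sum(!is.finite(logit_beta))           # also counts NA and NaN
}

# Toy beta matrix containing one exact 0 and one exact 1.
beta_toy <- matrix(c(0.2, 0.5, 0, 0.9, 1, 0.7), nrow = 2)
n_problem_values <- count_logit_problems(beta_toy)
print(n_problem_values)

# Toy phenotype table: inspect how samples are spread over the batch levels.
pd_toy <- data.frame(
  gender    = c("F", "M", "F", "M", "F", "M"),
  ethnicity = c("A", "A", "B", "B", "B", "C"),
  stringsAsFactors = FALSE
)
print(table(pd_toy$gender, pd_toy$ethnicity))  # empty cells show up as 0

# Batch levels represented by fewer than two samples.
ethnicity_counts    <- table(pd_toy$ethnicity)
singleton_ethnicity <- names(which(ethnicity_counts < 2))
print(singleton_ethnicity)
```

On the real data, the same calls could be pointed at myNorm and myLoad$pd to see whether any value or batch level stands out.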
sessionInfo():
R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.3
ChAMP_2.16.2
minfi_1.32.0
Kindly let me know your thoughts on the above issues. Thanks for your time.