Hi,
I'm trying to remove a batch effect from my dataset, using the code below.
Create the DESeq object. The design formula contains the patient group variable ('cancer' or 'healthy') and the sex variable (M or F):
dds <- DESeqDataSetFromMatrix(dataset, colData = metadata, design = ~ Patient.group + sex)
dds <- estimateSizeFactors(dds)
dds <- DESeq(dds)
Create the surrogate variable:
dat <- counts(dds, normalized = TRUE)
idx <- rowMeans(dat) > 1
dat <- dat[idx, ]
mod <- model.matrix(~ Patient.group+sex, colData(dds))
mod0 <- model.matrix(~1, colData(dds))
svseq <- svaseq(dat, mod, mod0, n.sv = 1)
ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
design(ddssva) <- ~ SV1 + Patient.group + sex
Remove the surrogate batch effect:
rld <- vst(ddssva, blind = FALSE)
mat <- assay(rld)
mm <- model.matrix(~ Patient.group + sex, colData(rld))
mat <- limma::removeBatchEffect(mat, batch = rld$SV1, design = mm)
However, the last line (the removeBatchEffect call that assigns 'mat') gives me this warning: Coefficients not estimable: batch173. Warning message: Partial NA coefficients for 3741 probe(s)
I believe this is because of how the 'mm' object looks: the Patient.groupcancer column contains only 1's. This is part of 'mm':
           (Intercept) Patient.groupcancer sexM
sample.001           1                   1    0
sample.002           1                   1    1
sample.003           1                   1    1
sample.004           1                   1    0
sample.005           1                   1    0
sample.007           1                   1    1
sample.008           1                   1    0
However, I don't understand why the patient group column suddenly contains only 1's. Can anyone explain?
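One thing that may be worth ruling out is that the rows shown are simply the first few samples, which could all be cancer samples if the metadata is sorted by group. A minimal base-R sketch of that situation follows. The 'toy' data frame is made up for illustration and is not my real colData; 'healthy' is assumed to be the reference level because the column is named Patient.groupcancer.

```r
# Hypothetical toy metadata (NOT the real colData), sorted by patient group
toy <- data.frame(
  Patient.group = factor(rep(c("cancer", "healthy"), each = 3),
                         levels = c("healthy", "cancer")),  # healthy = reference
  sex = factor(c("F", "M", "M", "F", "M", "F"))
)

mm_toy <- model.matrix(~ Patient.group + sex, toy)

# The first rows look like "only 1's" in the Patient.groupcancer column ...
head(mm_toy, 3)

# ... but tabulating the whole column shows both 0's and 1's
table(mm_toy[, "Patient.groupcancer"])
```

On the real object, the equivalent check would be table(mm[, "Patient.groupcancer"]) or table(colData(rld)$Patient.group).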
I'm still stuck on this problem. Can someone help out? I have read that this warning can appear when the dataset contains NA values, but I have checked and that is not the case. I have also read that it can appear when a variable in the design formula, such as sex, is not important and has to be removed. However, I still get the warning after removing the sex variable.
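Since the warning does not change with the design, another possible lead is the coefficient name itself. 'batch173' suggests that the value passed as batch is being treated as a factor with well over a hundred levels, which is what would happen if a continuous surrogate variable were converted to a factor with one level per sample. Below is a minimal base-R sketch of that effect. All values are simulated, the names 'sv', 'group' and 'X' are made up, and it uses R's default treatment coding rather than whatever coding limma uses internally.

```r
set.seed(1)
n <- 8
sv <- rnorm(n)  # simulated continuous surrogate variable, one value per sample
group <- factor(rep(c("healthy", "cancer"), each = 4),
                levels = c("healthy", "cancer"))

# Treating a continuous variable as a factor gives one level per distinct value
batch_f <- as.factor(sv)
nlevels(batch_f)

# Design columns plus one dummy column per extra batch level
X <- cbind(model.matrix(~ group), model.matrix(~ batch_f)[, -1])

# More columns than samples, so some coefficients cannot be estimated
ncol(X)
qr(X)$rank
```

If this is what is going on, passing the surrogate variable through the covariates argument of removeBatchEffect, which is meant for numeric covariates, instead of batch might be the thing to try, but I have not confirmed this on my data.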