I have set up a model that compares different conditions while adjusting for a potential batch effect (see the PCA plot here). To confirm the batch effect, I tried the sva package.
I proceed as follows, and the resulting plot looks like this (see here). In the model, emt is the batch variable.
dds <- DESeqDataSetFromMatrix(countData=countdata,colData=sampleTable, design =~ emt + condition)
dds <- dds[rowSums(counts(dds)) > 1,]
dds <- DESeq(dds)
sizeFactors(dds)
dat <- counts(dds, normalized=TRUE)
idx <- rowMeans(dat) > 1
dat <- dat[idx,]
mod <- model.matrix(~ emt + condition, colData(dds))
mod0 <- model.matrix(~ emt, colData(dds))
# Estimate how many surrogate variables there are
n.sv <- num.sv(dat, mod, method="leek")
print(n.sv)
# Estimate the 2 surrogate variables shown in the plot
svseq <- svaseq(dat, mod, mod0, n.sv=2)
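The plotting step itself is not shown above. A minimal sketch of how such a plot could be drawn with base R graphics, assuming `svseq$sv` is the samples-by-SV matrix returned by `svaseq` and that `emt` and `condition` are columns of `colData(dds)` (the colors, labels and legend are illustrative choices, not necessarily how the linked plot was made):

```r
# Sketch only: relies on `svseq` and `dds` created in the code above
sv <- svseq$sv                    # one row per sample, one column per surrogate variable
batch <- factor(dds$emt)          # batch variable taken from colData(dds)

# SV1 vs SV2, colored by batch
plot(sv[, 1], sv[, 2],
     col = as.integer(batch), pch = 19,
     xlab = "SV1", ylab = "SV2")

# Label each point with its condition
text(sv[, 1], sv[, 2], labels = as.character(dds$condition), pos = 3, cex = 0.7)

legend("topright", legend = levels(batch),
       col = seq_along(levels(batch)), pch = 19, title = "emt")
```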
I'm not sure I understand what I see in the plot. To me, SV1 reflects the batch effect (*_2015 vs. the others) and SV2 reflects the condition, my variable of interest. (For information: Mant and T6 are cells a long time after treatment, Unt and T0 are control-treated cells, and T1 are cells early after treatment, so yes, condition is time-related.)
Am I right?
But I expected the surrogate variables to show other sources of variation, beyond the ones I already included in my model. What do you think?
UPDATE: If I set ~ 1 in the mod0 model, I get this plot (here). SV1 doesn't change, but SV2 now clearly separates the conditions in a way that makes a lot of sense.
mod <- model.matrix(~ emt + condition, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))