I have set up a model in which I compare different conditions while adjusting for a potential batch effect (see the PCA plot here). To confirm this batch effect, I tried the SVA package.
I am doing the following, and the resulting plot looks like this (see here). In the model, emt is the batch variable.
library(DESeq2)
library(sva)
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=sampleTable, design = ~ emt + condition)
# Drop genes with almost no counts
dds <- dds[rowSums(counts(dds)) > 1,]
dds <- DESeq(dds)
sizeFactors(dds)
# Normalized counts, filtered on mean expression
dat <- counts(dds, normalized=TRUE)
idx <- rowMeans(dat) > 1
dat <- dat[idx,]
# Full model (batch + condition) and null model (batch only)
mod <- model.matrix(~ emt + condition, colData(dds))
mod0 <- model.matrix(~ emt, colData(dds))
# To see how many surrogate variables I have
n.sv = num.sv(dat, mod, method="leek")
print(n.sv)
# plot 2 surrogate variables
svseq <- svaseq(dat, mod, mod0, n.sv=2)
I'm not sure I understand what I see on the plot. To me, SV1 corresponds to the batch effect (*_2015 vs. the others), and SV2 corresponds to the condition, my variable of interest. (For info, Mant and T6 are cells a long time after treatment, Unt and T0 are control-treated cells, and T1 are cells early after treatment, so yes, condition is time-related.)
Am I right?
But I thought it should have shown other sources of variation, beyond the ones I already set in my model. What do you think?
UPDATE: If I set ~ 1 in the mod0 model, I get this plot (here). SV1 doesn't change, but SV2 now clearly separates the conditions in a way that makes a lot of sense.
mod <- model.matrix(~ emt + condition, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
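Rather than judging by eye, one way to check what each surrogate variable tracks might be to regress it on each known covariate and compare the R² values. The sketch below is only an illustration in base R: the helper `sv_r2` and the toy data are hypothetical and not part of sva or DESeq2. With the real objects, the inputs would presumably be `svseq$sv` and `as.data.frame(colData(dds)[, c("emt", "condition")])`.

```r
# Hypothetical helper (not from sva/DESeq2): R^2 obtained by regressing
# each surrogate variable on each known covariate, one at a time.
sv_r2 <- function(sv, covariates) {
  sv <- as.matrix(sv)
  out <- sapply(covariates, function(x) {
    apply(sv, 2, function(s) summary(lm(s ~ x))$r.squared)
  })
  # Force a matrix with one row per SV and one column per covariate
  matrix(out, nrow = ncol(sv),
         dimnames = list(paste0("SV", seq_len(ncol(sv))), names(covariates)))
}

# Toy data, made up purely for illustration (12 samples, 2 batches, 3 conditions)
set.seed(1)
toy <- data.frame(
  emt = factor(rep(c("a", "b"), each = 6)),
  condition = factor(rep(c("Unt", "T1", "T6"), times = 4))
)
toy_sv <- cbind(
  as.numeric(toy$emt) + rnorm(12, sd = 0.1),  # built to follow the batch
  rnorm(12)                                   # pure noise
)

r2 <- sv_r2(toy_sv, toy)
print(round(r2, 2))
```

An R² close to 1 for a given covariate would suggest that the surrogate variable is mostly re-describing that covariate, while low values for every known covariate would point to some other source of variation.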