I have the following experimental design for an experiment in which we conducted RNA sequencing: 2 treatment groups and 2 batches, but one of the batches is exclusively one treatment.
sample treatment batch 3 control A 4 control A 5 control B 6 control B 7 control B 13 control B 14 control B 15 BD A 16 BD A 17 BD A 18 BD A 18 BD A
It is a poor experimental design, but unfortunately it is the data that I currently must work with. To account for my the potential batch effect of the B batch, I am using the following design formula:
dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData,design = ~ batch + treatment) dds$treatment <- factor(dds$treatment, levels=c("control", "BD")) dds$batch <- factor(dds$batch, levels=c("A", "B")) dds <- DESeq(dds, full=design(dds), reduced = ~ batch)
The results give me many less DE genes than if I simply ignored the batches and only use
~ treatment. This makes sense, because according to PCA and clustering, there is a batch effect in my samples. I've read the DESeq2 manual and many posts, but am not a statistician and would love to hear feedback if the design I'm using here makes sense, with the lack of representation of both treatment groups in the batch I am intended to correct for. Thank you!
I think this design would not really perform the kind of comparison one would expect. Have a look at my answer if you wish!
It produces the exact comparison that one expects, it's just an LRT for the effect of treatment. The only real caveat to any of this is that a batch-specific change in one group will completely throw off the results...but the only way to deal with such things is to either ignore the possibility (what's typically done) or just toss most of the data.
I have such design
I am only interested in knowing genes related to batch (two experiments) ignoring any change due to treatment; Does this give me these genes?
if you want to completely ignore the change due to treatment, then remove it: