I am trying to correct for an apparent batch effect in some raw count data by including the batch as a term in the DESeq2 model. The colData looks like this:
In this data there is an obvious month-based batch effect. For example, the 'January' samples have higher counts across practically all genes than the other 'before treatment' samples (months March and May), and this holds for all three individuals (A, B and C).
Since I want to use DESeq2 to test for differential expression after treatment, I thought that including this batch effect in the DESeq2 design formula would be the best approach (at least, that seems to be the consensus on handling batch effects). However, adding month to the design makes the design matrix not full rank. When I add only "individual" as an additional factor in the design formula, it runs without error, but I know for a fact that, besides the individual-based batch effect, there is a definite month-based batch effect.
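A likely reason for the rank error is that month is nested within treatment status: if every month contains samples from only one treatment group, the treatment column of the design matrix is exactly the sum of the month columns for that group. The sketch below reproduces this with NumPy, building the intercept-plus-dummy coding that R's `model.matrix()` uses by default for an additive formula like `~ individual + month + treatment`. Because the colData table is not shown, the layout here is hypothetical: the factor names, the one-sample-per-individual-per-month layout, and the post-treatment months "Post1" and "Post2" are placeholders, not the real data.

```python
import numpy as np

def dummy_columns(values, reference):
    """Treatment (reference-level) coding: one 0/1 column per non-reference level."""
    levels = sorted(set(values) - {reference})
    return [np.array([1.0 if v == lvl else 0.0 for v in values]) for lvl in levels]

def design_matrix(factors):
    """Intercept plus dummy columns for each (values, reference) factor, i.e. an additive model."""
    n = len(factors[0][0])
    cols = [np.ones(n)]
    for values, reference in factors:
        cols.extend(dummy_columns(values, reference))
    return np.column_stack(cols)

# Hypothetical layout: one sample per individual per month.
# "Post1"/"Post2" are placeholder post-treatment months (assumption).
before = ["January", "March", "May"]
after = ["Post1", "Post2"]
individual, month, treatment = [], [], []
for ind in ["A", "B", "C"]:
    for m in before + after:
        individual.append(ind)
        month.append(m)
        treatment.append("before" if m in before else "after")

# ~ individual + month + treatment
# Columns: intercept, B, C, March, May, Post1, Post2, after
X_full = design_matrix([(individual, "A"), (month, "January"), (treatment, "before")])
# ~ individual + treatment
X_simple = design_matrix([(individual, "A"), (treatment, "before")])

print(X_full.shape, np.linalg.matrix_rank(X_full))      # (15, 8) 7
print(X_simple.shape, np.linalg.matrix_rank(X_simple))  # (15, 4) 4

# The treatment column equals Post1 + Post2, so it adds no independent information.
print(np.allclose(X_full[:, 7], X_full[:, 5] + X_full[:, 6]))  # True
```

Under this assumed layout, the 8-column matrix has rank 7, which is the kind of deficiency DESeq2 reports as "not full rank", while `~ individual + treatment` stays full rank. Running the same rank check on the real colData would show whether month is confounded with treatment in the actual experiment.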
Given the above, what design formula would you suggest for this experiment, ideally one that incorporates month as a batch effect? Thanks!
Show us the code you used to create your dds, i.e. what was the design formula?