Long time lurker, first time poster.
I was recently given an RNA-seq experimental raw count matrix on which to perform differential gene expression analysis using DESeq2. The experiment involves 3 replicates of a wildtype cell line (WT - Samples 1-3), and 3 replicates of the same cell line within which a single gene was knocked out (KO - Samples 4-6). Total RNA was isolated independently for each of the 6 samples, and independent libraries were prepared and sequenced all on the same lane of the same sequencer.
However, there may be issues with both batch effects and replication in the experiment. While the three wildtype controls are biological replicates, only two of the KOs are biological replicates (KO-1 and KO-2 are two different clones originating from the deletion). The third KO sample is a sort of technical replicate (a separate culture of KO-2 with independent RNA isolation and library prep: not a library replicate or RNA replicate, but a culture replicate). On top of that, RNA for KO-1 was collected on a different day from the rest of the samples, which introduces a possible batch effect.
The phenotype data for these samples is listed below:
Sample  Condition  Genotype    RNA Isolation Date
1       WT         WT          12-May
2       WT         WT          12-May
3       WT         WT          12-May
4       KO         KO Clone-1  3-Jun
5       KO         KO Clone-2  12-May
6       KO         KO Clone-2  12-May
A PCA plot does indeed show Samples 1-3 (WT) clustering together and Samples 5-6 (KO-2) clustering together, while Sample 4 (KO-1) clusters with neither group.
I am wondering whether differential gene expression analysis can even be performed comparing KO vs WT, knowing that there are only two biological replicates and one technical replicate for the KO condition. Should I collapse the KO Clone-2 samples together, as they are in a sense Poisson-distributed technical replicates (same clone, but cultures harvested independently of each other)? I would not feel comfortable treating the second Clone-2 sample as a biological replicate, as it is not an independent draw from the population of KO clones.
Additionally, as I understand it, a batch term for RNA isolation date cannot be added to the design, because only one sample (Sample 4) falls in the separate batch, so batch is completely confounded with clone. Is my understanding correct?
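To make the confounding I mean concrete, here is a minimal sketch in plain Python (standard library only, not DESeq2 code). It builds 0/1 indicator columns from the phenotype table above and checks the rank of two candidate design matrices. The column names and the `matrix_rank` helper are my own labels for illustration, not anything from DESeq2.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# (sample, condition, genotype, RNA isolation date), copied from the table.
samples = [
    ("1", "WT", "WT", "12-May"),
    ("2", "WT", "WT", "12-May"),
    ("3", "WT", "WT", "12-May"),
    ("4", "KO", "KO Clone-1", "3-Jun"),
    ("5", "KO", "KO Clone-2", "12-May"),
    ("6", "KO", "KO Clone-2", "12-May"),
]

# Indicator columns, using WT and 12-May as the reference levels.
intercept = [1] * len(samples)
ko = [int(cond == "KO") for _, cond, _, _ in samples]
clone1 = [int(geno == "KO Clone-1") for _, _, geno, _ in samples]
clone2 = [int(geno == "KO Clone-2") for _, _, geno, _ in samples]
batch = [int(date == "3-Jun") for _, _, _, date in samples]

def design(*cols):
    """Stack columns into a list of rows (one row per sample)."""
    return [list(row) for row in zip(*cols)]

# Condition + batch: 3 columns.
rank_cond_batch = matrix_rank(design(intercept, ko, batch))
# Clone + batch: 4 columns.
rank_clone_batch = matrix_rank(design(intercept, clone1, clone2, batch))

print("batch column equals Clone-1 indicator:", batch == clone1)
print("condition + batch: rank", rank_cond_batch, "of 3 columns")
print("clone + batch:     rank", rank_clone_batch, "of 4 columns")
```

If I read this correctly, the batch column is identical to the Clone-1 indicator, so a model with both clone and batch terms is not full rank. A model with only condition and batch is technically full rank, but the batch coefficient would be estimated from Sample 4 alone, so any Clone-1 effect and any isolation-date effect could not be told apart.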
Thanks very much in advance for any insight you can provide on choosing an appropriate design for testing KO vs WT differential gene expression.