5.7 years ago

sckinta

Hi

I have an RNA-seq dataset with an unbalanced batch design (see table below). Batch 1 libraries were prepared with a single-indexed kit and sequenced at time 1, while batch 2 libraries were prepared with a dual-indexed kit and sequenced separately from batch 1.

```
sample batch groups
1 Naive_Dmt3aKO_rep1 2 Naive_Dmt3aKO
2 Naive_Dmt3aKO_rep2 2 Naive_Dmt3aKO
3 Naive_Dmt3aKO_rep3 2 Naive_Dmt3aKO
4 Naive_WT_rep1 2 Naive_WT
5 Naive_WT_rep2 2 Naive_WT
6 Naive_WT_rep3 2 Naive_WT
7 Th17_Dmt3aKO_rep1 2 Th17_Dmt3aKO
8 Th17_Dmt3aKO_rep2 2 Th17_Dmt3aKO
9 Th17_Dmt3aKO_rep3 1 Th17_Dmt3aKO
10 Th17_WT_rep1 2 Th17_WT
11 Th17_WT_rep2 2 Th17_WT
12 Th17_WT_rep3 1 Th17_WT
13 Th1_Dmt3aKO_rep1 2 Th1_Dmt3aKO
14 Th1_Dmt3aKO_rep2 2 Th1_Dmt3aKO
15 Th1_Dmt3aKO_rep3 1 Th1_Dmt3aKO
16 Th1_WT_rep1 2 Th1_WT
17 Th1_WT_rep2 2 Th1_WT
18 Th1_WT_rep3 1 Th1_WT
19 Th2_Dmt3aKO_rep1 2 Th2_Dmt3aKO
20 Th2_Dmt3aKO_rep2 2 Th2_Dmt3aKO
21 Th2_Dmt3aKO_rep3 1 Th2_Dmt3aKO
22 Th2_WT_rep1 2 Th2_WT
23 Th2_WT_rep2 2 Th2_WT
24 Th2_WT_rep3 1 Th2_WT
```

From my exploratory analysis, I noticed that batch 1 samples and batch 2 samples cluster separately from each other.

Thus, in my DE analysis design, I included batch as a covariate so that I could test for batch-associated DE.

```
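# Reference levels: the Naive KO group and batch 1.
# ref must be an existing level; the table above names it "Naive_Dmt3aKO".
groups <- relevel(groups, ref="Naive_Dmt3aKO")
batch <- relevel(batch, ref="1")
# Additive model: column 2 of the design is the batch 2 vs batch 1 effect.
design <- model.matrix(~batch+groups, data=y$samples)
y_filtered <- estimateDisp(y_filtered, design)
fit <- glmQLFit(y_filtered, design, robust=TRUE)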
```
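To make the imbalance concrete, here is a minimal base R sketch that rebuilds the sample annotation from the table above, cross-tabulates batch against group, and checks that the additive design has full column rank. The `samples` data frame and the other variable names are made up for illustration; it does not use the real `y$samples` object.

```r
# Rebuild the sample annotation from the table above (base R only).
# `samples` is a hypothetical stand-in for y$samples.
cell_types <- c("Naive", "Th17", "Th1", "Th2")
genotypes  <- c("Dmt3aKO", "WT")
samples <- expand.grid(rep = 1:3, genotype = genotypes, cell = cell_types,
                       stringsAsFactors = FALSE)
samples$groups <- factor(paste(samples$cell, samples$genotype, sep = "_"))

# In the table, only rep3 of the Th17/Th1/Th2 groups belongs to batch 1.
samples$batch <- factor(ifelse(samples$rep == 3 & samples$cell != "Naive", 1, 2))

# Cross-tabulate batch against group to see which combinations are empty.
batch_by_group <- table(samples$batch, samples$groups)
print(batch_by_group)

# Same reference levels and formula as in the design above.
samples$groups <- relevel(samples$groups, ref = "Naive_Dmt3aKO")
samples$batch  <- relevel(samples$batch, ref = "1")
design <- model.matrix(~ batch + groups, data = samples)

# The coefficients are only estimable if the design has full column rank.
design_rank <- qr(design)$rank
print(colnames(design))
print(design_rank == ncol(design))
```

The cross-table shows that the two Naive groups have no batch 1 samples, while each Th group has one batch 1 and two batch 2 replicates. In this sketch the second design column is `batch2`, which is what `coef=2` tests.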

I found that 24439 genes were differentially expressed between batch 1 and batch 2 at FDR < 0.05.

```
#### batch effect
batch_DE <- glmQLFTest(fit, coef=2)
FDR <- p.adjust(batch_DE$table$PValue, 'fdr')
sum(FDR < 0.05)
# 24439
```

My questions:

- Since the Naive groups contain only batch 2 samples and no batch 1 samples, could the 24439 batch DE genes be caused by differences between the Th groups and Naive? In the linear model ~batch+groups, we assume batch and group are independent. Theoretically, those 24439 genes should be independent of group differences, but this unbalanced design really bothers me.
- Will this unbalanced design affect the differential analysis between groups, for example when comparing Th1_WT to Naive_WT?
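For reference, this is roughly how I would set up the Th1_WT vs Naive_WT comparison under the additive model. It is only a sketch in base R: the `groups` and `batch` factors are rebuilt by hand from the table above rather than taken from my real objects, and the final edgeR call is left as a comment because it needs the fitted `fit` object.

```r
# Hypothetical reconstruction of the factors from the sample table.
group_names <- c("Naive_Dmt3aKO", "Naive_WT", "Th17_Dmt3aKO", "Th17_WT",
                 "Th1_Dmt3aKO", "Th1_WT", "Th2_Dmt3aKO", "Th2_WT")
groups <- relevel(factor(rep(group_names, each = 3)), ref = "Naive_Dmt3aKO")

# Naive samples are all batch 2; rep3 of every Th group is batch 1.
batch <- factor(c(rep(2, 6), rep(c(2, 2, 1), times = 6)))
batch <- relevel(batch, ref = "1")

design <- model.matrix(~ batch + groups)

# Both group coefficients are relative to the Naive_Dmt3aKO reference,
# so Th1_WT vs Naive_WT is the difference of the two coefficients.
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast["groupsTh1_WT"]   <- 1
contrast["groupsNaive_WT"] <- -1
print(contrast)

# With edgeR loaded and `fit` from glmQLFit on the real data:
# th1_vs_naive <- glmQLFTest(fit, contrast = contrast)
```

The batch coefficient gets a weight of zero in this contrast, so the comparison is made after adjusting for the estimated batch effect.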

