Question: RNA-seq unbalanced batch effect correction
gravatar for sckinta
11 months ago by
United States
sckinta520 wrote:


i have a set of RNAseq data with unbalanced batch effect (see table below). Batch 1 was made of single indexed kit and sequenced at time 1, while batch 2 was made of dual indexed kit and sequenced independently from batch 1.

    sample  batch   groups
1   Naive_Dmt3aKO_rep1  2   Naive_Dmt3aKO
2   Naive_Dmt3aKO_rep2  2   Naive_Dmt3aKO
3   Naive_Dmt3aKO_rep3  2   Naive_Dmt3aKO
4   Naive_WT_rep1   2   Naive_WT
5   Naive_WT_rep2   2   Naive_WT
6   Naive_WT_rep3   2   Naive_WT
7   Th17_Dmt3aKO_rep1   2   Th17_Dmt3aKO
8   Th17_Dmt3aKO_rep2   2   Th17_Dmt3aKO
9   Th17_Dmt3aKO_rep3   1   Th17_Dmt3aKO
10  Th17_WT_rep1    2   Th17_WT
11  Th17_WT_rep2    2   Th17_WT
12  Th17_WT_rep3    1   Th17_WT
13  Th1_Dmt3aKO_rep1    2   Th1_Dmt3aKO
14  Th1_Dmt3aKO_rep2    2   Th1_Dmt3aKO
15  Th1_Dmt3aKO_rep3    1   Th1_Dmt3aKO
16  Th1_WT_rep1 2   Th1_WT
17  Th1_WT_rep2 2   Th1_WT
18  Th1_WT_rep3 1   Th1_WT
19  Th2_Dmt3aKO_rep1    2   Th2_Dmt3aKO
20  Th2_Dmt3aKO_rep2    2   Th2_Dmt3aKO
21  Th2_Dmt3aKO_rep3    1   Th2_Dmt3aKO
22  Th2_WT_rep1 2   Th2_WT
23  Th2_WT_rep2 2   Th2_WT
24  Th2_WT_rep3 1   Th2_WT

From my exploratory analysis, I noticed batch 1 samples and batch 2 samples are clustered independently from each other.

pairwise correlation heatmap

Thus, on my DE analysis design, I used batch as covariant to evaluate the batch effect DE.

groups <- relevel(groups, ref="Naive_KO")
batch <- relevel(batch,ref="1")
design <- model.matrix(~batch+groups, data=y$samples)
y_filtered <- estimateDisp(y_filtered,design)
fit <- glmQLFit(y_filtered, design, robust=T)

I found 24439 genes were differentially expressed btw batch 1 and batch 2.

#### batch effect 
batch_DE <- glmQLFTest(fit, coef=2)
FDR <- p.adjust(batch_DE$table$PValue, 'fdr')
sum(FDR < 0.05)
# 24439

My questions:

  1. since Naive has only batch 2 samples no batch 1 samples, Can 24439 batch DE genes be caused by difference between other Th and Naive? In the linear regression, ~batch+groups , we assume batch and group are independent. theoretically, those 24439 genes should be independent of group difference, but this unbalanced design really bothers me.

  2. Will this unbalance design affect differential analysis between groups? for example, comparing Th1_WT to Naive_WT.

batch effect edger rna-seq R • 618 views
ADD COMMENTlink modified 11 months ago by Devon Ryan91k • written 11 months ago by sckinta520

How to add images to a Biostars post

ADD REPLYlink written 11 months ago by WouterDeCoster40k
gravatar for Devon Ryan
11 months ago by
Devon Ryan91k
Freiburg, Germany
Devon Ryan91k wrote:
  1. No, that's really just the batch effect, which is quite large. In an ideal world you might look at ~batch*groups, but odds are (A) the interaction isn't biologically reasonable and (B) that will suck up a LOT of your degrees of freedom.
  2. The values from batch 1 aren't going to be adding much to the model fit at this point. You tend to have about 2.75 effective samples per group, which is really low and that's going to be your main issue. Even with 3 samples per group unless the effect size is large you're going to have issues with power. I would be more concerned with that than the knock-on effect of a batch effect.
ADD COMMENTlink written 11 months ago by Devon Ryan91k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1395 users visited in the last hour