Dear all,

We have RNA-seq data of 9 High force versus 9 Low force scallop adductors along with known covariates like Height, Length, Width, Weight (I have cut() them )and Strain. So in DESeq2 I could correct for these covariates as follows:

```
#We only chose Height, Weight and Strain as covariates because of their higher Pearson correlation with Force.
dds <- DESeqDataSetFromMatrix( countData = expr, colData = subt, design =~Height+Weight+Strain+Group)
dds <- DESeq(dds)
dat <- counts(dds, normalized = TRUE)
keep <- rowMeans(dat) > 1
dat <- dat[keep, ]
mod <- model.matrix(~Height+Weight+Strain+Group, colData(dds))
mod0 <- model.matrix(~Height+Weight+Strain, colData(dds))
svseq <- svaseq(dat, mod, mod0)
dds$SV1 <- svseq$sv[,1]
dds$SV2 <- svseq$sv[,2]
dds$SV3 <- svseq$sv[,3]
```

My questions are:

1. The design that corrects for both known and unknown surrogate variables, which one is correct and what leads to the difference of DEGs?

(1) gained 850 DEGs

```
design(dds) <-~SV1+SV2+SV3+Height+Weight+Strain+Group
```

or

(2) gained 214 DEGs

```
design(dds) <-~SV1+SV2+SV3+Group
```

2. Which one is a better choice for svaseq?

(1)

```
mod <- model.matrix(~Height+Weight+Strain+Group, colData(dds))
mod0 <- model.matrix(~Height+Weight+Strain, colData(dds))
```

or

(2)

```
mod <- model.matrix(~Group, colData(dds))
mod0 <- model.matrix(~1, colData(dds))
```

3.How should I correct the design if I decide the second mod mod0 for svaseq? just like that?

```
design(dds) <-~SV+Height+Weight+Strain+Group # gained 1187 DEGs
```

4. If I want to remove batch effect for downstream analysis, did I encode right?

```
vsd<-vst(dds,blind = F)
covariates = colData(dds)[,c("SV1",'SV2',"SV3")]
assay(vsd) <- limma::removeBatchEffect(assay(vsd), batch1=subt$Height,batch2=subt$Batch,batch3=subt$Strain,covariates =
covariates,design=model.matrix(~Group, data=subt))
```

Appreciate for all responses and advice.

Best regards,

Crystal571

Can I ask you which you choose at the end? because I have the same question. Thank you