Hello,

I'm trying to use SVA to calculate SVs in my DESeq2 analysis.

In a couple of examples, I've seen people generate model matrices using the *raw* data, but actually calculate SVs on *normalized* data. For example, this post: Designing of model.matrix for batch correction of Time Course data ?

Shouldn't you be using the raw data matrix in each step?

Furthermore, if you're a DESeq2 library user and you do decide to use normalized data for the svaseq() step, do you use RLog-normalized data, or size factor-corrected data? (I suspect you shouldn't use the Rlog-normalized data because svaseq() can't take negative values, and rlogging the data can produce values <0 - but then is the size factor multiplication alone really successfully "normalizing" the data in a useful way, since it doesn't account for skew?)

RLog-normalized data generated by:

```
myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
directory = pathToHTSeq,
design = ~Genotype)
myData_rlg <- assay(rlog(myData))
```

Size factor-corrected generated by:

```
myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
directory = pathToHTSeq,
design = ~Genotype)
tst <- estimateSizeFactors(myData)
tst2<-counts(tst, normalized=TRUE)
```

Thanks!

For reference, my current SVA code is:

```
# import raw data
myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
directory = pathToHTSeq,
design = ~Genotype)
# Make rawCounts variable
rawCounts <-data.frame( counts(myData) )
#import needed library
library('sva')
# make a full model matrix
mod <- model.matrix(~ Genotype, colData(myData))
# make a null model to compare it to
mod0 <- model.matrix(~ 1, colData(myData))
# perform SVA without defining how many non-Genotype batch effects you think there are
svseq <- svaseq( as.matrix(rawCounts), mod, mod0)
print(svseq)
# perform SVA when you specifically expect 8 SVAs (number of SVAs must be less than number of samples)
nSurr <- 5
svseq <- svaseq( as.matrix(rawCounts), mod, mod0, nSurr)
```

I'm meeting the seem question, you can have a look at https://www.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#pre-filtering-the-dataset where sva input takes a normalized counts through counts(data, normalized = T)

Great! Thank you very much for your update.