Question: SVA Input: Should SVs be calculated on normalized data?
Kristin Muench wrote:

Hello,

I'm trying to use SVA to calculate SVs in my DESeq2 analysis.

In a couple of examples, I've seen people generate model matrices using the raw data, but actually calculate SVs on normalized data. For example, this post: Designing of model.matrix for batch correction of Time Course data ?

Shouldn't you be using the raw data matrix in each step?

Furthermore, if you're a DESeq2 user and you do decide to use normalized data for the svaseq() step, do you use rlog-transformed data or size factor-corrected data? (I suspect you shouldn't use the rlog-transformed data, because svaseq() can't take negative values and rlog can produce values < 0. But then, does dividing by the size factors alone really "normalize" the data in a useful way, since it doesn't account for skew?)

rlog-transformed data generated by:

``````
library(DESeq2)

myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                     directory = pathToHTSeq,
                                     design = ~ Genotype)
myData_rlg <- assay(rlog(myData))
``````

Size factor-corrected data generated by:

``````
myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                     directory = pathToHTSeq,
                                     design = ~ Genotype)
tst  <- estimateSizeFactors(myData)
tst2 <- counts(tst, normalized = TRUE)
``````
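
To make concrete what the size-factor step does, here is a minimal base-R sketch of median-of-ratios scaling on a toy matrix. The matrix `toy_counts` and every variable name are made up for illustration. The last line mirrors the `log(dat + constant)` step that, as I understand the sva documentation, `svaseq()` applies internally with `constant = 1` by default. Treat it as a sketch of the idea, not as DESeq2's or sva's actual code.

```r
# Toy count matrix (hypothetical): 4 genes x 3 samples, where samples s2 and s3
# were sequenced 2x and 3x as deeply as s1.
toy_counts <- matrix(c( 10,  20,  30,
                       100, 200, 300,
                         5,  10,  15,
                        50, 100, 150),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Per-gene log geometric mean across samples
log_geo_means <- rowMeans(log(toy_counts))

# Size factor per sample: median ratio of its counts to the gene-wise geometric mean
size_factors <- apply(toy_counts, 2, function(cnts) {
  keep <- is.finite(log_geo_means) & cnts > 0
  exp(median((log(cnts) - log_geo_means)[keep]))
})

# "Normalized" counts are the raw counts divided by the size factors
norm_counts <- sweep(toy_counts, 2, size_factors, "/")

# A log(x + 1) transform of non-negative values is itself non-negative
log_input <- log(norm_counts + 1)
```

The division only removes differences in sequencing depth. Any reduction of skew would come from the log transform applied afterwards, which cannot go negative on this input, unlike rlog values.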

Thanks!


For reference, my current SVA code is:

``````
# load needed libraries
library(DESeq2)
library(sva)

# import raw data
myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                     directory = pathToHTSeq,
                                     design = ~ Genotype)

# extract the raw count matrix
rawCounts <- counts(myData)

# make a full model matrix
mod  <- model.matrix(~ Genotype, colData(myData))

# make a null model to compare it to
mod0 <- model.matrix(~ 1, colData(myData))

# perform SVA, letting svaseq() estimate the number of surrogate variables
svseq <- svaseq(rawCounts, mod, mod0)
print(svseq)

# perform SVA when you specifically expect 5 SVs (number of SVs must be less than number of samples)
nSurr <- 5
svseq <- svaseq(rawCounts, mod, mod0, n.sv = nSurr)
``````
rna-seq sva R • 1.3k views
written 2.5 years ago by Kristin Muench

I have the same question. You can have a look at https://www.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#pre-filtering-the-dataset where the input to sva is the normalized counts, obtained with counts(data, normalized = TRUE).
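
A minimal sketch of the approach in the workflow linked above, under some assumptions: the data are simulated with `makeExampleDESeqDataSet()` as a stand-in for the real HTSeq counts, the simulated factor `condition` plays the role of `Genotype`, and `n.sv = 2` is an arbitrary choice for illustration. It is not a recommendation for any particular dataset.

```r
library(DESeq2)
library(sva)

set.seed(1)

# Simulated stand-in for the real data (assumption): 2000 genes, 12 samples,
# with a two-level factor `condition` in place of Genotype.
dds <- makeExampleDESeqDataSet(n = 2000, m = 12, betaSD = 1)

# Size-factor-normalized counts, dropping genes with very low average counts
dds <- estimateSizeFactors(dds)
dat <- counts(dds, normalized = TRUE)
dat <- dat[rowMeans(dat) > 1, ]

# The model matrices are built from the sample table, not from the counts
mod  <- model.matrix(~ condition, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))

# Estimate two surrogate variables from the normalized counts
svseq <- svaseq(dat, mod, mod0, n.sv = 2)

# Add the SVs to the sample table and include them in the design
dds$SV1 <- svseq$sv[, 1]
dds$SV2 <- svseq$sv[, 2]
design(dds) <- ~ SV1 + SV2 + condition
```

In this pattern the normalized matrix is used only to estimate the SVs. A later `DESeq(dds)` call still models the raw counts, with the SVs entering as covariates in the design.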