Question

SVA Input: Should SVs be calculated on normalized data?

1

Entering edit mode

7.2 years ago

Kristin Muench ▴ 640

Hello,

I'm trying to use SVA to calculate SVs in my DESeq2 analysis.

In a couple of examples, I've seen people generate model matrices using the raw data, but actually calculate SVs on normalized data. For example, this post: Designing of model.matrix for batch correction of Time Course data ?

Shouldn't you be using the raw data matrix in each step?

Furthermore, if you're a DESeq2 library user and you do decide to use normalized data for the svaseq() step, do you use RLog-normalized data, or size factor-corrected data? (I suspect you shouldn't use the Rlog-normalized data because svaseq() can't take negative values, and rlogging the data can produce values <0 - but then is the size factor multiplication alone really successfully "normalizing" the data in a useful way, since it doesn't account for skew?)

RLog-normalized data generated by:

myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                     directory = pathToHTSeq,
                                     design = ~Genotype)
myData_rlg <- assay(rlog(myData))

Size factor-corrected generated by:

myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                     directory = pathToHTSeq,
                                     design = ~Genotype)
tst <- estimateSizeFactors(myData)
tst2<-counts(tst, normalized=TRUE)

Thanks!

For reference, my current SVA code is:

# import raw data
myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                         directory = pathToHTSeq,
                                         design = ~Genotype)

# Make rawCounts variable
rawCounts <-data.frame( counts(myData) ) 


#import needed library
library('sva')

# make a full model matrix
mod  <- model.matrix(~ Genotype, colData(myData))

# make a null model to compare it to
mod0 <- model.matrix(~   1, colData(myData))

# perform SVA without defining how many non-Genotype batch effects you think there are
svseq <- svaseq( as.matrix(rawCounts), mod, mod0) 
print(svseq)

# perform SVA when you specifically expect 8 SVAs (number of SVAs must be less than number of samples)
nSurr <- 5
svseq <- svaseq( as.matrix(rawCounts), mod, mod0, nSurr)

R SVA RNA-Seq • 4.2k views

ADD COMMENT • link updated 18 months ago by Ram 45k • written 7.2 years ago by Kristin Muench ▴ 640

0

Entering edit mode

I'm meeting the seem question, you can have a look at https://www.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#pre-filtering-the-dataset where sva input takes a normalized counts through counts(data, normalized = T)

ADD REPLY • link 5.9 years ago by Yun ▴ 250

0

Entering edit mode

Great! Thank you very much for your update.

ADD REPLY • link 5.9 years ago by Kristin Muench ▴ 640

Ram · Answer 1 · 2023-12-19

0

Entering edit mode

18 months ago

Garrett • 0

Hi,

I was interested in comparing PCA plots from “non-SV’d” data to data to which I had applied SVA.

For this, I had needed to apply rlog() (this is what we use in our pipeline).

It is important to note that sva::svaseq() is nearly identical to sva::sva(), with the exception of a log transform on the input data.

Because of this, I just used sva::sva() on rlog’d data (that I had filtered to non-negative data)

I can add code later today.

ADD COMMENT • link updated 18 months ago by Ram 45k • written 18 months ago by Garrett • 0

0

Entering edit mode

Did you notice that you're answering a 6 year old post? I ask because you do not acknowledge that or state how your answer addresses the actual question.

Follow up: Did you even intend on your post being an answer or do you just have a question related to the top level post?

ADD REPLY • link 18 months ago by Ram 45k