Question: SVA Input: Should SVs be calculated on normalized data?
1
gravatar for Kristin Muench
2.2 years ago by
United States
Kristin Muench500 wrote:

Hello,

I'm trying to use SVA to calculate SVs in my DESeq2 analysis.

In a couple of examples, I've seen people generate model matrices using the raw data, but actually calculate SVs on normalized data. For example, this post: Designing of model.matrix for batch correction of Time Course data ?

Shouldn't you be using the raw data matrix in each step?

Furthermore, if you're a DESeq2 library user and you do decide to use normalized data for the svaseq() step, do you use RLog-normalized data, or size factor-corrected data? (I suspect you shouldn't use the Rlog-normalized data because svaseq() can't take negative values, and rlogging the data can produce values <0 - but then is the size factor multiplication alone really successfully "normalizing" the data in a useful way, since it doesn't account for skew?)

RLog-normalized data generated by:

myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                     directory = pathToHTSeq,
                                     design = ~Genotype)
    myData_rlg <- assay(rlog(myData))

Size factor-corrected generated by:

myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                     directory = pathToHTSeq,
                                     design = ~Genotype)
tst <- estimateSizeFactors(myData)
tst2<-counts(tst, normalized=TRUE)

Thanks!

#

For reference, my current SVA code is:

# import raw data
    myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                         directory = pathToHTSeq,
                                         design = ~Genotype)

# Make rawCounts variable
rawCounts <-data.frame( counts(myData) ) 


#import needed library
library('sva')

# make a full model matrix
mod  <- model.matrix(~ Genotype, colData(myData))

# make a null model to compare it to
mod0 <- model.matrix(~   1, colData(myData))

# perform SVA without defining how many non-Genotype batch effects you think there are
svseq <- svaseq( as.matrix(rawCounts), mod, mod0) 
print(svseq)

# perform SVA when you specifically expect 8 SVAs (number of SVAs must be less than number of samples)
nSurr <- 5
svseq <- svaseq( as.matrix(rawCounts), mod, mod0, nSurr)
rna-seq sva R • 1.1k views
ADD COMMENTlink written 2.2 years ago by Kristin Muench500

I'm meeting the seem question, you can have a look at https://www.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#pre-filtering-the-dataset where sva input takes a normalized counts through counts(data, normalized = T)

ADD REPLYlink written 11 months ago by lykosherdshep90

Great! Thank you very much for your update.

ADD REPLYlink written 11 months ago by Kristin Muench500
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 842 users visited in the last hour