Question: SVA Input: Should SVs be calculated on normalized data?
11 months ago by
United States
Kristin Muench410 wrote:

Hello,

I'm trying to use the sva package to estimate surrogate variables (SVs) for my DESeq2 analysis.

In a couple of examples, I've seen people generate the model matrices from the raw data object but then estimate the SVs on normalized counts. For example, see the post "Designing of model.matrix for batch correction of Time Course data?"

Shouldn't the raw count matrix be used at every step?

Furthermore, if you're a DESeq2 user and you do decide to use normalized data for the svaseq() step, should you use rlog-transformed data or size factor-normalized counts? I suspect rlog-transformed data is unsuitable, because svaseq() can't take negative values and the rlog transformation can produce values below 0. But is dividing by the size factors alone really "normalizing" the data in a useful way, given that it doesn't account for skew?

rlog-transformed data, generated by:

myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                     directory = pathToHTSeq,
                                     design = ~Genotype)
myData_rlg <- assay(rlog(myData))

Size factor-normalized counts, generated by:

myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                     directory = pathToHTSeq,
                                     design = ~Genotype)
tst <- estimateSizeFactors(myData)
tst2 <- counts(tst, normalized = TRUE)
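
To make the "size factor" step concrete, here is a minimal base-R sketch of median-of-ratios normalization, the method estimateSizeFactors() is documented to use. It is not DESeq2's own code: the toy matrix and the names toyCounts, toySizeFactors, and toyNorm are made up for illustration.

```r
# Toy count matrix (hypothetical values): 4 genes x 3 samples.
# Each sample is an exact multiple of s1, mimicking a pure sequencing-depth difference.
toyCounts <- matrix(c( 10,  20,  40,
                      100, 200, 400,
                        5,  10,  20,
                       50, 100, 200),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

# Median-of-ratios size factors: compare each sample to the per-gene geometric mean.
logGeoMeans <- rowMeans(log(toyCounts))
toySizeFactors <- apply(toyCounts, 2, function(cnts) {
  keep <- is.finite(logGeoMeans) & cnts > 0
  exp(median((log(cnts) - logGeoMeans)[keep]))
})

# Normalization divides each sample (column) by its size factor.
toyNorm <- sweep(toyCounts, 2, toySizeFactors, "/")

print(toySizeFactors)
print(range(toyNorm))
```

In this toy case the depth differences vanish after the division, and the values stay non-negative and on the count scale. Nothing here stabilizes the variance or reduces skew, which is the concern raised above.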

Thanks!

#

For reference, my current SVA code is:

# import raw data
myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                     directory = pathToHTSeq,
                                     design = ~Genotype)

# Keep the raw counts as a matrix (data.frame() can alter the sample names)
rawCounts <- counts(myData)


# load the sva package
library('sva')

# make a full model matrix
mod  <- model.matrix(~ Genotype, colData(myData))

# make a null model to compare it to
mod0 <- model.matrix(~   1, colData(myData))

# perform SVA without defining how many non-Genotype batch effects you think there are
svseq <- svaseq( as.matrix(rawCounts), mod, mod0) 
print(svseq)

# perform SVA when you expect a specific number of SVs, here 5 (the number of SVs must be less than the number of samples)
nSurr <- 5
svseq <- svaseq(as.matrix(rawCounts), mod, mod0, n.sv = nSurr)
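
For comparison, this is a rough sketch of the alternative I'm asking about, where svaseq() gets size factor-normalized counts instead of raw counts. It assumes DESeq2 and sva are installed. Because my HTSeq files aren't available here, it uses DESeq2's simulated makeExampleDESeqDataSet() as a stand-in for myData, with its `condition` column playing the role of Genotype. The low-count filter and n.sv = 2 are arbitrary illustrative choices.

```r
library(DESeq2)
library(sva)

# Simulated stand-in for myData (assumption): 1000 genes, 12 samples,
# with a two-level 'condition' factor in place of Genotype.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Size factor-normalized counts, dropping genes with very low average signal.
dds <- estimateSizeFactors(dds)
normCounts <- counts(dds, normalized = TRUE)
normCounts <- normCounts[rowMeans(normCounts) > 1, ]

# Full and null model matrices are built from the sample table, not the counts.
mod  <- model.matrix(~ condition, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))

# Estimate two surrogate variables (n.sv = 2 is an arbitrary choice here).
svseq <- svaseq(normCounts, mod, mod0, n.sv = 2)

# One row per sample, one column per surrogate variable.
dim(svseq$sv)
```

The only difference from my code above is the matrix handed to svaseq(); the model matrices are identical in both versions.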
Tags: rna-seq, sva, R