SVA vs limma model
1
0
Entering edit mode
3 months ago
rk.khayami94 ▴ 10

I'm trying to remove batch effect from my data using the sva package. The process described here is like this:

# The null model contains only the adjustment variables. Since we are not adjusting
# for any other variables in this analysis, only an intercept is included in
# the model.
mod0 = model.matrix(~1,data=pheno)

# The full model includes both the adjustment variables
# and the variable of interest
mod <- model.matrix(~1+ cancer, data=pheno)

# Identify the number of latent factors that need to be estimated.
n.sv = num.sv(edata,mod,method="leek")

# estimate the surrogate variables
svobj = sva(edata,mod,mod0,n.sv=n.sv)

# include the surrogate variables in both the null and full models
modSv = cbind(mod,svobj$sv) mod0Sv = cbind(mod0,svobj$sv)

# Adjusting for surrogate variables using the limma package
fit = lmFit(edata,modSv)

Now I want to have a one vs all comparison. So, according to this post by Michael Love my model should be like this:

limma_mod <- model.matrix(~0 + cancer, data=pheno)

I'm confused whether I should use model.matrix(~1+ cancer, data=pheno) as my full model and model.matrix(~1, data=pheno) as the null model, then, for limma, merge model.matrix(~0+ cancer, data=pheno) with the surrogate variables, or use model.matrix(~0+ cancer, data=pheno) as my full model and model.matrix(~1, data=pheno) as the null model?

Note: using mod0 <- model.matrix(~0, data=pheno) and mod <- model.matrix(~0+ cancer, data=pheno) results in this error:

Error in solve.default(t(mod0) %*% mod0) : 'a' is 0-diml
r combat sva microaray batch-effect • 725 views
0
Entering edit mode

I don't really see how your post title relates to the body of your post, but as described here if you have known batch effects you want to use either ComBat (maybe to get CPM values for heatmaps) or simply account for them in your limma design matrix (for DE analysis), whereas SVA is useful if you have unknown sources of variation. Do you still wish to use SVA? If so I can dig through some old code and probably help out for real.

0
Entering edit mode

Sorry about the title. I am really confused on this topic. Working with combat is really easier but I actually don't know all sources of variation in my data. I would be grateful if you coul help me.

2
Entering edit mode
3 months ago
bkleiboeker ▴ 320

Ok, line by line:

Your code for generating mod0 and mod looks good but I can see you are working from the sva user guide (which is great), so just to make sure you understand how to adapt this to your real problem: if you know any other covariates or batch information, you should add them in your formulas for both mod0 and mod (explanation in section 2 of sva user guide)

For example, let's say you know batch and sex info for the samples. Then your code should read:

mod0 <- model.matrix(~1+batch+sex, data=pheno)
mod <- model.matrix(~1+cancer+batch+sex, data=pheno)

Everything else looks good up to fitting the linear model with lmFit(). I'm now seeing that your confusion is really just in the limma steps (but I'll keep the above point because your talk of batch makes me think you might be wanting to correct for both batch effects and surrogate variables).

I am almost certain that you do not actually want an "all vs. one" comparison because comparing against the surrogate variables introduced by sva would be meaningless. If, in your example above, you wanted to find genes differentially expressed in cancer samples relative to control samples, you'd simply need to

fit <- lmFit(edata,modSv)
fit <- eBayes(fit)
topTable(fit, coef="cancer")

Although actually the coef won't be exactly "cancer", it'll probably be "cancer" appended with whichever level appears first in your design matrix. Look at colnames(modSv), find the one that starts with "cancer..." and put that into the coef argument in topTable().

The "all vs. one" Bioconductor thread you linked is making a fundamentally different comparison than you want to make here. There, they are looking for genes expressed in only one tissue, so they want to contrast TissueA vs. TissueB and vs. TissueC and vs. TissueD. But for you this wouldn't make sense, because all vs. one would imply contrasting against surrogate variables introduced to the design matrix by sva (look at your object modSv to see what I mean).

0
Entering edit mode

Thank you very much this was really helpful. I was making wrong model.matrixes all this time! About the "all vs. one" problem, my real data has a lot of samples that's why I used the example in the sva manual just to create a minimal, reproducible example. I have three different tissues divided into cancer and normal groups. So I mereged tham and created a new variable that has 6 levels: tissueA_normal, tissueA_cancer, tissueB_normal, tissueB_cancer, tissueC_normal, and tissueC_cancer. This is actually why I want to use "all vs. one" comparison.

1
Entering edit mode

So are you trying to find genes DE between tissueA_normal and tissueA_cancer, then find genes DE between tissueB_normal and tissueB_cancer, etc. or are you trying to find genes DE between tissueA_normal and (all of tissueA_cancer + tissueB_normal + tissueB_cancer + tissueC_normal + tissueC_cancer)?

In the first case you could pretty simply make contrasts with makeContrasts()

contrast.matrix <- makeContrasts(tissueA = tissueA_normal - tissueA_cancer ,levels=modSv)
fit2 <- contrasts.fit(fit, contrasts.matrix)
fit2 <- eBayes(fit2)
topTable(fit2, coef="tissueA")

In the second case, I think you'd want something like

contrast.matrix <- makeContrasts(tissueA = c(1,c(rep(-1/5,5),c(rep(0,n.sv))),levels=modSv)

but you'd have to move the index at which the 1 appears to align with the index in colnames(modSv) of the tissue_condition you are seeking to test. Like they said in the post you linked, it might be helpful to go back and remake your design matrices with intercept 0 instead of intercept 1, but when you do your real analysis you should be able to decide which of these to do by inspecting the real design matrices with and without intercept.

0
Entering edit mode