Hello, I'm working on GSVA and further analysis using Limma just like the https://bioconductor.org/packages/release/bioc/vignettes/GSVA/inst/doc/GSVA.pdf figure 4.
My data is clustered into 3 types, according to the TCGA protocol for LUAD. I followed the description from the GSVA.pdf and the answers from biostars written by Kevin Blighe.
{My R code for analysis}
lung_es <- gsva(exampleSet, gm, method = "gsva", min.sz=10, max.sz=9999, verbose=TRUE, abs.ranking=FALSE, no.bootstraps=0, bootstrap.percent = .632)
adjPvalueCutoff <- 0.05 #P-value
logFCcutoff <- log2(2)
design <- model.matrix(~factor(lung_es$TCGA_subtype))
colnames(design) <- c("TRU_Bron", "PI_Squam", "PP_Magn")
fit <- lmFit(lung_es, design)
fit <- eBayes(fit)
allGeneSets <- topTable(fit, coef=NULL, number=Inf)
DEgeneSets <- topTable(fit, coef=NULL, number=Inf, p.value=adjPvalueCutoff, adjust="BH")
res <- decideTests(fit, p.value=adjPvalueCutoff)
summary(res)
# TRU_Bron PI_Squam PP_Magn
#Down 8 168 209
#NotSig 1143 834 793
#Up 13 162 162
sig.genes <- c(names(res[res[,2]==1,2]), names(res[res[,2]==-1,2]))
lung_eset.siggenes <- lung_es[which(rownames(lung_es) %in% sig.genes),]
so, this was the answer and I followed without question... turns out the example (leukemia) has two subtype (ALL, MLL)
BUT, I have three subtype. So i think that code getting sig.genes should be different.
Any suggestion about getting differently activated or deactivated pathways among three subtypes??
Thank you for the quick and kind answer... By 'repeat code 3 times', do you mean that I'll have to repeat the code
to
or are you suggesting I should repeat from the 'lmFit'?
I was more thinking about creating the model matrix with the 3 levels/conditions and then doing different contrasts, as one would normally do in Limma via
makeContrasts()
. Are you very familiar with the limma pipeline?Thank you again for the answer.. I'm not familiar with the limma pipeline,
but I found makeContrasts() and other codes in Limma users guideline's 9.3 Several Groups. It explains how to compare three groups.
the guide said 'The statistic fit2$F and the corresponding fit2$F.p.value combine the three pair-wise comparisons into one F-test.'
does this mean that if I want to find out genes or pathways vary between three subtypes, I should look into fit2$F.p.value??
Yes, you have done it correctly in that example. You then have multiple options when you want to derive the statistics:
Both of these require the use of coefficients, which are specified in contrasts.fit I believe. There was actually a similar question posed on BIoconductor, where one of the key developers of Limma gave a fairly good response (and suggests to look at the manual!):
I would look myself but I am pressed for time.
Obviously, once you get your genes, these are then passed to
gsva()
for enrichment