Question

GSVA analysis and limma feature

1

Entering edit mode

6.1 years ago

derricca0708 ▴ 10

Hello, I'm working on GSVA and further analysis using Limma just like the https://bioconductor.org/packages/release/bioc/vignettes/GSVA/inst/doc/GSVA.pdf figure 4.

My data is clustered into 3 types, according to the TCGA protocol for LUAD. I followed the description from the GSVA.pdf and the answers from biostars written by Kevin Blighe.

{My R code for analysis}

lung_es <- gsva(exampleSet, gm, method = "gsva", min.sz=10, max.sz=9999, verbose=TRUE, abs.ranking=FALSE, no.bootstraps=0, bootstrap.percent = .632)
adjPvalueCutoff <- 0.05  #P-value 
logFCcutoff <- log2(2)
design <- model.matrix(~factor(lung_es$TCGA_subtype))
colnames(design) <- c("TRU_Bron", "PI_Squam", "PP_Magn")
fit <- lmFit(lung_es, design)
fit <- eBayes(fit)
allGeneSets <- topTable(fit, coef=NULL, number=Inf)
DEgeneSets <- topTable(fit, coef=NULL, number=Inf, p.value=adjPvalueCutoff, adjust="BH")
res <- decideTests(fit, p.value=adjPvalueCutoff)
summary(res)
#          TRU_Bron PI_Squam PP_Magn
#Down          8      168     209
#NotSig     1143      834     793
#Up           13      162     162
sig.genes <- c(names(res[res[,2]==1,2]), names(res[res[,2]==-1,2]))
lung_eset.siggenes <- lung_es[which(rownames(lung_es) %in% sig.genes),]

so, this was the answer and I followed without question... turns out the example (leukemia) has two subtype (ALL, MLL)

BUT, I have three subtype. So i think that code getting sig.genes should be different.

Any suggestion about getting differently activated or deactivated pathways among three subtypes??

R gene • 4.7k views

ADD COMMENT • link updated 5.6 years ago by Ram 43k • written 6.1 years ago by derricca0708 ▴ 10

Kevin Blighe · Answer 1 · 2018-03-27

0

Entering edit mode

6.1 years ago

Kevin Blighe 87k

Hello, I see my name mentioned in your question, and you must be referring to this answer that I gave: GSVA enrichment score and heatmap

If you have 3 conditions, then I would repeat the code 3 times in order to cover all possible pairwise comparisons. Does that make sense?

ADD COMMENT • link 6.1 years ago by Kevin Blighe 87k

0

Entering edit mode

Thank you for the quick and kind answer... By 'repeat code 3 times', do you mean that I'll have to repeat the code

sig.genes <- c(names(res[res[,2]==1,2]), names(res[res[,2]==-1,2]))

lung_eset.siggenes <- lung_es[which(rownames(lung_es) %in% sig.genes),]

to

sig.genes1 <- c(names(res[res[,1]==1,2]), names(res[res[,1]==-1,2]))

sig.genes2 <- c(names(res[res[,2]==1,2]), names(res[res[,2]==-1,2]))

sig.genes3 <- c(names(res[res[,3]==1,2]), names(res[res[,3]==-1,2]))

or are you suggesting I should repeat from the 'lmFit'?

ADD REPLY • link 6.1 years ago by derricca0708 ▴ 10

0

Entering edit mode

I was more thinking about creating the model matrix with the 3 levels/conditions and then doing different contrasts, as one would normally do in Limma via makeContrasts(). Are you very familiar with the limma pipeline?

ADD REPLY • link 6.1 years ago by Kevin Blighe 87k

0

Entering edit mode

Thank you again for the answer.. I'm not familiar with the limma pipeline,

but I found makeContrasts() and other codes in Limma users guideline's 9.3 Several Groups. It explains how to compare three groups.

> adjPvalueCutoff <- 0.05
> 
> logFCcutoff <- log2(2)
> 
> f <- factor(lung_es$TCGA_subtype, levels = c("TRU_Bron", "PI_Squam", "PP_Magn"))
> design1 <- model.matrix(~0+f)
> colnames(design1) <- c("TRU_Bron", "PI_Squam", "PP_Magn")
> fit <- lmFit(lung_es, design1)
> contrast.matrix <- makeContrasts(PI_Squam-TRU_Bron, PP_Magn-PI_Squam, TRU_Bron-PP_Magn, levels = design1)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> fit2 <- eBayes(fit2)
> 
> top <- topTableF(fit2, number= 100)

the guide said 'The statistic fit2$F and the corresponding fit2$F.p.value combine the three pair-wise comparisons into one F-test.'

does this mean that if I want to find out genes or pathways vary between three subtypes, I should look into fit2$F.p.value??

ADD REPLY • link updated 6.1 years ago by Kevin Blighe 87k • written 6.1 years ago by derricca0708 ▴ 10

0

Entering edit mode

Yes, you have done it correctly in that example. You then have multiple options when you want to derive the statistics:

Derive separate stats for each pairwise comparison
Derive ANOVA stats comparing all 3 groups

Both of these require the use of coefficients, which are specified in contrasts.fit I believe. There was actually a similar question posed on BIoconductor, where one of the key developers of Limma gave a fairly good response (and suggests to look at the manual!):

Question: Anova; can I use lmFit and eBayes from Limma, or should it be aov?

I would look myself but I am pressed for time.

Obviously, once you get your genes, these are then passed to gsva() for enrichment

ADD REPLY • link 5.6 years ago by Kevin Blighe 87k