Question: GSVA analysis and limma feature
gravatar for derricca0708
6 months ago by
derricca070810 wrote:

Hello, I'm working on GSVA and further analysis using Limma just like the figure 4.

My data is clustered into 3 types, according to the TCGA protocol for LUAD. I followed the description from the GSVA.pdf and the answers from biostars written by Kevin Blighe.

{My R code for analysis}

lung_es <- gsva(exampleSet, gm, method = "gsva",,, verbose=TRUE, abs.ranking=FALSE, no.bootstraps=0, bootstrap.percent = .632)
adjPvalueCutoff <- 0.05  #P-value 
logFCcutoff <- log2(2)
design <- model.matrix(~factor(lung_es$TCGA_subtype))
colnames(design) <- c("TRU_Bron", "PI_Squam", "PP_Magn")
fit <- lmFit(lung_es, design)
fit <- eBayes(fit)
allGeneSets <- topTable(fit, coef=NULL, number=Inf)
DEgeneSets <- topTable(fit, coef=NULL, number=Inf, p.value=adjPvalueCutoff, adjust="BH")
res <- decideTests(fit, p.value=adjPvalueCutoff)
#          TRU_Bron PI_Squam PP_Magn
#Down          8      168     209
#NotSig     1143      834     793
#Up           13      162     162
sig.genes <- c(names(res[res[,2]==1,2]), names(res[res[,2]==-1,2]))
lung_eset.siggenes <- lung_es[which(rownames(lung_es) %in% sig.genes),]

so, this was the answer and I followed without question... turns out the example (leukemia) has two subtype (ALL, MLL)

BUT, I have three subtype. So i think that code getting sig.genes should be different.

Any suggestion about getting differently activated or deactivated pathways among three subtypes??

R gene • 484 views
ADD COMMENTlink modified 8 days ago by RamRS17k • written 6 months ago by derricca070810
gravatar for Kevin Blighe
6 months ago by
Kevin Blighe28k
USA / Europe / Brazil
Kevin Blighe28k wrote:

Hello, I see my name mentioned in your question, and you must be referring to this answer that I gave: GSVA enrichment score and heatmap

If you have 3 conditions, then I would repeat the code 3 times in order to cover all possible pairwise comparisons. Does that make sense?

ADD COMMENTlink written 6 months ago by Kevin Blighe28k

Thank you for the quick and kind answer... By 'repeat code 3 times', do you mean that I'll have to repeat the code

sig.genes <- c(names(res[res[,2]==1,2]), names(res[res[,2]==-1,2]))

lung_eset.siggenes <- lung_es[which(rownames(lung_es) %in% sig.genes),]


sig.genes1 <- c(names(res[res[,1]==1,2]), names(res[res[,1]==-1,2]))

sig.genes2 <- c(names(res[res[,2]==1,2]), names(res[res[,2]==-1,2]))

sig.genes3 <- c(names(res[res[,3]==1,2]), names(res[res[,3]==-1,2]))

or are you suggesting I should repeat from the 'lmFit'?

ADD REPLYlink modified 6 months ago • written 6 months ago by derricca070810

I was more thinking about creating the model matrix with the 3 levels/conditions and then doing different contrasts, as one would normally do in Limma via makeContrasts(). Are you very familiar with the limma pipeline?

ADD REPLYlink written 5 months ago by Kevin Blighe28k

Thank you again for the answer.. I'm not familiar with the limma pipeline,

but I found makeContrasts() and other codes in Limma users guideline's 9.3 Several Groups. It explains how to compare three groups.

> adjPvalueCutoff <- 0.05
> logFCcutoff <- log2(2)
> f <- factor(lung_es$TCGA_subtype, levels = c("TRU_Bron", "PI_Squam", "PP_Magn"))
> design1 <- model.matrix(~0+f)
> colnames(design1) <- c("TRU_Bron", "PI_Squam", "PP_Magn")
> fit <- lmFit(lung_es, design1)
> contrast.matrix <- makeContrasts(PI_Squam-TRU_Bron, PP_Magn-PI_Squam, TRU_Bron-PP_Magn, levels = design1)
> fit2 <-, contrast.matrix)
> fit2 <- eBayes(fit2)
> top <- topTableF(fit2, number= 100)

the guide said 'The statistic fit2$F and the corresponding fit2$F.p.value combine the three pair-wise comparisons into one F-test.'

does this mean that if I want to find out genes or pathways vary between three subtypes, I should look into fit2$F.p.value??

ADD REPLYlink modified 5 months ago by Kevin Blighe28k • written 5 months ago by derricca070810

Yes, you have done it correctly in that example. You then have multiple options when you want to derive the statistics:

  1. Derive separate stats for each pairwise comparison
  2. Derive ANOVA stats comparing all 3 groups

Both of these require the use of coefficients, which are specified in I believe. There was actually a similar question posed on BIoconductor, where one of the key developers of Limma gave a fairly good response (and suggests to look at the manual!):

I would look myself but I am pressed for time.

Obviously, once you get your genes, these are then passed to gsva() for enrichment

ADD REPLYlink modified 8 days ago • written 5 months ago by Kevin Blighe28k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1376 users visited in the last hour