Question: GSVA analysis and limma feature
1
gravatar for derricca0708
7 months ago by
derricca070810
derricca070810 wrote:

Hello, I'm working on GSVA and further analysis using Limma just like the https://bioconductor.org/packages/release/bioc/vignettes/GSVA/inst/doc/GSVA.pdf figure 4.

My data is clustered into 3 types, according to the TCGA protocol for LUAD. I followed the description from the GSVA.pdf and the answers from biostars written by Kevin Blighe.

{My R code for analysis}

lung_es <- gsva(exampleSet, gm, method = "gsva", min.sz=10, max.sz=9999, verbose=TRUE, abs.ranking=FALSE, no.bootstraps=0, bootstrap.percent = .632)
adjPvalueCutoff <- 0.05  #P-value 
logFCcutoff <- log2(2)
design <- model.matrix(~factor(lung_es$TCGA_subtype))
colnames(design) <- c("TRU_Bron", "PI_Squam", "PP_Magn")
fit <- lmFit(lung_es, design)
fit <- eBayes(fit)
allGeneSets <- topTable(fit, coef=NULL, number=Inf)
DEgeneSets <- topTable(fit, coef=NULL, number=Inf, p.value=adjPvalueCutoff, adjust="BH")
res <- decideTests(fit, p.value=adjPvalueCutoff)
summary(res)
#          TRU_Bron PI_Squam PP_Magn
#Down          8      168     209
#NotSig     1143      834     793
#Up           13      162     162
sig.genes <- c(names(res[res[,2]==1,2]), names(res[res[,2]==-1,2]))
lung_eset.siggenes <- lung_es[which(rownames(lung_es) %in% sig.genes),]

so, this was the answer and I followed without question... turns out the example (leukemia) has two subtype (ALL, MLL)

BUT, I have three subtype. So i think that code getting sig.genes should be different.

Any suggestion about getting differently activated or deactivated pathways among three subtypes??

R gene • 611 views
ADD COMMENTlink modified 8 weeks ago by RamRS19k • written 7 months ago by derricca070810
0
gravatar for Kevin Blighe
7 months ago by
Kevin Blighe32k
Republic of Ireland
Kevin Blighe32k wrote:

Hello, I see my name mentioned in your question, and you must be referring to this answer that I gave: GSVA enrichment score and heatmap

If you have 3 conditions, then I would repeat the code 3 times in order to cover all possible pairwise comparisons. Does that make sense?

ADD COMMENTlink written 7 months ago by Kevin Blighe32k

Thank you for the quick and kind answer... By 'repeat code 3 times', do you mean that I'll have to repeat the code

sig.genes <- c(names(res[res[,2]==1,2]), names(res[res[,2]==-1,2]))

lung_eset.siggenes <- lung_es[which(rownames(lung_es) %in% sig.genes),]

to

sig.genes1 <- c(names(res[res[,1]==1,2]), names(res[res[,1]==-1,2]))

sig.genes2 <- c(names(res[res[,2]==1,2]), names(res[res[,2]==-1,2]))

sig.genes3 <- c(names(res[res[,3]==1,2]), names(res[res[,3]==-1,2]))

or are you suggesting I should repeat from the 'lmFit'?

ADD REPLYlink modified 7 months ago • written 7 months ago by derricca070810

I was more thinking about creating the model matrix with the 3 levels/conditions and then doing different contrasts, as one would normally do in Limma via makeContrasts(). Are you very familiar with the limma pipeline?

ADD REPLYlink written 7 months ago by Kevin Blighe32k

Thank you again for the answer.. I'm not familiar with the limma pipeline,

but I found makeContrasts() and other codes in Limma users guideline's 9.3 Several Groups. It explains how to compare three groups.

> adjPvalueCutoff <- 0.05
> 
> logFCcutoff <- log2(2)
> 
> f <- factor(lung_es$TCGA_subtype, levels = c("TRU_Bron", "PI_Squam", "PP_Magn"))
> design1 <- model.matrix(~0+f)
> colnames(design1) <- c("TRU_Bron", "PI_Squam", "PP_Magn")
> fit <- lmFit(lung_es, design1)
> contrast.matrix <- makeContrasts(PI_Squam-TRU_Bron, PP_Magn-PI_Squam, TRU_Bron-PP_Magn, levels = design1)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> fit2 <- eBayes(fit2)
> 
> top <- topTableF(fit2, number= 100)

the guide said 'The statistic fit2$F and the corresponding fit2$F.p.value combine the three pair-wise comparisons into one F-test.'

does this mean that if I want to find out genes or pathways vary between three subtypes, I should look into fit2$F.p.value??

ADD REPLYlink modified 7 months ago by Kevin Blighe32k • written 7 months ago by derricca070810

Yes, you have done it correctly in that example. You then have multiple options when you want to derive the statistics:

  1. Derive separate stats for each pairwise comparison
  2. Derive ANOVA stats comparing all 3 groups

Both of these require the use of coefficients, which are specified in contrasts.fit I believe. There was actually a similar question posed on BIoconductor, where one of the key developers of Limma gave a fairly good response (and suggests to look at the manual!):

I would look myself but I am pressed for time.

Obviously, once you get your genes, these are then passed to gsva() for enrichment

ADD REPLYlink modified 8 weeks ago • written 7 months ago by Kevin Blighe32k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2209 users visited in the last hour