Question: Know if a pathway is being functionally activated or repressed by a group of up-regulated genes
gravatar for salamandra
21 months ago by
salamandra300 wrote:

Imagine we identify a process/pathway that is enriched in up-regulated differentially expressed genes. As the genes included in a pathway can be both activators or repressors of that pathway, we don't know if the differentially expressed up-regulated genes are in fact activating or repressing that pathway. Is there a easy automatic way of building a geneset with just the activating genes of a pathway, besides searching in the literature for each gene individually?

I know a text mining tool that finds positive or negative interactions between a gene and a term, but for each gene reports both positive and negative interactions, so there's no way of telling if the gene is activating or repressed. Or can we assume that if a gene has more positive than neg interactions with the term is because is activating it. What is the common practice here?


ADD COMMENTlink modified 21 months ago by Kevin Blighe60k • written 21 months ago by salamandra300

ADD REPLYlink written 21 months ago by Za120

That tool tells which processes are defenitly NOT associated with a gene set (depleted option), it does not tell if geneset is mainly activating or repressing a process which what I asked. But thank you for the input anyway

ADD REPLYlink written 21 months ago by salamandra300
gravatar for Kevin Blighe
21 months ago by
Kevin Blighe60k
Kevin Blighe60k wrote:

GSVA can show this. It will look at your differential expression results and then infer from this whether certain pathways / processes are statistically significantly up- or down-regulated in your samples. Getting GSVA working can be cumbersome, though. The vignette helped me as a starting point: GSVA: The Gene Set Variation Analysis package for microarray and RNA-seq data

With GSVA, the general process is:

  1. Enrich gene list against a GSVA dataset to obtain matrix of samples versus enrichment terms
  2. Perform limma on enrichment matrix, comparing your samples' conditions of interest
  3. Plot statistically significant terms from enrichment matrix in heatmap

Here is a working example that I did using the C2 gene sets:

The starting point is just rlog counts and a results object from DESeq2 or whatever else you've used.

With GSVA, you load whatever datasets against which you want to perform enrichment.

#Perform GSVA analysis


data(c2BroadSets) #

#only include KEGG pathways
#kegg <- c2BroadSets[which(names(c2BroadSets) %in% names(c2BroadSets)[grep("KEGG_", names(c2BroadSets))])]

#Save rlog counts as new object
df <- assay(rld)

#Filter out genes that pass 5% FDR and absolute log2FC > 2
topTable <-
sigGeneList <- subset(topTable, abs(log2FoldChange)>=2 & padj<=0.05)[,1]
topMatrix <- df[which(rownames(df) %in% sigGeneList),]

#Convert the HGNC names to Entrez IDs (eliminate non-matches where necessary)
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
annots <- getBM(mart=mart,
  attributes=c("hgnc_symbol", "entrezgene"),
annots <- annots[!duplicated(annots[,1]),]
topMatrix <- topMatrix[which(rownames(topMatrix) %in% annots[,1]),]
annots <- annots[which(annots[,1] %in% rownames(topMatrix)),]
topMatrix <- topMatrix[match(annots[,1], rownames(topMatrix)),]
rownames(topMatrix) <- annots[,2]

#Perform GSVA   
topMatrixGSVA <- gsva(topMatrix,

design <- model.matrix(~ factor(metadata$condition, levels=c("case", "control")))
colnames(design) <- c("case", "control")
fit <- lmFit(topMatrixGSVA, design)
fit <- eBayes(fit)
sigPathways <- topTable(fit, coef="condition.caseVscontrol", number=Inf, p.value=0.05, adjust="BH")
sigPathways <- sigPathways[abs(sigPathways$logFC)>1,]

#Filter the GSVA object to only include significant pathways
topMatrixGSVA <- topMatrixGSVA[rownames(sigPathways),]

#Set colour
myCol <- colorRampPalette(c("dodgerblue", "black", "yellow"))(100)
myBreaks <- seq(-1.5, 1.5, length.out=101)
heat <- t(scale(t(topMatrixGSVA)))

par(mar=c(2,2,2,2), cex=0.8)

  key.xlab="Enrichment Z-score",
  reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean),
  distfun=function(x) dist(x, method="euclidean"),
  hclustfun=function(x) hclust(x, method="ward.D2"),


Another example here:


ADD COMMENTlink modified 16 months ago • written 21 months ago by Kevin Blighe60k

Thank very much. This is really useful

ADD REPLYlink written 21 months ago by salamandra300

Sorry Kevin,

For this tutorial you have used c2BroadSets , can I use that for any analysis? For instance, if I am interested in response to chemotherapy in a cancer in responder patients to non-responders?

ADD REPLYlink written 16 months ago by A3.8k

It will depend on the 'gene sets' that are included in C2. Yo can search for key terns, here:

After you search, the results appear at the bottom of the page.

Two gene sets that I have immediately found from keyword chemotherapy are:

ADD REPLYlink written 16 months ago by Kevin Blighe60k

Thank you

I have done with same c2BroadSets

but seems to me these pathways all are meaningless in terms of cancer

> head(sigPathways)
                                              logFC    AveExpr         t    P.Value adj.P.Val         B
SAMOLS_TARGETS_OF_KHSV_MIRNAS_UP          0.5691057  0.5691057  2.482072 0.03441202 0.6950803 -4.091589
REACTOME_PHOSPHORYLATION_OF_THE_APC       0.4853805  0.4853805  2.429304 0.03754903 0.6950803 -4.110608
NIKOLSKY_BREAST_CANCER_16P13_AMPLICON    -0.4532520 -0.4532520 -2.375375 0.04104812 0.6950803 -4.130214
MOREIRA_RESPONSE_TO_TSA_UP               -0.4247967 -0.4247967 -2.358448 0.04221144 0.6950803 -4.136400
SENGUPTA_NASOPHARYNGEAL_CARCINOMA_DN     -0.5000000 -0.5000000 -2.350639 0.04275902 0.6950803 -4.139260
SHARMA_PILOCYTIC_ASTROCYTOMA_LOCATION_UP -0.5000000 -0.5000000 -2.350639 0.04275902 0.6950803 -4.139260
ADD REPLYlink written 16 months ago by A3.8k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1054 users visited in the last hour