Tutorial: Clustering of DAVID gene enrichment results from gene expression studies
gravatar for Kevin Blighe
4 weeks ago by
Kevin Blighe14k
London / Brazil
Kevin Blighe14k wrote:

Added in response to another question: Ideas to plot several enrichment results (BP) in one


To take this to completion, you will require:

  • normalised count values (here as exprsMatrix)
  • differential expression results (here as topTable, generated from DESeq2 in this case)
  • Results from DAVID (here exported as MyEnrichment.tsv from DAVID)


#differential expression results
       Gene    baseMean log2FoldChange   lfcSE       stat       pvalue
3098  HOXD4  141.097337      -4.386331 0.7718557  -5.682839 1.324773e-08
5539   IRX2  610.363722       8.930738 0.8609911  10.372625 3.303232e-25
5541   IRX1  406.924285       8.573095 1.0976594   7.810341 5.703335e-15
5686  FGF10   33.978556       7.482399 1.0559976   7.085622 1.384214e-12
7916 CHCHD2 2879.127421      -8.398716 0.6070995 -13.834167 1.585619e-43
9130   ESX1    9.426912      -5.839058 1.0965897  -5.324743 1.010960e-07
3098 2.454747e-06
5539 7.820952e-22
5541 3.472354e-12
5686 5.462264e-10
7916 3.378796e-39
9130 1.485693e-05

#normalised expression values
exprsMatrix[1:5, 1:4]
                  Cell1        Cell2        Cell3          Cell4
WASH7P            9.619592     9.592736     9.591155       8.254291
MIR6859-1         3.626835     2.400240     2.642729       3.148078
RP11-34P13.15     5.433465     5.516004     5.158699       5.225395
RP11-34P13.16     5.194765     4.376780     4.030150       4.700400
RP11-34P13.13     3.032550     3.096567     3.325515       3.544169

Filter out genes of interest

#Filter genes that pass absolute log (base2) fold change >= 4 and FDR (BH) Q<=0.0001
log2FCcutoff <- 4
BHcutoff <- 0.0001
sigGeneList <- subset(topTable, abs(log2FoldChange)>=log2FCcutoff & padj<=BHcutoff)[,1]
topMatrix <- rldMatrix[which(rownames(exprsMatrix) %in% sigGeneList),]

Perform enrichment in DAVID with 'sigGeneList' object

Go to DAVID's Functional Annotation tool and then:

  1. Upload a list of genes to enrich
  2. Select the identifier that you're using for your genes
  3. Select 'Gene List'
  4. Select the repositories against which you wish to perform enrichment (e.g. GO BP, MF, and CC)
  5. Select Functional Annotation Chart

images upload

A new window will be created. From there, click on Download File to export the enrichment results as MyEnrichment.tsv


#Add a panel at the side that shows gene set enrichment analysis results
DAVIDfile <- "MyEnrichment.tsv", sep="")
DAVID <- read.table(DAVIDfile, sep="\t", header=TRUE)
 [1] "Category"        "Term"            "Count"           "X."             
 [5] "PValue"          "Genes"           "List.Total"      "Pop.Hits"       
 [9] "Pop.Total"       "Fold.Enrichment" "Bonferroni"      "Benjamini"      
[13] "FDR"


Create a gene-by-annotation matrix

#Enrichment cut-off
enrichBcutoff <- 0.05
DAVID <- subset(DAVID, Benjamini<enrichBcutoff)
DAVID <- DAVID[,c(1,2,3,6)]

#Pull out only terms with WNT genes
#DAVID <- DAVID[grep("WNT", DAVID[,4]),]

#Create a new dataframe that has '1' for when the gene is part of a term, and '0' when not
annGSEA <- data.frame(row.names=rownames(topMatrix))
for (j in 1:length(rownames(topMatrix)))
    #Create a matching pattern to ensure genes match exactly
    #   ^GENE,  --> Match at beginning of matching string
    #   , GENE$ --> Match at end of matching string
    #    GENE,  --> Match between first and last gene in matching string
    gene <- rownames(topMatrix)[j]
    pattern <- paste("^", gene, ", |, ", gene, "$| ", gene, ",", sep="")
    for (k in 1:nrow(DAVID))
        if (any(grepl(pattern, DAVID$Genes[k])))
            annGSEA[j,k] <- 1
            annGSEA[j,k] <- 0
colnames(annGSEA) <- DAVID[,2]

#Remove terms with no overlapping genes
annGSEA <- annGSEA[,apply(annGSEA, 2, mean)!=0]

#Remove genes with no overlapping terms
annGSEA <- annGSEA[apply(annGSEA, 1, mean)!=0,]


Generate heatmap and annotation


#Match the order of rownames in toptable with that of annGSEA
topTable <- topTable[which(topTable$Gene %in% rownames(annGSEA)),]
topTable <- topTable[match(rownames(annGSEA), topTable$Gene),]

#Set text and figure dimensions

#Create heatmap annotations
    #Colour bar for -log (base 10) FDR Q value for DEGs, and fold changes
    dfMinusLog10FDRGenes <- data.frame(-log10(topTable[which(topTable[,1] %in% rownames(annGSEA)),"padj"]))
    dfMinusLog10FDRGenes[dfMinusLog10FDRGenes=="Inf"] <- 0
    dfFoldChangeGenes <- data.frame(topTable[which(topTable[,1] %in% rownames(annGSEA)),"log2FoldChange"])
    dfGeneAnno <- data.frame(dfMinusLog10FDRGenes, dfFoldChangeGenes)
    colnames(dfGeneAnno) <- c("DEG\nsignificance\nscore", "Regulation")
    dfGeneAnno[,2] <- ifelse(dfGeneAnno[,2]>0, "Up-regulated", "Down-regulated")
    colours <- list("Regulation"=c("Up-regulated"="royalblue", "Down-regulated"="yellow"))
    haGenes <- rowAnnotation(df=dfGeneAnno, col=colours, width=unit(1,"cm"))

    #Colour bar for -log (base 10) Benjamini enrichment Q value
    dfMinusLog10BenjaminiTerms <- data.frame(-log10(read.table(DAVIDfile, sep="\t", header=TRUE)[which(read.table(DAVIDfile, sep="\t", header=TRUE)$Term %in% colnames(annGSEA)),"Benjamini"]))
    colnames(dfMinusLog10BenjaminiTerms) <- "GO Term\nsignificance\nscore"
    haTerms <- HeatmapAnnotation(df=dfMinusLog10BenjaminiTerms,
        colname=anno_text(colnames(annGSEA), rot=40, just="right", offset=unit(1,"npc")-unit(2,"mm"), gp=gpar(fontsize=termLab)),
        annotation_height=unit.c(unit(1, "cm"), unit(8, "cm")))

pdf("GO.pdf", width=7, height=6)
    hmapGSEA <- Heatmap(annGSEA,

        name="My enrichment",


        col=c("0"="white", "1"="forestgreen"),


        row_title="Statistically-significant genes",
        row_title_gp=gpar(fontsize=12, fontface="bold"),
        row_names_gp=gpar(fontsize=geneLab, fontface="bold"),
        row_names_max_width=unit(15, "cm"),

        column_title="Enriched terms",
        column_title_gp=gpar(fontsize=12, fontface="bold"),
        #column_names_gp=gpar(fontsize=termLab, fontface="bold"),
        #column_names_max_height=unit(15, "cm"),


        #width=unit(12.5, "cm"),



    draw(hmapGSEA + haGenes, heatmap_legend_side="right", annotation_legend_side="right")
ADD COMMENTlink modified 9 days ago by bikash251010 • written 4 weeks ago by Kevin Blighe14k

How to export in tsv file, when I click on download option it opens a new window with tex.

ADD REPLYlink written 10 days ago by bikash251010

Hello friend. Right-click on the 'Download' link, and then select the option to 'Save As' (or something equivalent in your own language. Let me know if that works.

ADD REPLYlink modified 10 days ago • written 10 days ago by Kevin Blighe14k

DAVID There is no option for saving as.

ADD REPLYlink written 9 days ago by bikash251010

Dear bikash2510, there is indeed a 'Save as'. Please take a look at the screenshot (note that I use my operating system in Portuguese - 'Salvar link como...' means 'Save link as...'. You have to right-click on Download File:


ADD REPLYlink written 9 days ago by Kevin Blighe14k

Error: unexpected ',' in "DAVIDfile <- "MyEnrichment.tsv","

I am getting this error.

ADD REPLYlink written 8 days ago by bikash251010

It may be a CSV file, if DAVID has changed recently. Please research the differences between file formats TSV and CSV, and then modify your function in R appropriately.

For example:

  • read.table(..., sep=",") #comma-separated
  • read.table(..., sep="\t") #tab-separated

You can also try read.csv(), a wrapper for read.table()

ADD REPLYlink written 8 days ago by Kevin Blighe14k

Facing error in topmatrix.

dat=read.delim("MyEnrichment.csv", header=TRUE)
 [1] "Category"        "Term"            "Count"           "X."              "PValue"         
 [6] "Genes"           "List.Total"      "Pop.Hits"        "Pop.Total"       "Fold.Enrichment"
[11] "Bonferroni"      "Benjamini"       "FDR"            
enrichBcutoff <- 0.05
dat <- subset(dat, Benjamini<enrichBcutoff)
dat <- dat[,c(1,2,3,6)]
annGSEA <- data.frame(row.names=rownames(topMatrix))
Error in rownames(topMatrix) : object 'topMatrix' not found
for (j in 1:length(rownames(topMatrix)))
ADD REPLYlink modified 6 days ago by Kevin Blighe14k • written 6 days ago by kalyanimeha0

What is the name of your object that contains P values an fold changes?

ADD REPLYlink written 6 days ago by Kevin Blighe14k

I didn't get your question, ("P-values" of Gene Entrez ID.) I am writing the same command that you have used, but getting some errror.

ADD REPLYlink written 4 days ago by kalyanimeha0

Oh, sorry, if you look at the very top of this page, you will see a section that mentions the additional data that you need:

  • normalised count values (here as exprsMatrix)
  • differential expression results (here as topTable, generated from DESeq2 in this case)
  • Results from DAVID (here exported as MyEnrichment.tsv from DAVID)

This plot is piecing together various parts of different data from different sources

ADD REPLYlink written 4 days ago by Kevin Blighe14k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1339 users visited in the last hour