I wanted to build a heatmap with different sets of genes as columns and GO BP terms as rows. The intensity of the colour of each cell in the heatmap would represent the p-value.
I started by using hyperGTest of package GOstats to calculate p-value for different GO terms. I selected p-value cutoff as 1 cause wanted to get p-values for all terms, even those that were not significant. I did this with the following code for two gene sets (A and B):
A <- c("86", "51412", "192669", "8289", "57492", "83858", "10533", "10538", "54880", "1230", "947", "79577", "55038", "64866", "1017", "1026", "1063", "1105", "91851", "55118")
B <- c("79624", "23192", "85003", "151887", "7555", "9267", "8663", "3692", "51359", "90625", "28982", "2597", "81502", "57470", "4223", "57380", "57552", "54539", "55655", "199713", "79031", "123", "5558", "10635", "81892")
library(org.Hs.eg.db)
library(Category)
library(GOstats)
library(GO.db)
genesGO <- list (A,B)
names(genesGO) <- c("A","B")
allgenesGO <- mappedkeys(org.Hs.egGO)
onto.overGO.bp<-list()
for (i in 1:length(genesGO)){
paramsGO <- new("GOHyperGParams", geneIds = genesGO[[i]][genesGO[[i]] %in% allgenesGO], universeGeneIds = allgenesGO, annotation = "org.Hs.eg.db", ontology = "BP", testDirection = "over", pvalueCutoff=1)
hypGO <- hyperGTest(paramsGO)
tgo <- summary(hypGO, pvalue = 1)
onto.overGO <- data.frame(tgo[, 1:2], FDR = p.adjust(tgo[, 2], method = "fdr"),tgo[, 3:7])
onto.overGO.bp[[i]]<-onto.overGO
}
names(onto.overGO.bp) <- c("A","B")
nrow(onto.overGO.bp[[1]])
nrow(onto.overGO.bp[[2]])
However, the number of GO terms obtained varies with the gene set even when selecting p-value as 1: nrow(onto.overGO.bp[[1]])
is different from nrow(onto.overGO.bp[[2]])
, which makes impossible to build a heatmap.
Is it something missing in the code?
You can also get all the terms using GeneSCF. And check this post might be helpful to you,
C: Gene ontology comparsion between up- and down-regulated genes
Sorry for not helping you for direct solution with GOstats.