GOStats "genes being tested do not have corresponding GO terms"
2.4 years ago
kstangline ▴ 80

Hi all,

Has anyone ever had problems with the org.Hs.eg.db database not picking up any GO classifications for a list of genes?

For example, I have a list of 21 genes, and when I run the test against the org.Hs.eg.db database, I get the error "genes being tested do not have corresponding GO terms".

For reference, this is how I've set up my R code. I'm using Entrez identifiers because the hyperGTest function doesn't seem to accept Ensembl (ENSG) identifiers.

library(GO.db)
library(GOstats)
library("org.Hs.eg.db")


# GO Analysis
# final_results is a deseq2 results data frame

res_01 = as.data.frame(subset(final_results, padj<0.1))
sig_lfc = 0.1

selectGenesUp <- unique(res_01[res_01$log2FoldChange>sig_lfc, 'ENTREZID'])
selectGenesDown <- unique(res_01[res_01$log2FoldChange<(-sig_lfc), 'ENTREZID'])

universeGenes <- unique(res_01$ENTREZID)

# p val cutoff for GO test

cutoff = 0.1

# build params

upParams <- try(new("GOHyperGParams",
                geneIds=selectGenesUp,
                universeGeneIds = universeGenes,
                annotation = "org.Hs.eg.db",
                ontology = "BP",
                pvalueCutoff=cutoff,
                conditional=FALSE,
                testDirection="over"))

downParams <- try(new("GOHyperGParams",
                geneIds=selectGenesDown,
                universeGeneIds = universeGenes,
                annotation = "org.Hs.eg.db",
                ontology = "BP",
                pvalueCutoff=cutoff,
                conditional=FALSE,
                testDirection="over"))

21 genes is a very small set. Are you sure your genes actually map to any GO terms? (You can check at http://geneontology.org/.) You'll have a hard time testing for enrichment with such a small set, but you should at least be able to identify which terms they are annotated to (if any).
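The same check can be done from R. Here is a minimal sketch, assuming the Bioconductor packages AnnotationDbi and org.Hs.eg.db are installed; the three Entrez IDs are placeholders chosen for illustration, so substitute your own vector (e.g. selectGenesUp). Note that select() needs character keys, so numeric or NA identifiers have to be cleaned first.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

# Placeholder Entrez IDs (TP53, BRCA1, EGFR); replace with your own gene list.
gene_ids <- c("7157", "672", "1956")

# select() requires character keys; drop NAs and duplicates first.
gene_ids <- unique(as.character(gene_ids[!is.na(gene_ids)]))

# Look up all GO annotations for these genes.
go_map <- AnnotationDbi::select(org.Hs.eg.db,
                                keys = gene_ids,
                                columns = c("GO", "ONTOLOGY"),
                                keytype = "ENTREZID")

# Keep only Biological Process terms, matching ontology = "BP" in the question.
bp_map <- go_map[!is.na(go_map$ONTOLOGY) & go_map$ONTOLOGY == "BP", ]

# Number of input genes with at least one BP annotation.
n_annotated <- length(unique(bp_map$ENTREZID))
n_annotated
```

If n_annotated comes back as 0 for your real list, the problem is in the identifiers (wrong type, NAs, or not Entrez IDs at all) rather than in GOstats itself.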

Also, it looks like your universe contains only the genes with padj < 0.1 rather than all the genes you measured. As a background set, you would normally use all the genes in final_results.
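A sketch of what that change might look like, using a small made-up data frame in place of the real final_results (the values are invented purely for illustration). The column names follow the question; the NA handling is my own assumption about what the real table may contain.

```r
# Toy stand-in for the DESeq2 results in the question (hypothetical values).
final_results <- data.frame(
  ENTREZID       = c("1", "2", "3", "4", NA, "6"),
  log2FoldChange = c(1.2, -0.8, 0.05, 2.1, 0.3, -1.5),
  padj           = c(0.01, 0.04, 0.60, 0.08, 0.02, NA)
)
sig_lfc <- 0.1

# Universe: every measured gene that has an Entrez ID,
# not just the significant subset.
universeGenes <- unique(as.character(na.omit(final_results$ENTREZID)))

# which() silently drops rows where padj is NA.
sig <- final_results[which(final_results$padj < 0.1), ]

# Selected genes: significant, up-regulated, and with a usable Entrez ID.
selectGenesUp <- unique(as.character(
  na.omit(sig$ENTREZID[sig$log2FoldChange > sig_lfc])
))
```

The selected genes are now a subset of a much larger universe, which is what the hypergeometric test needs in order to have anything to compare against.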

If you really have only a small number of genes up or down that you want to test (e.g. 21), you could also consider simply taking the top 100 or 200 genes up or down, sorted by p-value, just to see what sort of biology is floating to the edges of your experiment.
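Roughly, the top-N selection could be done like this. It is only a sketch on invented data: the pvalue column name matches what DESeq2 normally reports, and top_n is set to 3 so the toy example stays readable (you would use 100 or 200 on real data).

```r
# Toy stand-in for the DESeq2 results (hypothetical values).
final_results <- data.frame(
  ENTREZID       = as.character(1:8),
  log2FoldChange = c(2.0, -1.5, 0.7, 1.1, -0.4, 0.9, -2.2, 0.3),
  pvalue         = c(0.001, 0.002, 0.2, 0.03, 0.5, 0.01, 0.004, NA)
)
top_n <- 3  # use 100 or 200 on a real data set

# Drop genes without a p-value, then split by direction of change.
tested <- final_results[!is.na(final_results$pvalue), ]
up   <- tested[tested$log2FoldChange > 0, ]
down <- tested[tested$log2FoldChange < 0, ]

# Take the top_n genes in each direction, smallest p-value first.
topUp   <- head(up$ENTREZID[order(up$pvalue)], top_n)
topDown <- head(down$ENTREZID[order(down$pvalue)], top_n)
```

These lists are for exploration only; since they are not defined by a significance threshold, any enrichment found this way should be read as a hint rather than a result.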
