GO and KEGG enrichment analysis for a non-model organism
8 weeks ago

I obtained a list of genes from a differential gene expression analysis with DESeq2. I am working on the genome of a non-model fungal organism, and I also have the GO terms associated with these DEGs from an InterPro analysis. However, GO and KEGG enrichment analysis seems to require standard database IDs such as Ensembl IDs, whereas the gene IDs in my list are g3041, g2134... as assigned by the gene prediction tools. Can someone please explain how I can use my gene IDs to carry out the GO and KEGG enrichment analysis steps?

You can use a tool like eggNOG to associate your genes with KEGG orthologs (and GO terms), then perform the GO enrichment on the annotated gene set using a tool like topGO.

8 weeks ago
h.mon 34k

You don't necessarily need Ensembl identifiers. You need to create an object with the gene-to-GO / KEGG mapping. Different packages use different structures for this. You can create the mapping file outside of R and read it in, or you can build the mapping directly in R.
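As a rough illustration of building such a mapping outside of R, here is a minimal Python sketch. It assumes the annotation is available as (gene ID, GO terms) pairs with the terms joined by "|"; that layout, the function name `invert_annotation`, and the example rows are all hypothetical, so adapt them to whatever your annotation table actually looks like.

```python
from collections import defaultdict


def invert_annotation(rows):
    """Turn (gene_id, "GO:..|GO:..") rows into {term: sorted list of gene IDs}.

    The "|" separator is an assumption about the input layout, not a fixed rule.
    """
    term_to_genes = defaultdict(set)
    for gene_id, terms in rows:
        for term in terms.split("|"):
            term = term.strip()
            if term:  # genes without any annotation are skipped
                term_to_genes[term].add(gene_id)
    return {term: sorted(genes) for term, genes in term_to_genes.items()}


# Made-up example rows using gene IDs in the style of a gene predictor
rows = [
    ("g3041", "GO:0005524|GO:0016301"),
    ("g2134", "GO:0005524"),
    ("g0007", ""),  # unannotated gene
]
term_to_genes = invert_annotation(rows)
```

The same structure works for KEGG if the term column holds KO or pathway identifiers instead of GO terms. Note that the mapping should cover all annotated genes in the genome, not only the DEGs, since the full set is typically what serves as the background.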

Since several packages can read files in the gmt format (e.g., read.gmt() from clusterProfiler, GSA.read.gmt() from GSA, and so on), what I usually do is create a gmt file from the annotation, then read this gmt file to perform the gene set enrichment.
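For illustration, a gmt file is plain text with one gene set per line: the set name, a description, then the member genes, all tab-separated. Below is a minimal Python sketch of writing one. The function name `write_gmt`, the "NA" placeholder description, and the example data are assumptions made for this sketch, not part of any package.

```python
import io


def write_gmt(term_to_genes, handle, descriptions=None):
    """Write {term: [gene IDs]} as gmt lines: name, description, genes (tab-separated)."""
    descriptions = descriptions or {}
    for term in sorted(term_to_genes):
        genes = sorted(set(term_to_genes[term]))
        if not genes:
            continue  # skip empty gene sets
        description = descriptions.get(term, "NA")  # placeholder when no description is known
        handle.write("\t".join([term, description, *genes]) + "\n")


# Made-up example mapping; in practice this comes from your own annotation
example = {
    "GO:0016301": ["g3041"],
    "GO:0005524": ["g3041", "g2134"],
}
buffer = io.StringIO()  # swap for open("annotation.gmt", "w") to write a real file
write_gmt(example, buffer)
gmt_text = buffer.getvalue()
```

A file written this way should be readable with the gmt readers mentioned above, and your own gene IDs (g3041, g2134, ...) can be used as they are, as long as the DEG list and the gmt file use the same identifiers.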


