Convert gene ID to gene name/symbol in gene set file (GMT)
3.7 years ago
Microuser • 0

Hi,

I have obtained gene set files (GMT) from both the KEGG and MSigDB databases. Each of these files lists gene IDs, but I need gene names (symbols). Is there an R function or package that I can use to convert the IDs to gene names? In the end, I still need the file in GMT format so that I can use it in another package.

Thanks.

GMT geneset • 3.4k views

MSigDB provides GMT files for both IDs and symbols.

Just out of curiosity, which package requires a file in GMT format?

Thanks, igor. I want to use CEMiTool. I downloaded the MSigDB GMT file for my organism with the EnrichmentBrowser package, but it only gives Entrez IDs.

EnrichmentBrowser uses msigdbr and KEGGREST to get MSigDB and KEGG pathways, respectively. You should be able to use those packages directly to get the gene sets with gene symbols. Keep in mind, MSigDB pathways are based on human, mouse, or rat studies.

CEMiTool does not require a GMT file. It has the read_gmt() function to convert a GMT file to a list. You just need to have your gene sets in the same format as what read_gmt() returns.
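
For reference, each line of a GMT file is tab-separated: the gene set name, a description, then the member genes. Below is a minimal Python sketch (the thread is about R, but the logic ports directly) that parses GMT text into long (term, gene) pairs. Whether this layout matches what read_gmt() returns is an assumption on my part, so check its output before relying on it. `parse_gmt`, the set names, and the numeric IDs are hypothetical placeholders.

```python
def parse_gmt(text):
    """Parse GMT text into a list of (term, gene) pairs.

    Each GMT line is: set name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    pairs = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            # Skip blank lines and sets that list no genes.
            continue
        term, genes = fields[0], fields[2:]
        pairs.extend((term, gene) for gene in genes if gene)
    return pairs


# Placeholder content, not real gene sets or real Entrez IDs.
example = "SET_A\tdescription A\t1001\t1002\nSET_B\tdescription B\t1003\n"
for term, gene in parse_gmt(example):
    print(term, gene)
```

The description field is dropped here because a long term/gene table has no place for it; keep it if your downstream tool needs it.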

You can download the signatures manually with Entrez IDs or HGNC symbols: https://www.gsea-msigdb.org/gsea/msigdb/collections.jsp#C2

I have not used EnrichmentBrowser.

Thanks, Kevin. Initially, I downloaded the files from the website, but after over-representation analysis (ORA), many pathways that are irrelevant to my organism of interest (S. cerevisiae) appeared, e.g. oncogenic signatures. I couldn't find organism-specific gene sets to download from the database website and found it very human-centric, so I used that package instead. Please correct me if I am wrong. Thank you.

3.7 years ago

Hey,

Assuming that, by 'gene ID', you mean Entrez ID, you can convert these to HGNC gene symbols via two packages (one or the other):

  • biomaRt
  • org.Hs.eg.db

For other issues, like maintaining the GMT format, I am confident that you can work it out.
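
As a rough illustration of keeping the GMT layout while swapping identifiers, here is a minimal Python sketch (the same steps translate to R). It assumes you have already exported an ID-to-symbol table from biomaRt or org.Hs.eg.db into a dictionary; `convert_gmt`, the set name, and the IDs and symbols shown are hypothetical placeholders. IDs with no symbol are kept unchanged here, which is one possible policy, not the only one.

```python
def convert_gmt(gmt_text, id_to_symbol):
    """Return GMT text with gene IDs replaced by symbols where a mapping exists."""
    out_lines = []
    for line in gmt_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            # Not a well-formed GMT line; pass it through untouched.
            out_lines.append(line)
            continue
        name, description, ids = fields[0], fields[1], fields[2:]
        symbols = []
        for gene_id in ids:
            # Fall back to the original ID when no symbol is known.
            symbol = id_to_symbol.get(gene_id) or gene_id
            if symbol not in symbols:  # avoid duplicates within one set
                symbols.append(symbol)
        out_lines.append("\t".join([name, description] + symbols))
    return "".join(l + "\n" for l in out_lines)


# Placeholder mapping and gene set, for illustration only.
mapping = {"1001": "GENE_A", "1002": "GENE_B"}
gmt = "SET_A\tdescription A\t1001\t1002\t1003\n"
print(convert_gmt(gmt, mapping), end="")
```

Writing the returned text to a file gives a GMT with the same set names and descriptions, so downstream tools that read GMT should not notice the difference.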

Kevin

Thanks, Kevin. I used biomaRt, and out of 1800 Entrez IDs, I only got 870 matches. I am not sure why. I don't know how to replace the IDs with gene symbols when these numbers do not match.

It is expected that not all will match, as you are comparing across annotation databases. The ones that do not match are likely non-coding RNAs, pseudogenes, or hypothetical genes that may be of minimal relevance.
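
To see which IDs failed to map, rather than only how many, a small check like the following Python sketch can help before deciding whether to drop or keep them. It assumes the mapping is already available as a dictionary; `split_mapped` and all IDs and symbols are hypothetical placeholders.

```python
def split_mapped(ids, id_to_symbol):
    """Split IDs into those with a non-empty symbol and those without."""
    mapped = [i for i in ids if id_to_symbol.get(i)]
    unmapped = [i for i in ids if not id_to_symbol.get(i)]
    return mapped, unmapped


# Placeholder data: "1003" has an empty symbol, "1004" is absent entirely.
mapping = {"1001": "GENE_A", "1002": "GENE_B", "1003": ""}
ids = ["1001", "1002", "1003", "1004"]

mapped, unmapped = split_mapped(ids, mapping)
print(f"{len(mapped)} of {len(ids)} IDs mapped")
print("unmapped:", unmapped)
```

Looking up a handful of the unmapped IDs by hand is a quick way to confirm whether they really are non-coding RNAs, pseudogenes, or hypothetical genes.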
