Question: Convert gene ID to gene name/symbol in gene set file (GMT)
Microuser wrote (5 weeks ago):

Hi,

I have obtained gene set files (GMT) from both the KEGG and MSigDB databases. Each of these files lists gene IDs, but I need gene names (symbols). Is there an R function or package that I can use to convert the IDs to gene names? In the end, I still need the file in GMT format to use in another package.

Thanks.

Tags: gmt, geneset

MSigDB provides GMT files for both IDs and symbols.

Just out of curiosity, which package requires a file in GMT format?

Reply written 5 weeks ago by igor

Thanks, igor. I want to use CEMiTool. I downloaded the MSigDB GMT file for my organism with the EnrichmentBrowser package, but it only gives Entrez IDs.

Reply written 5 weeks ago by Microuser

EnrichmentBrowser uses msigdbr and KEGGREST to get MSigDB and KEGG pathways, respectively. You should be able to use those packages directly to get the gene sets with gene symbols. Keep in mind that MSigDB pathways are based on human, mouse, or rat studies.

CEMiTool does not require a GMT file. It has the read_gmt() function to convert a GMT file to a list. You just need to have your gene sets in the same format that read_gmt() returns.

Reply written 5 weeks ago by igor
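To make igor's two points concrete: if you pull gene sets with symbols from msigdbr (or KEGGREST) as a long table of (set name, symbol) rows, you can write that table out as GMT yourself, and a GMT file can be flattened back into such a table. Below is a minimal sketch of that round trip, written in Python only as a language-neutral illustration (the thread's tools are R packages). The function names are hypothetical, the placeholder description "NA" is an assumption, and whether the long table matches exactly what CEMiTool's read_gmt() returns in your version is something to check by inspecting its output in R.

```python
def long_to_gmt(pairs, description="NA"):
    """Group (set_name, gene_symbol) pairs into GMT lines.

    Sets and genes keep their first-seen order; duplicate pairs are
    collapsed. GMT needs a description field, so a placeholder is used.
    """
    sets = {}  # set_name -> dict used as an ordered set of genes
    for set_name, gene in pairs:
        sets.setdefault(set_name, {})[gene] = None
    return ["\t".join([name, description, *genes]) for name, genes in sets.items()]


def gmt_to_long(lines):
    """Flatten GMT lines back into (set_name, gene) pairs; descriptions are dropped."""
    pairs = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # fields[0] is the set name, fields[1] the description, the rest are genes
        pairs.extend((fields[0], gene) for gene in fields[2:] if gene)
    return pairs


if __name__ == "__main__":
    # Toy rows standing in for a long (set name, symbol) table.
    rows = [("SET_1", "TP53"), ("SET_1", "BRCA1"), ("SET_2", "MYC"), ("SET_1", "TP53")]
    gmt_lines = long_to_gmt(rows)
    print(gmt_lines)  # ['SET_1\tNA\tTP53\tBRCA1', 'SET_2\tNA\tMYC']
    print(gmt_to_long(gmt_lines) == rows[:3])  # True
```

Writing `"\n".join(gmt_lines)` to a `.gmt` file then gives a GMT file keyed by symbols rather than IDs.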

You can download the signatures manually with Entrez IDs or HGNC symbols: https://www.gsea-msigdb.org/gsea/msigdb/collections.jsp#C2

I have not used EnrichmentBrowser.

Reply written 5 weeks ago by Kevin Blighe

Thanks, Kevin. Initially I downloaded the gene sets from the website, but after ORA (over-representation analysis), many pathways irrelevant to my organism of interest (S. cerevisiae) appeared, e.g. oncogenic signatures. I couldn't find organism-specific gene sets to download from the database website, and it seemed very human-centric, so I used that package instead. Please correct me if I'm wrong. Thank you.

Reply written 5 weeks ago by Microuser
Kevin Blighe wrote (5 weeks ago):

Hey,

Assuming that by 'gene ID' you mean Entrez ID, you can convert these to HGNC gene symbols with either of two packages:

  • biomaRt
  • org.Hs.eg.db

For other issues, like maintaining the GMT format, I am confident that you can work them out.

Kevin

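For the "maintaining the GMT format" part, the main thing is that each GMT line is tab-separated: set name, description, then the genes. Below is a minimal sketch, in Python purely as an illustration, of rewriting such lines with symbols while keeping the first two fields intact. It assumes you have already exported an ID-to-symbol table (for example from a biomaRt or org.Hs.eg.db lookup) and loaded it as a plain dict; the function names and the `min_size` parameter are hypothetical. IDs without a symbol are dropped, and duplicate symbols (several IDs mapping to one symbol) are collapsed.

```python
def convert_gmt_lines(lines, id_to_symbol, min_size=1):
    """Rewrite GMT lines (name, description, genes..., tab-separated),
    replacing each gene ID with its symbol.

    id_to_symbol: dict of gene ID -> symbol (assumed exported beforehand).
    IDs without a symbol are dropped; sets left with fewer than
    min_size symbols are dropped too.
    """
    converted = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # blank or malformed line: no genes to convert
        name, description, ids = fields[0], fields[1], fields[2:]
        symbols = {}  # dict used as an ordered set (several IDs may share a symbol)
        for gene_id in ids:
            symbol = id_to_symbol.get(gene_id.strip())
            if symbol:
                symbols[symbol] = None
        if len(symbols) >= min_size:
            converted.append("\t".join([name, description, *symbols]))
    return converted


def convert_gmt_file(in_path, out_path, id_to_symbol):
    """File wrapper: read a GMT file and write the converted GMT file."""
    with open(in_path) as fin:
        converted = convert_gmt_lines(fin, id_to_symbol)
    with open(out_path, "w") as fout:
        for line in converted:
            fout.write(line + "\n")


if __name__ == "__main__":
    # Toy data: 7157 -> TP53 and 672 -> BRCA1 are human Entrez IDs;
    # 999999999 is a made-up ID with no symbol.
    mapping = {"7157": "TP53", "672": "BRCA1"}
    gmt = ["SET_A\tdesc\t7157\t672\t999999999\n", "SET_B\tdesc\t999999999\n"]
    for line in convert_gmt_lines(gmt, mapping):
        print(line.split("\t"))  # ['SET_A', 'desc', 'TP53', 'BRCA1']
```

Dropping unmapped IDs (rather than leaving the raw ID in place) keeps every gene column a symbol, at the cost of smaller sets; `SET_B` disappears entirely here because none of its IDs mapped.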

Thanks, Kevin. I used biomaRt, but out of 1800 Entrez IDs I only got symbols for 870. I'm not sure why. I don't know how to replace the IDs with gene symbols when these numbers do not match.

Reply written 5 weeks ago by Microuser

It is expected that not all will match, as you are comparing across annotation databases. The ones that do not match are likely non-coding RNAs, pseudogenes, or hypothetical genes that may be of minimal relevance.

Reply written 5 weeks ago by Kevin Blighe
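To see where a drop like 1800 to 870 comes from, it helps to list the IDs that failed to map and check how much of each gene set survives. Below is a minimal sketch in Python, assuming the gene sets are already parsed into a dict of set name to ID list and the biomaRt result is loaded as an ID-to-symbol dict (the function name is hypothetical). The unmapped IDs can then be spot-checked to see whether they are mostly non-coding RNAs, pseudogenes, or hypothetical genes. One thing worth confirming first: the mapping source must be for the same organism as the IDs, since HGNC symbols exist only for human genes.

```python
def mapping_report(gene_sets, id_to_symbol):
    """Summarise how well gene IDs map to symbols.

    gene_sets: dict of set name -> list of gene IDs (e.g. parsed from a GMT file)
    id_to_symbol: dict of gene ID -> symbol (assumed exported from biomaRt or similar)
    Returns (unmapped IDs sorted as strings,
             dict of set name -> fraction of its unique IDs that mapped).
    """
    all_ids = {gene_id for ids in gene_sets.values() for gene_id in ids}
    unmapped = sorted(gene_id for gene_id in all_ids if not id_to_symbol.get(gene_id))
    coverage = {}
    for name, ids in gene_sets.items():
        unique_ids = set(ids)
        mapped = sum(1 for gene_id in unique_ids if id_to_symbol.get(gene_id))
        coverage[name] = mapped / len(unique_ids) if unique_ids else 0.0
    return unmapped, coverage


if __name__ == "__main__":
    sets = {"SET_1": ["1", "2", "3", "4"], "SET_2": ["3", "5"]}
    mapping = {"1": "GENE_A", "2": "GENE_B", "3": "GENE_C"}  # toy mapping
    unmapped, coverage = mapping_report(sets, mapping)
    print(unmapped)  # ['4', '5']
    print(coverage)  # {'SET_1': 0.75, 'SET_2': 0.5}
```

A set with low coverage may be better dropped than kept with only a few surviving symbols, depending on the minimum set size your downstream tool expects.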