Question

Converting Ensembl Gene Ids To Hgnc Gene Name / Coordinates

4

Entering edit mode

12.4 years ago

Ryan D ★ 3.4k

I have a long list of Ensembl IDs and am completely unfamiliar with the Ensembl browser. I want to convert these guys to Refseq IDs using the UCSC table browser, mySQL, or some other tool. So given a list like this:

ENSG00000197021
ENSG00000204379

How do I turn them into this?

CXorf40B
XAGE1A

I gather from what I've tried that there are not 1-to-1 mappings for Ensembl gene IDs to RefSeq names, but I'd like to be wrong. At a minimum, I'd like to get the hg18 or hg19 coordinates for a list of these genes, but Biomart only comes back with about 3/4 of them. Any help?

ensembl ucsc genome mapping coordinates • 23k views

ADD COMMENT • link updated 6.3 years ago by ericrkofman ▴ 20 • written 12.4 years ago by Ryan D ★ 3.4k

3

Entering edit mode

Do you only get coordinates for 3/4 of your Ensembl gene IDs or only RefSeq IDs for 3/4 of your Ensembl gene IDs? From which version of Ensembl are your Ensembl gene IDs? Lastly, CXorf40B and XAGE1A are HGNC symbols.

ADD REPLY • link 12.4 years ago by Bert Overduin ★ 3.7k

1

Entering edit mode

Ensembl has more genes than RefSeq (at least for human), which might partly explain why you only get mappings for 3/4 of your genes.

ADD REPLY • link 12.4 years ago by Andrew W ▴ 290

0

Entering edit mode

Similar post: Gene Id Conversion Tool

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 12.4 years ago by Rm 8.3k

0

Entering edit mode

As Bert says, your examples are not Refseq IDs. Please refine the question title.

ADD REPLY • link 12.4 years ago by Neilfws 49k

7

Entering edit mode

12.4 years ago

Jorge Amigo 14k

the easiest way I know is to use BioMart, selecting the database and dataset of interest, going to the Filters section and on the Gene subsection enter the ensembl geneids (ID list limits), and choosing on the Attributes section the Associated Gene Name. you may add any other wanted attribute, such as Chromosome Name, Gene Start (bp) or Gene End (bp).

ADD COMMENT • link 12.4 years ago by Jorge Amigo 14k

1

Entering edit mode

You can access archived versions of Ensembl via this link: http://asia.ensembl.org/info/website/archives/index.html. Click on any archive to access older BioMarts e.g. http://sep2009.archive.ensembl.org/biomart/martview.

ADD REPLY • link 12.4 years ago by Neilfws 49k

1

Entering edit mode

Thanks, Neil. This appears to work. I was able to pull 387 of these. I then used LiftOver to get to hg19 coordinates. Thanks everyone.

ADD REPLY • link 12.4 years ago by Ryan D ★ 3.4k

0

Entering edit mode

Hi Jorge, it seems to get me all of the information I need, but as with Pierre's search, it comes back with 287 matches. A number of genes are not found:

ENSG00000213499
ENSG00000203949
ENSG00000203981
ENSG00000203946
ENSG00000213503
ENSG00000068990

It seems that some of these genes might only be mapped to hg18. Is there a way to access that build through Biomart? It was not immediately evident.

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 12.4 years ago by Ryan D ★ 3.4k

4

Entering edit mode

12.4 years ago

Pierre Lindenbaum 161k

Second solution using the mysql server of ensembl:

mysql -h ensembldb.ensembl.org --port 5306  -u anonymous -D homo_sapiens_core_64_37 -A

> select distinct
   G.stable_id,
   S.synonym
from
  gene_stable_id as G,
  object_xref as OX,
  external_synonym as S,
  xref as X ,
  external_db as D
where
  D.external_db_id=X.external_db_id and
  X.xref_id=S.xref_id and
  OX.xref_id=X.xref_id and
  OX.ensembl_object_type="Gene" and
  G.gene_id=OX.ensembl_id and
  G.stable_id in ("ENSG00000197021", "ENSG00000204379");

+-----------------+----------+
| stable_id       | synonym  |
+-----------------+----------+
| ENSG00000197021 | CXorf40  |
| ENSG00000197021 | CXorf40B |
| ENSG00000197021 | EOLA1    |
| ENSG00000204379 | CT12.1a  |
| ENSG00000204379 | GAGED2   |
| ENSG00000204379 | XAGE-1   |
| ENSG00000204379 | XAGE1    |
| ENSG00000204379 | CT12.1   |
| ENSG00000204379 | CT12.1B  |
| ENSG00000204379 | CTP9     |
| ENSG00000204379 | XAGE1A   |
| ENSG00000204379 | XAGE1C   |
| ENSG00000204379 | XAGE1D   |
| ENSG00000204379 | XAGE1E   |
| ENSG00000204379 | XAGE1E.  |
| ENSG00000204379 | XAGE1B   |
| ENSG00000204379 | CT12.1E  |
| ENSG00000204379 | CT12.1C  |
| ENSG00000204379 | CT12.1D  |
+-----------------+----------+
19 rows in set (0.05 sec)

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 12.4 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Getting below error when tried with homo_sapiens_core_75_37

ERROR 1146 (42S02): Table 'homo_sapiens_core_75_37.gene_stable_id' doesn't exist

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 9.6 years ago by Rm 8.3k

0

Entering edit mode

Sorry, it is solved after changing to gene as G, instead of gene_stable_id as G,

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 9.6 years ago by Rm 8.3k

2

Entering edit mode

10.5 years ago

Giulietta - Ensembl Helpdesk ★ 1.2k

You can do this with Ensembl with MySQL:

Consider the display_xref_id rather than going for external synonym. The external synonym is not always available.

The following query should work: select distinct stable_id, display_label from gene g, xref x where x.xref_id = g.display_xref_id

ADD COMMENT • link 10.5 years ago by Giulietta - Ensembl Helpdesk ★ 1.2k

0

Entering edit mode

Thanks Giulietta for this solution. Working for me to lookup the symbol for all human ensembl genes. I added two additional columns to track the source database for the symbol:

SELECT
  gene.stable_id AS ensembl_gene_id,
  xref.display_label AS gene_symbol,
  external_db.db_name AS gene_symbol_source_db,
  xref.dbprimary_acc AS gene_symbol_source
FROM gene
LEFT JOIN xref ON xref.xref_id = gene.display_xref_id
LEFT JOIN external_db ON xref.external_db_id = external_db.external_db_id
ORDER BY stable_id

Took me a while to figure this out as detailed here. Upvoted your reply so hopefully it becomes more visible. This seems like a very common use case (getting the symbol for an ensembl ID), so surprised it wasn't more clearly explained in the official ensembl docs (or I missed it).

ADD REPLY • link 3.3 years ago by Daniel Himmelstein ▴ 10

1

Entering edit mode

6.3 years ago

ericrkofman ▴ 20

I ran into this issue too, and found no easy solutions online. So I made a quick script that could perform the conversion: https://github.com/vanallenlab/EnsemblToHGNC

Hope it's helpful! The Ensembl/HGNC mapping it uses is a download from January 2018.

ADD COMMENT • link 6.3 years ago by ericrkofman ▴ 20

0

Entering edit mode

12.4 years ago

Andrew W ▴ 290

Looking just at the mapping of Ensembl gene IDs to NCBI Gene IDs (or locus names- I would work with Gene IDs first and convert them to locus names at the end of the processing), you can use CCDS flatfiles (in ftp://ftp.ncbi.nih.gov/pub/CCDS/current_human/) and Ensembl cDNA Fasta files (e.g. ftp://ftp.ensembl.org/pub/release-64/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh37.64.cdna.all.fa.gz).

CCDS2Sequence.current.txt maps CCDS IDs to Ensembl transcript IDs (ENSTs).

CCDS.current.txt maps CCDS IDs to NCBI Gene IDs (and locus names).

The Ensembl Fasta header can be parsed to map Ensembl transcript IDs to Ensembl gene IDs (ENSGs).

ADD COMMENT • link updated 4.6 years ago by Ram 43k • written 12.4 years ago by Andrew W ▴ 290

Ram · Accepted Answer · 2011-11-17

8

Entering edit mode

12.4 years ago

Pierre Lindenbaum 161k

using the public mysql server of UCSC:

mysql  --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19
mysql> select distinct G.gene,N.value from ensGtp as G, ensemblToGeneName as N where 
  G.transcript=N.name and
  G.gene in ("ENSG00000197021", "ENSG00000204379") ;
+-----------------+----------+
| gene            | value    |
+-----------------+----------+
| ENSG00000197021 | CXorf40B | 
| ENSG00000204379 | XAGE1A   | 
+-----------------+----------+
2 rows in set (0.21 sec)

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 12.4 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Thanks, Pierre. I gather I could also pull chr/start/end this way? Or do I need to access another table? This got me a similar result: 287 results of 389 ENSEMBL IDs queried.

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 12.4 years ago by Ryan D ★ 3.4k

0

Entering edit mode

I tried doing the same in hg18, but get the error:

ERROR 1146 (42S02): Table 'hg18.ensemblToGeneName' doesn't exist

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 12.4 years ago by Ryan D ★ 3.4k

0

Entering edit mode

for the positions you can use a 3rd table named "ensGene" mapping the ensembl transcripts.

ADD REPLY • link 12.4 years ago by Pierre Lindenbaum 161k