Question

Ensembl: Map between different assemblies of one ensembl release

1

Entering edit mode

8.9 years ago

hendrik ▴ 40

There are ensembl gene ids which are mapped not on the primary assembly but on an alternative sequence. On the ensembl website (http://www.ensembl.org) it is possible to go to the associated gene mapped on the primary assembly by clicking on "View this gene on the primary assembly". Is it possible to obtain somewhere a table of these associations, at the best for each ensembl release separately?

Example:

DAXX (ENSG00000206206) (http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000206206;r=CHR_HSCHR6_MHC_DBB_CTG1:33296480-33307190)

can be "mapped" on

DAXX (ENSG00000204209) (http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204209;r=6:33318558-33329269)

and I would like to have a table like for each ensembl id I have

gene_on_alternative_assembly  assembly                  gene_on_primary_assembly
ENSG00000206206               CHR_HSCHR6_MHC_DBB_CTG1   ENSG00000204209
ENSG00000173876               CHR_HSCHR4_6_CTG12        ENSG00000251297
...

Many thanks in advance

ensembl • 2.4k views

ADD COMMENT • link updated 15 months ago by Ram 43k • written 8.9 years ago by hendrik ▴ 40

Ram · Answer 1 · 2015-05-27

Based on Emily_Ensembl's answer I constructed a sql statement for querying a genome wide mapping table:

SELECT g.stable_id original_gene_stable_id,sr.name original_gene_seq_region_name,g_mapped.stable_id mapped_gene_stable_id,g_mapped.name mapped_gene_seq_region
FROM gene AS g
JOIN seq_region AS sr ON sr.seq_region_id=g.seq_region_id
JOIN alt_allele AS aa ON g.gene_id=aa.gene_id
JOIN (
  SELECT aa_sub.alt_allele_group_id,g_sub.stable_id,sr_sub.name
  FROM alt_allele AS aa_sub
  JOIN gene as g_sub ON aa_sub.gene_id=g_sub.gene_id
  JOIN seq_region AS sr_sub ON sr_sub.seq_region_id=g_sub.seq_region_id
  WHERE sr_sub.name REGEXP '^[0-9]+$'
) AS g_mapped ON g_mapped.alt_allele_group_id=aa.alt_allele_group_id;

which results in something like

+-------------------------+-------------------------------+-----------------------+------------------------+
| original_gene_stable_id | original_gene_seq_region_name | mapped_gene_stable_id | mapped_gene_seq_region |
+-------------------------+-------------------------------+-----------------------+------------------------+
| ENSG00000262435         | CHR_HSCHR12_1_CTG1            | ENSG00000226210       | 12                     |
| ENSG00000226210         | 12                            | ENSG00000226210       | 12                     |
| ENSG00000220483         | 6                             | ENSG00000220483       | 6                      |
| ENSG00000281567         | CHR_HSCHR6_1_CTG7             | ENSG00000220483       | 6                      |
| ENSG00000233859         | 6                             | ENSG00000233859       | 6                      |
+-------------------------+-------------------------------+-----------------------+------------------------+

The gene in the first column is mapped to the gene in the third column which has the same alt_allele_group and a integer as the seq_region_name.

Note: This statement is not valid for Ensembl releases before 73 since the table alt_allele does not contain a column alt_allele_group_id in previous releases.

Ram · Answer 2 · 2015-05-26

1

Entering edit mode

8.9 years ago

Emily 23k

There's a table of the mapping per gene, but we don't have a whole genome table in that format. The data is stored in the alt_allele MySQL table, which can be linked out to other tables in the MySQL schema to get the data you need. It is also possible get these data via the Ensembl Perl API, which allows you to get all alt_alleles for a gene.

ADD COMMENT • link updated 15 months ago by Ram 43k • written 8.9 years ago by Emily 23k

0

Entering edit mode

The alt_allele_group_id was introduced to table alt_allele in release 73. What table I could use for mapping ensembl ids on different assemblies before release 73?

ADD REPLY • link updated 15 months ago by Ram 43k • written 8.9 years ago by hendrik ▴ 40

0

Entering edit mode

We did not provide mapping back then. Generally, we say that every release is an improvement (as is obvious in this case) so you should always use the current release. If you have to use old data, the best option (although I don't think it's great) is to use gene name.

ADD REPLY • link updated 15 months ago by Ram 43k • written 8.9 years ago by Emily 23k