Question: Ensembl: Map between different assemblies of one ensembl release
1
gravatar for hendrik
5.5 years ago by
hendrik20
European Union
hendrik20 wrote:

There are ensembl gene ids which are mapped not on the primary assembly but on an alternative sequence. On the ensembl website (http://www.ensembl.org) it is possible to go to the associated gene mapped on the primary assembly by clicking on "View this gene on the primary assembly". Is it possible to obtain somewhere a table of these associations, at the best for each ensembl release separately?

Example:

DAXX (ENSG00000206206) http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000206206;r=CHR_HSCHR6_MHC_DBB_CTG1:33296480-33307190

can be "mapped" on

DAXX (ENSG00000204209) http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204209;r=6:33318558-33329269

and I would like to have a table like for each ensembl id I have

gene_on_alternative_assemly assembly gene_on_primary_assembly
ENSG00000206206 CHR_HSCHR6_MHC_DBB_CTG1 ENSG00000204209
ENSG00000173876 CHR_HSCHR4_6_CTG12 ENSG00000251297
...    

Many thanks in advance

assembly map ensembl • 1.5k views
ADD COMMENTlink modified 5.5 years ago • written 5.5 years ago by hendrik20
1
gravatar for Emily_Ensembl
5.5 years ago by
Emily_Ensembl21k
EMBL-EBI
Emily_Ensembl21k wrote:

There's a table of the mapping per gene, but we don't have a whole genome table in that format. The data is stored in the alt_allele MySQL table, which can be linked out to other tables in the MySQL schema to get the data you need. It is also possible get these data via the Ensembl Perl API, which allows you to get all alt_alleles for a gene.

ADD COMMENTlink written 5.5 years ago by Emily_Ensembl21k

The alt_allele_group_id was introduced to table alt_allele in release 73. What table I could use for mapping ensembl ids on different assemblies before release 73?

ADD REPLYlink written 5.5 years ago by hendrik20

We did not provide mapping back then. Generally, we say that every release is an improvement (as is obvious in this case) so you should always use the current release. If you have to use old data, the best option (although I don't think it's great) is to use gene name.

ADD REPLYlink written 5.5 years ago by Emily_Ensembl21k
1
gravatar for hendrik
5.5 years ago by
hendrik20
European Union
hendrik20 wrote:

Based on Emily_Ensembl's answer I constructed a sql statement for querying a genome wide mapping table:

SELECT g.stable_id original_gene_stable_id,sr.name original_gene_seq_region_name,g_mapped.stable_id mapped_gene_stable_id,g_mapped.name mapped_gene_seq_region
FROM gene AS g
JOIN seq_region AS sr ON sr.seq_region_id=g.seq_region_id
JOIN alt_allele AS aa ON g.gene_id=aa.gene_id
JOIN (
  SELECT aa_sub.alt_allele_group_id,g_sub.stable_id,sr_sub.name
  FROM alt_allele AS aa_sub
  JOIN gene as g_sub ON aa_sub.gene_id=g_sub.gene_id
  JOIN seq_region AS sr_sub ON sr_sub.seq_region_id=g_sub.seq_region_id
  WHERE sr_sub.name REGEXP '^[0-9]+$'
) AS g_mapped ON g_mapped.alt_allele_group_id=aa.alt_allele_group_id;

which results in something like

+-------------------------+-------------------------------+-----------------------+------------------------+
| original_gene_stable_id | original_gene_seq_region_name | mapped_gene_stable_id | mapped_gene_seq_region |
+-------------------------+-------------------------------+-----------------------+------------------------+
| ENSG00000262435         | CHR_HSCHR12_1_CTG1            | ENSG00000226210       | 12                     |
| ENSG00000226210         | 12                            | ENSG00000226210       | 12                     |
| ENSG00000220483         | 6                             | ENSG00000220483       | 6                      |
| ENSG00000281567         | CHR_HSCHR6_1_CTG7             | ENSG00000220483       | 6                      |
| ENSG00000233859         | 6                             | ENSG00000233859       | 6                      |
+-------------------------+-------------------------------+-----------------------+------------------------+

The gene in the first column is mapped to the gene in the third column which has the same alt_allele_group and a integer as the seq_region_name.

Note: This statement is not valid for Ensembl releases before 73 since the table alt_allele does not contain a column alt_allele_group_id in previous releases.

 

ADD COMMENTlink written 5.5 years ago by hendrik20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1788 users visited in the last hour