Why does a gene name have more than one Ensembl ID in the GTF?
2.7 years ago
walker ▴ 30

As described in the title. In the Ensembl GTF file, I find different gene_ids that share the same gene_name. For example, the gene_name TBCE has two gene_ids: ENSG00000284770 and ENSG00000285053.
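To see how many gene names are affected, something like the following could be run over the GTF. This is only a minimal sketch: it assumes the usual tab-separated GTF layout with `gene_id "..."` and `gene_name "..."` in the ninth column, and the function name `names_with_multiple_ids` is my own, not part of any Ensembl tool.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (assumed Ensembl-style quoting).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def names_with_multiple_ids(gtf_lines):
    """Return {gene_name: [gene_ids]} for names attached to more than one gene_id."""
    ids_by_name = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only look at gene-level records
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_name" in attrs:
            ids_by_name[attrs["gene_name"]].add(attrs["gene_id"])
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}
```

With a real file this would be called as `names_with_multiple_ids(open("annotation.gtf"))`, where `annotation.gtf` is a placeholder path. Genes without a gene_name attribute are simply ignored.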

Those two IDs appear to have overlapping loci.

overlapping locus
exon(s) of the locus overlap exon(s) of a readthrough transcript or a transcript belonging to another locus


Tagging Emily_Ensembl for additional clarification.

2.7 years ago

It looks like ENSG00000285053 is a readthrough of ENSG00000284770 and GGPS1 (ENSG00000152904): http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000285053;r=1:235328570-235448952

Readthroughs are annoying because we need RefSeq to agree they exist before HGNC can give them a meaningful name (in this case it would be GGPS1-TBCE). I'll report it to the relevant people, but sadly we might not be able to get it renamed. As it is, it's just taken the name of the gene it has the most sequence similarity to, which is TBCE.

I see, but are there other situations that produce this? For instance, ENSG00000274559 and ENSG00000234289 do not overlap, yet both are named "H2BFS" in the GTF.
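To separate the overlapping cases from the non-overlapping ones, the gene coordinates can be compared pairwise. The sketch below is only an illustration: it assumes 1-based inclusive coordinates as in GTF, ignores strand, and uses made-up gene IDs and positions (`geneA`, `geneB`, `geneC`) rather than real loci.

```python
from itertools import combinations

def overlaps(a, b):
    """True if two (chrom, start, end) loci share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def split_by_overlap(loci):
    """loci: {gene_id: (chrom, start, end)}.

    Returns (overlapping_pairs, disjoint_pairs) as lists of ID pairs.
    """
    overlapping, disjoint = [], []
    for (id_a, loc_a), (id_b, loc_b) in combinations(sorted(loci.items()), 2):
        target = overlapping if overlaps(loc_a, loc_b) else disjoint
        target.append((id_a, id_b))
    return overlapping, disjoint

# Toy example with hypothetical IDs and coordinates.
toy = {
    "geneA": ("1", 100, 500),
    "geneB": ("1", 400, 900),
    "geneC": ("1", 2000, 3000),
}
overlapping_pairs, disjoint_pairs = split_by_overlap(toy)
```

Pairs that end up in `disjoint_pairs` while sharing a gene_name are the ones not explained by a readthrough.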

If you have a list you would like us to investigate, please send it in to helpdesk [at] ensembl.org.

Thank you so much, I will try this later.

