Why does a gene name have more than one Ensembl ID in the GTF?
4.0 years ago
walker ▴ 30

As described in the title: in the Ensembl GTF file, I find different gene_ids that share the same gene_name. For example, the gene_name TBCE has the gene_ids ENSG00000284770 and ENSG00000285053.

ensembl gtf • 1.8k views

Those two IDs appear to have overlapping loci.

overlapping locus: exon(s) of the locus overlap exon(s) of a readthrough transcript or a transcript belonging to another locus


4.0 years ago
Emily 23k

It looks like ENSG00000285053 is a readthrough of TBCE (ENSG00000284770) and GGPS1 (ENSG00000152904): http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000285053;r=1:235328570-235448952

Readthroughs are annoying because we need RefSeq to agree they exist before HGNC can give them a meaningful name (in this case it would be GGPS1-TBCE). I'll report it to the relevant people, but sadly we might not be able to get it renamed. As it is, it's just taken the name of the gene it has the most sequence similarity to, which is TBCE.


I see, but are there other situations where this happens? For instance, ENSG00000274559 and ENSG00000234289 do not overlap, yet both are named "H2BFS" in the GTF.


If you have a list you would like us to investigate, please send it in to helpdesk [at] ensembl.org.
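
For anyone who wants to assemble such a list, here is a minimal Python sketch that collects every gene_name mapped to more than one gene_id. It assumes the usual Ensembl GTF layout (nine tab-separated columns, `gene` rows in column 3, attributes written as `key "value";`). The function name `duplicated_gene_names` is made up for this example and is not part of any Ensembl tool.

```python
import re
from collections import defaultdict

# Matches attribute pairs of the form: key "value"
# (assumed Ensembl GTF attribute syntax in column 9).
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def duplicated_gene_names(lines):
    """Return {gene_name: sorted gene_ids} for names with more than one gene_id.

    `lines` is any iterable of GTF text lines, e.g. an open file handle.
    Hypothetical helper written for illustration only.
    """
    ids_by_name = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        # Only look at gene-level rows; transcripts and exons repeat the same IDs.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = dict(_ATTR.findall(fields[8]))
        name = attrs.get("gene_name")
        gene_id = attrs.get("gene_id")
        if name and gene_id:
            ids_by_name[name].add(gene_id)
    # Keep only names that point at two or more distinct gene IDs.
    return {n: sorted(ids) for n, ids in ids_by_name.items() if len(ids) > 1}
```

To run it on a compressed GTF, something like `with gzip.open(path, "rt") as fh: dups = duplicated_gene_names(fh)` should work, where `path` is a placeholder for your own file. Genes without a gene_name attribute are skipped, so they never show up as duplicates.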


Thank you so much, I will try this later.