Question: Deprecated identifiers in Ensembl?
9 months ago
Brazil/Porto Alegre/UFRGS
Solowars wrote:

Dear community,

I am working with a gene family across a wide range of chordate species, and one of the steps was to retrieve potential homologs from Ensembl. Interestingly, some of these genes had their IDs deprecated in newer versions of Ensembl. However, the annotated sequences from these deprecated genes were of reasonably good quality, and in phylogenetic analyses they fell in the position I would expect (so I could tentatively assume that the annotation was correct). I am curious about the reasons why some gene annotations are deprecated from version to version even though the annotation looks correct, and whether it is wise to use such sequences nonetheless.

All best,

Could you give us some examples of deprecated identifiers, please? We may be able to pin down why they were deprecated.

Sure! Please consider these examples:

ENSMICG00000011090 ENSMICG00000011081 ENSMICG00000011093 ENSDORG00000003549

Thanks, will take a look tomorrow

Thanks a lot for your help! Best

9 months ago
Emily_Ensembl wrote:

All these genes are slightly different.

The Mouse Lemur genes were all changed when we moved to a new genome assembly. ENSMICG00000011090 and ENSMICG00000011081 both disappeared when we moved from v2 to v3 of the assembly, whereas ENSMICG00000011093 was lost when we moved from v1 to v2. The good news is that the genes for two of them, HTR3C and HTR3E, are in the latest version of Ensembl. The contigs underlying both of them have changed significantly, which is why the identifiers were changed and not mapped between the releases. I cannot find Mouse Lemur HTR3D in the current Ensembl database, although I suspect that ENSMICG00000049384 could be the correct gene based on sequence and genomic position; I will see if this ought to be annotated as this gene.

Kangaroo rat ENSDORG00000003549 is different. It was not lost when there was a new genome assembly. Looking at the gene on the assembly at that time, it has two exons that overlap gaps in the assembly, which is probably why it was deprecated. Since it doesn't have a gene name, I searched for the sequence using BLAST in the current database, and there are few solid options for the gene it might be now. The species has since had a new genome assembly with the gaps repaired, so the gene could now be annotated properly.
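
For anyone who wants to check the status of a list of identifiers themselves, here is a minimal Python sketch using the Ensembl REST API's `archive/id` endpoint. Treat it as an illustration rather than an official recipe: the response field names used here (`is_current`, `release`, `possible_replacement`) are assumptions taken from the REST documentation and should be checked against the live service, and the helper names `fetch_archive` and `summarize` are made up for this example.

```python
import json
import urllib.request

# Assumption: the public Ensembl REST server and its archive/id endpoint.
SERVER = "https://rest.ensembl.org"


def fetch_archive(stable_id):
    """Fetch the archive record for one stable ID (needs network access)."""
    request = urllib.request.Request(
        f"{SERVER}/archive/id/{stable_id}",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(record):
    """Turn an archive record (a dict) into a one-line status string."""
    stable_id = record.get("id", "?")
    # Assumed field: is_current is "1" (or 1) while the ID is still live.
    if record.get("is_current") in ("1", 1):
        return f"{stable_id}: current"
    release = record.get("release", "unknown")
    # Assumed field: a list of dicts, each carrying a "stable_id" key.
    replacements = [
        item.get("stable_id", "?")
        for item in record.get("possible_replacement") or []
    ]
    suffix = ", ".join(replacements) if replacements else "none listed"
    return (
        f"{stable_id}: retired, last seen in release {release}; "
        f"possible replacements: {suffix}"
    )


if __name__ == "__main__":
    # Two of the identifiers from this thread.
    for gene in ["ENSMICG00000011090", "ENSDORG00000003549"]:
        print(summarize(fetch_archive(gene)))
```

An empty list of possible replacements does not necessarily mean the gene is gone. As with the Mouse Lemur cases above, the identifier may simply not have been mapped across an assembly change, so a sequence search against the current release is still worthwhile.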

Thanks a lot for your help!

