Deprecated identifiers in Ensembl?
5.5 years ago
Solowars ▴ 70

Dear community,

I am working with a gene family across a wide range of chordate species, and one of the steps was to retrieve potential homologs from Ensembl. Interestingly, some of these genes had their IDs deprecated in newer versions of Ensembl. However, the annotated sequences of these deprecated genes were of reasonably good quality, and in phylogenetic analyses they fell in the position I would expect (so I could tentatively assume that the annotation was correct). I am curious about why some gene annotations are deprecated from one version to the next even though the annotation looks correct, and whether it is wise to use such sequences nonetheless.

All best,

Ensembl Orthologs Annotation • 2.7k views

Could you give us some examples of deprecated identifiers, please? We may be able to pin down why.


Sure! Please consider these examples:

ENSMICG00000011090 ENSMICG00000011081 ENSMICG00000011093 ENSDORG00000003549


Thanks, will take a look tomorrow


Thanks a lot for your help! Best

5.5 years ago
Emily 23k

All these genes are slightly different.

The Mouse Lemur genes were all changed when we moved to a new genome assembly. ENSMICG00000011090 and ENSMICG00000011081 both disappeared when we moved from v2 to v3 of the assembly, whereas ENSMICG00000011093 was lost when we moved from v1 to v2. The good news is that the genes for two of them, HTR3C and HTR3E, are in the latest version of Ensembl. The contigs underlying both of them have changed significantly, which is why the identifiers were changed and not mapped between the releases. I cannot find Mouse Lemur HTR3D in the current Ensembl database, although I suspect that ENSMICG00000049384 could be the correct gene based on sequence and genomic position; I will see if this ought to be annotated as this gene.

Kangaroo rat ENSDORG00000003549 is different. It was not lost when there was a new genome assembly. Looking at the gene on the assembly at that time, it has two exons that overlap gaps in the assembly, which is probably why it was deprecated. Since it doesn't have a gene name, I searched for the sequence using BLAST in the current database, and there are few solid options for the gene it might be now. The species has since had a new genome assembly with the gaps repaired, so the gene could now be annotated properly.
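
For anyone who wants to check the status of a list of stable IDs themselves, here is a minimal sketch in Python using the Ensembl REST archive lookup (`/archive/id/:id`). It is not how the checks above were done, just one possible way to see whether an ID is still current and in which release and assembly it was last seen. The helper names (`archive_url`, `fetch_archive_record`, `describe`) are made up for this example, and the response fields it reads (`is_current`, `release`, `assembly`) are what I would expect the endpoint to return, so please check them against the REST documentation before relying on this.

```python
import json
import urllib.request

# Public Ensembl REST server.
ENSEMBL_REST = "https://rest.ensembl.org"


def archive_url(stable_id: str) -> str:
    """Build the archive lookup URL for one stable ID."""
    return f"{ENSEMBL_REST}/archive/id/{stable_id}?content-type=application/json"


def fetch_archive_record(stable_id: str, timeout: float = 30.0) -> dict:
    """Fetch the archive record for a stable ID (needs network access)."""
    with urllib.request.urlopen(archive_url(stable_id), timeout=timeout) as response:
        return json.load(response)


def describe(record: dict) -> str:
    """Summarise an archive record in one line.

    Field names are assumptions about the response; missing fields
    are reported as 'unknown' rather than raising an error.
    """
    stable_id = record.get("id", "unknown")
    # The flag may arrive as "1", 1 or true, so normalise before comparing.
    is_current = str(record.get("is_current", "")).lower() in {"1", "true"}
    if is_current:
        return f"{stable_id}: current"
    release = record.get("release", "unknown")
    assembly = record.get("assembly", "unknown")
    return f"{stable_id}: retired, last seen in release {release} on assembly {assembly}"
```

Calling `describe(fetch_archive_record("ENSDORG00000003549"))` should then print a one-line summary for the Kangaroo rat gene. I have not checked whether identifiers retired many releases ago are all still returned by the archive endpoint, so a failed lookup does not necessarily mean the ID never existed.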


Thanks a lot for your help!
