Question: Deprecated identifiers in Ensembl?
Solowars (Brazil/Porto Alegre/UFRGS) wrote, 9 months ago:

Dear community,

I am working with a gene family across a wide range of chordate species, and one of the steps was to retrieve potential homologs from Ensembl. Interestingly, some of these genes had their IDs deprecated in newer versions of Ensembl. However, the annotated sequences of these deprecated genes were of reasonably good quality, and in phylogenetic analyses they fell in the positions I would expect (so I could tentatively assume that the annotation was correct). I am curious about why some gene annotations are deprecated from version to version even though they appear to be correct, and whether it is wise to use such sequences nonetheless.

All best,

Tags: annotation, ensembl, orthologs

Could you give us some examples of deprecated identifiers, please? We may be able to pin down why they were deprecated.

modified 9 months ago • written 9 months ago by Emily_Ensembl

Sure! Please consider these examples:

ENSMICG00000011090 ENSMICG00000011081 ENSMICG00000011093 ENSDORG00000003549

written 9 months ago by Solowars

Thanks, will take a look tomorrow

written 9 months ago by Emily_Ensembl

Thanks a lot for your help! Best

written 9 months ago by Solowars
Emily_Ensembl (EMBL-EBI) wrote, 9 months ago:

All these genes are slightly different.

The Mouse Lemur genes were all changed when we moved to a new genome assembly. ENSMICG00000011090 and ENSMICG00000011081 both disappeared when we moved from v2 to v3 of the assembly, whereas ENSMICG00000011093 was lost when we moved from v1 to v2. The good news is that the genes for two of them, HTR3C and HTR3E, are in the latest version of Ensembl. The contigs underlying both of them have changed significantly, which is why the identifiers were changed rather than mapped between the releases. I cannot find Mouse Lemur HTR3D in the current Ensembl database, although I suspect that ENSMICG00000049384 could be the correct gene based on sequence and genomic position. I will see if this ought to be annotated as this gene.

Kangaroo rat ENSDORG00000003549 is different. It was not lost when there was a new genome assembly. Looking at the gene on the assembly at that time, it has two exons that overlap gaps in the assembly, which is probably why it was deprecated. Since it doesn't have a gene name, I searched for the sequence using BLAST in the current database, and there are few solid options for the gene it might be now. The species has since had a new genome assembly with the gaps repaired, so the gene could be annotated properly.
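
For readers who want to check the history of a retired identifier themselves, here is a minimal Python sketch. It assumes the public Ensembl REST server at rest.ensembl.org and its `archive/id` lookup; the response field names used below (`id`, `is_current`, `release`, `assembly`) are assumptions about that service's JSON output and should be checked against the REST documentation. The helper names `archive_url`, `fetch_archive_record` and `summarize` are hypothetical and not part of any Ensembl library.

```python
import json
import urllib.request

# Assumption: the public Ensembl REST server and its archive/id lookup.
REST_SERVER = "https://rest.ensembl.org"


def archive_url(stable_id: str) -> str:
    """Build the archive lookup URL for one stable identifier."""
    return f"{REST_SERVER}/archive/id/{stable_id}?content-type=application/json"


def fetch_archive_record(stable_id: str) -> dict:
    """Fetch the archive record as a dict (requires network access)."""
    with urllib.request.urlopen(archive_url(stable_id), timeout=30) as resp:
        return json.load(resp)


def summarize(record: dict) -> str:
    """Return a one-line summary of an archive record.

    The field names are assumed, so missing keys are reported as '?'
    or 'unknown' instead of raising an error.
    """
    if "is_current" not in record:
        status = "unknown"
    elif str(record["is_current"]) == "1":
        status = "current"
    else:
        status = "retired"
    return (
        f"{record.get('id', '?')}: {status}, "
        f"last seen in release {record.get('release', '?')} "
        f"on assembly {record.get('assembly', '?')}"
    )
```

Calling `summarize(fetch_archive_record("ENSDORG00000003549"))` for each identifier in the thread should report the last release and assembly in which it existed. It will not explain why an identifier was retired; that still needs a look at the gene model on the old assembly, as described above.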


Thanks a lot for your help!

written 9 months ago by Solowars