I am trying to use Ensembl transcript IDs to identify the amino acid sequences they encode.
My problem is that, in the example below, the transcript IDs associated with a given amino acid sequence appear 'swapped' between an Ensembl release that uses genome build 37 and one that uses genome build 38. The 'Name' attribute, by contrast, has not swapped between the amino acid sequences. In asserting that the Ensembl transcript IDs have been swapped, I have ignored the version number after the decimal point in each ID.
My questions are:
- How prevalent is this sort of swap?
- Should I instead be using the 'Name' attribute of an Ensembl transcript if I want an ID that is stable with respect to the amino acid sequence?
- Have I missed something obvious? I am just getting started with these data sources.
Many thanks,
Matt
Ensembl v86, using genome build 37:
Name        Transcript ID      Length (bp)  Protein
---------------------------------------------------
RTFDC1-001  ENST00000023939.4  1650         306 aa
RTFDC1-201  ENST00000357348.5  1476         336 aa
Ensembl v88, using genome build 38:
Name        Transcript ID      Length (bp)  Protein
---------------------------------------------------
RTFDC1-001  ENST00000357348.9  2212         306 aa
RTFDC1-201  ENST00000023939.8  1745         336 aa
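The comparison behind the two tables can be sketched in Python as follows. This is only a minimal illustration: the dictionaries are hand-copied from the tables, and the helper names `strip_version` and `find_changed_names` are hypothetical, not part of any Ensembl tool or API. It drops the version suffix from each transcript ID and reports every Name whose unversioned ID differs between the two releases.

```python
# Name -> versioned transcript ID, hand-copied from the two tables in this post.
grch37_v86 = {
    "RTFDC1-001": "ENST00000023939.4",
    "RTFDC1-201": "ENST00000357348.5",
}
grch38_v88 = {
    "RTFDC1-001": "ENST00000357348.9",
    "RTFDC1-201": "ENST00000023939.8",
}


def strip_version(transcript_id):
    """Drop the '.N' version suffix: 'ENST00000023939.4' -> 'ENST00000023939'."""
    return transcript_id.split(".", 1)[0]


def find_changed_names(old, new):
    """Return {name: (old_id, new_id)} for names whose unversioned ID differs.

    Hypothetical helper: only names present in both releases are compared.
    """
    changed = {}
    for name in sorted(old.keys() & new.keys()):
        old_id = strip_version(old[name])
        new_id = strip_version(new[name])
        if old_id != new_id:
            changed[name] = (old_id, new_id)
    return changed


changed = find_changed_names(grch37_v86, grch38_v88)
for name, (old_id, new_id) in changed.items():
    print(f"{name}: {old_id} -> {new_id}")
```

Run against a full Name-to-ID export from each release, the same check would give a rough count of how often this happens, assuming the Name column itself is comparable between releases.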
Tagging: Emily_Ensembl