I had a similar problem with ENST instability even in the same genome build some time ago.
I summarised the answer from the Ensembl helpdesk here in my own question:
Stability of Ensembl and refseq stable IDs
At first I assumed your case was the same: not what you would expect from a stable ID, but it can happen.
But I just noticed Ensembl recently changed their stable ID documentation:
and posted some details regarding this:
As far as I understand the “Mapping stable identifiers” part, neither the protein sequence nor the protein length is taken into account when transcript IDs are assigned.
“… The identity of a transcript is thus defined by the list of its exon coordinates and its underlying sequence.” Note: not the protein sequence! If I read their documentation correctly, the protein will stay the same in most cases, but it is not part of the similarity comparison in the mapping, apart from the additional penalty described here for a total change of transcript function. So protein sequence similarity is not the central idea of an ENST and is not guaranteed by the mapping between versions.
So my answers to your questions in detail:
How prevalent is this sort of swap?
I don’t know. The "important" transcripts I work with are stable protein-wise. I always wanted to find out for the general case of protein sequence change, but never found the time.
Should I instead be using the 'Name' attribute of an Ensembl transcript, if I want an ID, that is stable with respect to the amino acid sequence?
No, the name doesn’t guarantee stability either. Have a look at SYNGAP1, for example, and compare it between Ensembl versions 88 and 75.
If you want to be sure nothing has changed, always use the ENST together with the Ensembl version you took your data from; that way you get the same sequence again.
If you have to compare transcripts across different Ensembl versions, it may help to keep track of the CCDS IDs or versioned ENSPs and to find the transcript that maps to them in each new Ensembl version.
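If you want to check this yourself, a rough sketch in Python could look like the following. It compares the protein sequence behind each unversioned ENST in two releases and lists the ones that differ. I am assuming here that each header line in the Ensembl peptide FASTA files contains a `transcript:ENST...` field (with a `.version` suffix in newer releases and without it in older ones). The function names and the toy records are made up for illustration, so please check the pattern against your real files.

```python
import re
from typing import Dict, Iterable, List

# Assumption: headers carry "transcript:ENST00000335137", optionally
# followed by ".<version>". The version suffix is dropped on purpose,
# because we want to compare what the *unversioned* ID points to.
_TRANSCRIPT_RE = re.compile(r"transcript:(ENST\d+)(?:\.\d+)?")


def read_proteins(lines: Iterable[str]) -> Dict[str, str]:
    """Map unversioned ENST ID -> protein sequence from peptide FASTA lines."""
    proteins: Dict[str, str] = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            match = _TRANSCRIPT_RE.search(line)
            current = match.group(1) if match else None
            if current is not None:
                proteins[current] = ""
        elif current is not None and line:
            # Sequences may be wrapped over several lines.
            proteins[current] += line
    return proteins


def changed_proteins(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """ENSTs present in both releases whose protein sequence differs."""
    shared = old.keys() & new.keys()
    return sorted(t for t in shared if old[t] != new[t])


# Toy records with invented IDs and sequences, standing in for two releases.
old_release = """\
>ENSP00000000001 pep:known gene:ENSG00000000001 transcript:ENST00000000001
MKTAYIAK
QRQISFVK
>ENSP00000000002 pep:known gene:ENSG00000000001 transcript:ENST00000000002
MSLLTEVE
""".splitlines()

new_release = """\
>ENSP00000000001.2 pep gene:ENSG00000000001.3 transcript:ENST00000000001.4
MKTAYIAKQRQISFVK
>ENSP00000000009.1 pep gene:ENSG00000000001.3 transcript:ENST00000000002.5
MSLLTEVETPIRNEW
""".splitlines()

old = read_proteins(old_release)
new = read_proteins(new_release)
changed = changed_proteins(old, new)
print(changed)  # ['ENST00000000002']
```

For real data you would pass the lines of the downloaded peptide FASTA of each release instead of the toy strings, e.g. via `gzip.open(path, "rt")`. Dividing `len(changed)` by the number of shared IDs would also give a rough idea of how prevalent such changes are.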
If you want to dig into it, have a look at this tool: http://ugahash.uni-frankfurt.de/ It might help to spot differences in transcripts.
Have I missed something obvious? I am just getting started with these data sources.
I don’t think so. If you are just starting, I guess you spotted this kind of problem faster than most ;)
(Unversioned) ENSTs are widely used in places where they don’t make any sense in the long run.