I am working with Ensembl stable IDs for transcripts (ENSTs). I thought the idea of a stable ID was that it always points to the same transcript sequence (DNA and protein) across database versions.
However, I have found some ENSTs whose sequences change substantially between versions:
SYNGAP1 ENST00000418600: lost a coding exon (-58 aa, -88 bp), based on a different Vega transcript
CTDP1 ENST00000299543: lost a coding exon (-119 aa, -373 bp), different annotation method
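To spot changes like these systematically, one could compare the cDNA and protein sequences recorded for the same ENST in two releases. The sketch below assumes you have already exported those sequences yourself (for example as FASTA from each release). `TranscriptRecord` and `compare_releases` are hypothetical names, and the IDs and sequences in the usage example are toy data, not the real SYNGAP1 or CTDP1 transcripts.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class TranscriptRecord:
    """Sequences of one transcript exported from a single release (hypothetical container)."""
    stable_id: str  # may or may not carry a ".version" suffix
    cdna: str
    protein: str


def seq_digest(seq: str) -> str:
    """MD5 of the upper-cased sequence; digests can be stored per release instead of full sequences."""
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()


def compare_releases(old: TranscriptRecord, new: TranscriptRecord) -> dict:
    """Summarise how one transcript's sequences differ between two releases."""
    # Compare only the unversioned part of the ID: that is the part that stays "stable".
    if old.stable_id.split(".")[0] != new.stable_id.split(".")[0]:
        raise ValueError("records describe different stable IDs")
    return {
        "cdna_identical": seq_digest(old.cdna) == seq_digest(new.cdna),
        "protein_identical": seq_digest(old.protein) == seq_digest(new.protein),
        "delta_bp": len(new.cdna) - len(old.cdna),
        "delta_aa": len(new.protein) - len(old.protein),
    }


# Toy usage: the "new" model has lost six bases and two residues.
old = TranscriptRecord("ENST_EXAMPLE.1", "ATGAAATTTGGGTAA", "MKFG")
new = TranscriptRecord("ENST_EXAMPLE.2", "ATGGGGTAA", "MG")
print(compare_releases(old, new))
```

Hashing instead of comparing strings directly is only a convenience for keeping a compact per-release record; for a one-off comparison `old.cdna.upper() == new.cdna.upper()` would do the same job.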
This raises some questions:
1. What is guaranteed to remain stable for Ensembl "stable" IDs (mainly ENSTs)?
The only information I could find on this is:
Ensembl's stable identifiers are mapped between re-annotation processes using a combination of location based mappings and those generated by Exonerate. The process performs exon based mapping and deriving subsequent identifier mappings based upon these findings.
2. Why does the community use ENSTs without their version numbers, even though versions exist and (according to the documentation) would guarantee sequence stability, while RefSeq NM_ accessions are usually cited with their version?
3. What stability do RefSeq identifiers guarantee? Could you point me to a document that defines which features I can assume to be stable for an unversioned NM_ accession and for a versioned one, respectively?
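To see how an ID could survive the loss of an exon under a mapping of the kind quoted in question 1, here is a deliberately simplified toy. It scores old and new transcript models only by the overlap of their exonic genomic bases (a Jaccard index) and greedily carries each old ID to its best-scoring new model above a threshold. This is a swapped-in illustration, not Ensembl's actual procedure, which per the quoted documentation also uses Exonerate alignments; the function names, the scoring rule and the 0.5 threshold are all assumptions.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # inclusive genomic (start, end), same chromosome and strand assumed


def exonic_bases(exons: List[Exon]) -> set:
    """Set of genomic positions covered by a transcript's exons."""
    covered = set()
    for start, end in exons:
        covered.update(range(start, end + 1))
    return covered


def overlap_score(old: List[Exon], new: List[Exon]) -> float:
    """Jaccard index of exonic bases: 1.0 for identical exon structures, 0.0 for no overlap."""
    a, b = exonic_bases(old), exonic_bases(new)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_ids(old: Dict[str, List[Exon]], new: Dict[str, List[Exon]],
            threshold: float = 0.5) -> Dict[str, str]:
    """Toy one-to-one mapping: assign best-scoring pairs first, skip anything below threshold."""
    pairs = sorted(
        ((overlap_score(o_ex, n_ex), o_id, n_id)
         for o_id, o_ex in old.items() for n_id, n_ex in new.items()),
        reverse=True,
    )
    mapping: Dict[str, str] = {}
    used_new = set()
    for score, o_id, n_id in pairs:
        if score < threshold or o_id in mapping or n_id in used_new:
            continue
        mapping[o_id] = n_id
        used_new.add(n_id)
    return mapping


# Hypothetical models: the new annotation drops the middle (88 bp) exon.
old_models = {"ENST_A": [(100, 199), (300, 387), (500, 699)]}
new_models = {"model_1": [(100, 199), (500, 699)], "model_2": [(5000, 5100)]}
print(map_ids(old_models, new_models))
```

In this example the old model still scores about 0.77 against `model_1` (300 shared bases out of 388), so under this toy rule it keeps its ID even though its sequence has changed.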
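Questions 2 and 3 hinge on the difference between an unversioned and a versioned identifier. A minimal sketch of keeping the two apart in code, assuming (as the Ensembl documentation mentioned in question 2 states, and assuming the same holds for RefSeq, which is what question 3 asks) that only an explicit version pins a sequence: two references are treated as the same sequence only if they share the base ID and the same explicit version. The regex covers only human-style ENST IDs and NM_/NR_/XM_/XR_ RefSeq accessions; the function names, `NM_000000` and the version numbers in the example are hypothetical.

```python
import re
from typing import Optional, Tuple

# Base ID plus optional ".version", e.g. "ENST00000418600" or "ENST00000418600.4" (version illustrative).
_ACCESSION = re.compile(r"^(?P<base>ENST\d{11}|[NX][MR]_\d+)(?:\.(?P<version>\d+))?$")


def parse_accession(acc: str) -> Tuple[str, Optional[int]]:
    """Split an accession into (base ID, version or None if unversioned)."""
    m = _ACCESSION.match(acc.strip())
    if m is None:
        raise ValueError(f"unrecognised accession: {acc!r}")
    version = m.group("version")
    return m.group("base"), int(version) if version is not None else None


def same_sequence_guaranteed(a: str, b: str) -> bool:
    """True only when both references have the same base ID and the same explicit version."""
    base_a, ver_a = parse_accession(a)
    base_b, ver_b = parse_accession(b)
    return base_a == base_b and ver_a is not None and ver_a == ver_b


print(same_sequence_guaranteed("ENST00000418600", "ENST00000418600"))  # unversioned: no guarantee
print(same_sequence_guaranteed("NM_000000.2", "NM_000000.2"))
```

The design choice is deliberately conservative: two unversioned references are never treated as guaranteed-identical, which is exactly the gap the SYNGAP1 and CTDP1 examples expose.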
Thanks for any help!