Isoform is such an ambiguous qualifier that it should be banned (though we know this is spitting in the wind). It has ancient origins in the days of protein isoelectric focusing, where split bands were conveniently named isoforms (even though the splitting was dominated by carbohydrate side chains, not by the endogenous ionisation states of the protein). These days the term conflates alternative splicing and/or alternative initiation, sometimes sequence variants of different mechanistic origins, and a range of post-translational differences. As implied above, each of these "isoforms" needs to be rigorously defined.
The key (or at least what I find useful) is to grasp the canonical-sequence concept of UniProt already alluded to (strictly, only of the Swiss-Prot section). Simply put, the curators default to the longest, maximum-exon sequence as the defined reference (which differs from the RefSeq concept), and all other data-supported changes in coding sequence (or post-translational modifications) are then mapped to it as cross-references. This is not the only organizing principle one could come up with, by a long chalk, but it's a good one in my opinion. Note, though, that from 1986 until after 2000 this was applied pre-genomically (to the longest CDS in a cDNA), so post-genomic mapping brings a new set of challenges (as mentioned, one of these is what to do with, say, histones that have identical protein sequences). But the canonical model still holds as the exon set from a single gene locus.
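To make the "longest sequence as reference" default concrete, here is a minimal Python sketch. It is only an illustration of the principle as I described it, not UniProt's actual curation procedure; the `Isoform` class, the function names and the toy sequences are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Isoform:
    name: str       # hypothetical label, not a real accession
    sequence: str   # protein sequence as a plain string


def pick_canonical(isoforms):
    """Return the longest isoform; on a tie, the first one listed wins."""
    return max(isoforms, key=lambda iso: len(iso.sequence))


def build_entry(isoforms):
    """Group one gene's isoforms under a single canonical reference."""
    canonical = pick_canonical(isoforms)
    alternatives = [iso.name for iso in isoforms if iso is not canonical]
    return {"canonical": canonical.name, "alternatives": alternatives}


# Toy data (made up): one full-length product, one with an internal
# segment skipped, one starting from a downstream initiation site.
isoforms = [
    Isoform("iso_skip", "MKTAYIAKQRSHFSRQ"),
    Isoform("iso_full", "MKTAYIAKQRQISFVKSHFSRQ"),
    Isoform("iso_alt_start", "MAKQRQISFVKSHFSRQ"),
]

entry = build_entry(isoforms)
print(entry)
```

The point of the design is that everything else hangs off one reference string per gene locus, so each alternative only needs to be described relative to it.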
PDB mappings open their own can of worms but, in answer to the question of matching to the canonical sequence in Swiss-Prot, this is exactly what PDBe tries to do. The actual sequence string resolved from each individual structure (with its own errors or PCR-induced variants) used to be explicitly designated via a GI number (and indexed for BLASTP searches against nr). I'm not sure how that is handled now but, as mentioned, even the smallest changes between PDB entries (e.g. trimming or leaving a His purification tag) will spawn a new sequence in nr and TrEMBL.
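The His-tag point can be shown with a toy Python sketch. It assumes a database that keys entries on the exact sequence string, so any difference yields a new record; the MD5 key, the six-residue tag and the sequences are my assumptions for illustration, and the exact-substring lookup is a stand-in for a real alignment, not the PDBe procedure.

```python
import hashlib

HIS_TAG = "HHHHHH"  # assumed 6xHis tag; real constructs vary


def seq_key(seq):
    """Exact-string key: any change in the sequence gives a different key."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()


def strip_his_tag(seq, tag=HIS_TAG):
    """Remove a terminal tag if present (N-terminus checked first)."""
    if seq.startswith(tag):
        return seq[len(tag):]
    if seq.endswith(tag):
        return seq[:-len(tag)]
    return seq


# Toy sequences (made up): a canonical reference and two constructs
# of the same trimmed domain, one still carrying the tag.
canonical = "MKTAYIAKQRQISFVKSHFSRQ"
trimmed = "AYIAKQRQISFVKSH"
tagged = HIS_TAG + trimmed

# As exact strings the two constructs are distinct entries...
distinct_entries = seq_key(tagged) != seq_key(trimmed)

# ...yet both map to the same region of the canonical sequence.
offset_tagged = canonical.find(strip_his_tag(tagged))
offset_trimmed = canonical.find(strip_his_tag(trimmed))
print(distinct_entries, offset_tagged, offset_trimmed)
```

In practice a sequence alignment would replace `str.find`, since real constructs also carry linkers, point mutations and sequencing errors.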
cdsouthan, 2.2 years ago (modified 2.2 years ago)