I'm trying to infer the CDS for transcripts assembled with StringTie by comparing their structures with the Ensembl annotation.
In the example below, we can see two different start codons being used. Which one is used seems to be correlated with the downstream splice site. Can anyone explain that? How does Ensembl know which splice sites are being used?
The transcripts that use the later start are marked as "nonsense mediated decay", so it seems unlikely that this was annotated by aligning an ORF from a different species.
Is there any known mechanism that might tie start codon choice to downstream splicing decisions?
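
For reference, here is a minimal sketch of the kind of CDS projection I mean. It assumes exons are given as 1-based inclusive genomic intervals and that the spliced transcript sequence is already available in transcript orientation. The function names `genomic_to_transcript` and `project_cds` are hypothetical helpers of my own, not part of StringTie or Ensembl.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def genomic_to_transcript(
    exons: List[Tuple[int, int]], strand: str, pos: int
) -> Optional[int]:
    """Map a 1-based genomic position to a 0-based offset in the spliced
    transcript. Exons are (start, end), 1-based inclusive. Returns None
    if the position does not fall inside any exon of this transcript."""
    # Walk exons in transcript order: ascending on '+', descending on '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    offset = 0
    for start, end in ordered:
        if start <= pos <= end:
            within = pos - start if strand == "+" else end - pos
            return offset + within
        offset += end - start + 1
    return None


def project_cds(spliced_seq: str, start_offset: int) -> Optional[Tuple[int, int]]:
    """Given a reference start codon projected to transcript coordinates,
    return the (start, end) of the ORF as a 0-based half-open interval
    that includes the stop codon. Returns None if there is no ATG at the
    projected start or no in-frame stop codon downstream."""
    seq = spliced_seq.upper()
    if seq[start_offset:start_offset + 3] != "ATG":
        return None
    # Read codon by codon until the first in-frame stop.
    for i in range(start_offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (start_offset, i + 3)
    return None
```

With this approach the projected start only depends on whether the reference start codon lies in an exon of the assembled transcript, while the position of the first in-frame stop depends on the exon structure downstream of it.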