Context: the LOEUF dataset contains the following columns:
Gene_name Gene_ID Transcript_ID Canonical? Coordinate_start_gene Coordinate_end_gene Coordinate_start_transcript Coordinate_end_transcript LOEUF metrics …
According to the paper, they used GENCODE v19 (GRCh37) to assign the transcript and gene IDs.
I took the gene IDs (e.g. ENSG00000010404) and used Ensembl to get the coordinates of these genes in GRCh38. I did the same with the transcripts. Then I merged the results to build my own LOEUF dataset in GRCh38.
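The remapping step above boils down to a join on the stable Ensembl ID. A minimal sketch, assuming the LOEUF rows are loaded as dicts with a `gene_id` key and that the GRCh38 coordinates have already been fetched from Ensembl into a dict keyed by unversioned ID (both structures, and the names `strip_version` and `remap_to_grch38`, are hypothetical). Joining on the ID without its `.N` version suffix avoids spurious mismatches if one side carries versions, and the IDs that fail to match are returned so they can be reviewed by hand rather than silently dropped:

```python
from typing import Dict, List, Tuple


def strip_version(ensembl_id: str) -> str:
    """Drop a '.N' version suffix, e.g. 'ENSG00000010404.13' -> 'ENSG00000010404'."""
    return ensembl_id.split(".", 1)[0]


def remap_to_grch38(
    loeuf_rows: List[dict],
    grch38_coords: Dict[str, Tuple[str, int, int]],
) -> Tuple[List[dict], List[str]]:
    """Attach GRCh38 (chrom, start, end) to LOEUF rows by unversioned gene ID.

    Returns (remapped_rows, unmatched_ids). Unmatched IDs may be retired or
    merged gene models and should be checked manually.
    """
    remapped: List[dict] = []
    unmatched: List[str] = []
    for row in loeuf_rows:
        key = strip_version(row["gene_id"])
        if key not in grch38_coords:
            unmatched.append(row["gene_id"])
            continue
        chrom, start, end = grch38_coords[key]
        # Keep the original GRCh37 columns and add the GRCh38 ones alongside.
        remapped.append({**row, "chrom_38": chrom, "start_38": start, "end_38": end})
    return remapped, unmatched
```

The same function works for transcripts if the key is switched to the transcript ID column.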
I used this LOEUF-GRCh38 dataset to annotate CNVs from HGS VCF files.
Now I am adding more functionality to my bioinformatics application. It now takes the SNPs and indels found in the exons of transcripts affected by CNVs, and I am using VEP 99 to annotate these short variants. I haven't tested it thoroughly yet, but it seems to be working: I checked a few variants by hand and everything looked correct.
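The selection logic described above (transcripts hit by a CNV, then short variants inside their exons) can be sketched as interval overlap checks. This is a minimal sketch, not the application's actual code: it assumes 1-based, closed coordinates on both sides (as in Ensembl and VCF `POS`), and it only tests the variant's `POS`, so an indel whose anchor base sits just outside an exon would be missed. The `Transcript` type and function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    transcript_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    exons: List[Tuple[int, int]]  # 1-based, inclusive (start, end) pairs


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def transcripts_hit_by_cnv(
    cnv_chrom: str, cnv_start: int, cnv_end: int, transcripts: List[Transcript]
) -> List[Transcript]:
    """Return every transcript that overlaps the CNV interval."""
    return [
        t for t in transcripts
        if t.chrom == cnv_chrom and overlaps(cnv_start, cnv_end, t.start, t.end)
    ]


def variant_in_exon(chrom: str, pos: int, transcript: Transcript) -> bool:
    """True if a variant's POS falls inside any exon of the transcript."""
    if chrom != transcript.chrom:
        return False
    return any(start <= pos <= end for start, end in transcript.exons)
```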
What concerns me is this: when I annotate variants in HGVS nomenclature (transcript_ID:c.123C>T), the transcript that VEP reports for a SNP is different from the transcript_ID I assigned earlier using my LOEUF-GRCh38 dataset.
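To find these mismatches systematically rather than by eye, the transcript can be pulled out of each HGVS string and compared with the LOEUF transcript ID. A minimal sketch, assuming HGVSc strings of the form `ENST...[.version]:c.<change>`; the regex and function names are hypothetical and deliberately ignore the version suffix, since a version difference alone does not mean a different transcript:

```python
import re
from typing import Optional, Tuple

# Matches e.g. "ENST00000000001.2:c.123C>T"; the ".version" part is optional.
HGVSC_RE = re.compile(r"^(?P<tx>ENST\d+)(?:\.(?P<ver>\d+))?:(?P<change>c\..+)$")


def parse_hgvsc(hgvsc: str) -> Optional[Tuple[str, Optional[str], str]]:
    """Split an HGVSc string into (transcript_id, version, change), or None."""
    m = HGVSC_RE.match(hgvsc)
    if m is None:
        return None
    return m.group("tx"), m.group("ver"), m.group("change")


def same_transcript(hgvsc: str, loeuf_transcript_id: str) -> bool:
    """Compare the HGVSc transcript with a LOEUF transcript, ignoring versions."""
    parsed = parse_hgvsc(hgvsc)
    if parsed is None:
        return False
    return parsed[0] == loeuf_transcript_id.split(".", 1)[0]
```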
My question is: do transcript IDs change between versions?
Looking at the Ensembl documentation, I found this:
Ensembl stable gene, transcript, and protein identifiers are kept the same throughout Ensembl releases unless the gene or transcript model changes dramatically.
How often does this happen?
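One way to answer "how often" for your own data is to compare the versioned IDs in the LOEUF file against the versioned IDs in the GRCh38 annotation you used. A minimal sketch, assuming both ID lists are already loaded as strings like `ENST00000000001.2`; the function names are hypothetical. It separates stable IDs that disappeared entirely from IDs that survived but with a different version suffix, since Ensembl keeps the stable ID for minor model changes and signals those through the version number:

```python
from typing import Dict, Iterable, Optional, Tuple


def split_id(ensembl_id: str) -> Tuple[str, Optional[str]]:
    """'ENST00000000001.2' -> ('ENST00000000001', '2'); no suffix -> version None."""
    stable, _, version = ensembl_id.partition(".")
    return stable, version or None


def compare_releases(old_ids: Iterable[str], new_ids: Iterable[str]) -> Dict[str, object]:
    """Count stable IDs dropped between releases and IDs whose version changed."""
    old = dict(split_id(i) for i in old_ids)
    new = dict(split_id(i) for i in new_ids)
    dropped = sorted(set(old) - set(new))
    version_changed = sorted(
        s for s in set(old) & set(new)
        if old[s] and new[s] and old[s] != new[s]
    )
    return {"n_old": len(old), "dropped": dropped, "version_changed": version_changed}
```

Dividing `len(dropped)` by `n_old` gives the fraction of LOEUF transcripts that no longer exist under the same stable ID in the newer annotation.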
This should work in principle, but I'm not sure it is better than a liftOver. Are you using a gnomAD dataset for LOEUF? If so, can you check whether their v2 liftover callset has the information you need?
Can you provide an example where your LOEUF annotation has an ENSG/ENST ID different from the one assigned by VEP?
I am not sure what you mean by "gnomAD dataset for LOEUF". I am using the 11_full_dataset provided with their paper.
Where is the v2 liftover callset you mentioned?
Here is an example:
The transcript ID in column D is taken from the LOEUF dataset. The transcript ID in column G is taken from the VEP HGVS plugin.
gnomAD has various dataset versions: https://gnomad.broadinstitute.org/downloads
LoF curation is available under v2 but not under v2 liftover or v3. Their paper mentions gnomAD many times and the PI seems to be Daniel MacArthur, so I assumed the dataset would be available on gnomAD's downloads page.
I don't see anything inconsistent in the VEP output. Your annotation gives multiple transcript IDs (column D) per gene (column B), whereas VEP's HGVS annotation is consistently reported on the canonical transcript. I think you're annotating without any canonical-transcript constraint in your custom LOEUF approach, but using the canonical transcript with VEP.
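Applying the canonical constraint on the LOEUF side means keeping one transcript per gene before comparing with VEP. A minimal sketch, assuming rows loaded as dicts with `gene_id`, `transcript_id` and a `canonical` value taken from the "Canonical?" column; how that column encodes true/false is an assumption here, so the sketch accepts a few common spellings. Genes with zero or several canonical rows are returned separately instead of being guessed. Note that the LOEUF canonical flag comes from GENCODE v19, and the canonical transcript in a newer Ensembl release may differ, so some mismatches could remain even after this filter. If VEP is run with `--canonical`, its output marks the canonical transcript with CANONICAL=YES, which gives a field to match against.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed spellings of "true" in the LOEUF "Canonical?" column.
TRUE_VALUES = {"true", "1", "yes"}


def canonical_transcript_per_gene(
    rows: List[dict],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Return (gene_id -> canonical transcript, genes with != 1 canonical row)."""
    by_gene: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        if str(row["canonical"]).strip().lower() in TRUE_VALUES:
            by_gene[row["gene_id"]].append(row["transcript_id"])
    canonical = {g: tx[0] for g, tx in by_gene.items() if len(tx) == 1}
    problems = {g: tx for g, tx in by_gene.items() if len(tx) != 1}
    return canonical, problems
```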