Question

How to deal with different transcript IDs between versions


23 months ago

Manuel ▴ 40

Context: The LOEUF dataset contains this information:

Gene_name | Gene_ID | Transcript_ID | Canonical? | Coordinate_start_gene | Coordinate_end_gene | Coordinate_start_transcript | Coordinate_end_transcript | LOEUF metrics | …

According to the paper, they used GENCODE v19 on GRCh37 to assign transcript and gene IDs.

I took the gene IDs (e.g. ENSG00000010404) and used Ensembl to get the coordinates of these genes in GRCh38. I did the same with the transcripts. Then I merged the datasets to have my own LOEUF dataset in GRCh38.
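A minimal sketch of the merge described above, using only the Python standard library. The column names, IDs, and values are hypothetical placeholders, not the real LOEUF or Ensembl export layout. It joins on the transcript ID with any version suffix removed and reports LOEUF transcripts that have no GRCh38 counterpart, since those are the ones whose model may have changed or been retired.

```python
import csv
import io

# Hypothetical inputs: column names and values are made up for illustration.
LOEUF_GRCH37 = """transcript_id,gene_id,loeuf
ENST00000000001,ENSG00000000001,0.21
ENST00000000002,ENSG00000000002,0.95
"""

COORDS_GRCH38 = """transcript_id,chrom,start,end
ENST00000000001.5,1,1000,5000
"""


def strip_version(stable_id):
    """Drop a trailing '.N' version suffix, if present."""
    return stable_id.split(".", 1)[0]


def merge_loeuf_with_coords(loeuf_text, coords_text):
    """Attach GRCh38 coordinates to LOEUF rows, keyed on unversioned transcript ID.

    Returns (merged_rows, missing_ids); missing_ids lists LOEUF transcripts
    that were not found in the GRCh38 coordinate table.
    """
    coords = {
        strip_version(row["transcript_id"]): row
        for row in csv.DictReader(io.StringIO(coords_text))
    }
    merged, missing = [], []
    for row in csv.DictReader(io.StringIO(loeuf_text)):
        key = strip_version(row["transcript_id"])
        hit = coords.get(key)
        if hit is None:
            missing.append(key)
            continue
        merged.append(
            {**row, "chrom": hit["chrom"], "start": int(hit["start"]), "end": int(hit["end"])}
        )
    return merged, missing


merged, missing = merge_loeuf_with_coords(LOEUF_GRCH37, COORDS_GRCH38)
print(len(merged), "merged;", "not found in GRCh38:", missing)
```

Keeping the list of unmatched IDs, rather than silently dropping them, makes it easier to see how many transcripts did not carry over between annotation releases.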

I use this LOEUF-GRCh38 dataset to annotate CNVs from HGS VCF files.

Now I am adding more functionality to my bioinformatics application. It now takes SNPs and indels found in exons of transcripts affected by CNVs. I am also using VEP 99 to annotate these short variants. I haven't tested it properly yet, but it seems to be working: I have checked a few of them and everything looks OK.

What concerns me is this: when I annotate the HGVS nomenclature (transcript_ID:c.123C>T), the transcript annotated by VEP for a SNP is different from the transcript ID I previously assigned using my LOEUF-GRCh38 dataset.

My question is: do transcript IDs change between versions?

Looking on Ensembl, I found this:

Ensembl stable gene, transcript, and protein identifiers are kept the same throughout Ensembl releases unless the gene or transcript model changes dramatically.

How often can this happen?
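One way to tell a genuinely different transcript from the same transcript at a newer version is to compare the stable ID and the version suffix separately. The sketch below assumes Ensembl-style IDs of the form ENST...[.version] and an HGVS coding string of the form transcript:c.change. The function names and the example IDs are hypothetical.

```python
import re

# Matches the transcript part of an HGVS coding string such as
# "ENST00000000001.4:c.123C>T"; the version suffix is optional.
HGVSC_RE = re.compile(r"^(ENST\d+)(?:\.(\d+))?:c\.")


def split_stable_id(stable_id):
    """Split 'ENST00000000001.3' into ('ENST00000000001', 3); version may be None."""
    base, _, version = stable_id.partition(".")
    return base, int(version) if version else None


def transcript_from_hgvsc(hgvsc):
    """Extract (stable_id, version) from an Ensembl HGVS coding string."""
    match = HGVSC_RE.match(hgvsc)
    if match is None:
        raise ValueError(f"not an Ensembl HGVS coding string: {hgvsc!r}")
    version = match.group(2)
    return match.group(1), int(version) if version else None


def classify(loeuf_id, hgvsc):
    """Compare a LOEUF transcript ID with the transcript named in an HGVSc string."""
    base_a, ver_a = split_stable_id(loeuf_id)
    base_b, ver_b = transcript_from_hgvsc(hgvsc)
    if base_a != base_b:
        return "different_transcript"
    if ver_a is not None and ver_b is not None and ver_a != ver_b:
        return "same_transcript_new_version"
    return "same_transcript"


# Made-up IDs, for illustration only.
print(classify("ENST00000000001.3", "ENST00000000001.4:c.123C>T"))
print(classify("ENST00000000001", "ENST00000000009.1:c.5A>G"))
```

Running this over every LOEUF/VEP pair and counting the three outcomes would show whether the mismatches are version bumps or entirely different transcripts of the same gene.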

gencode ensembl • 765 views

updated 23 months ago by Ram 43k • written 23 months ago by Manuel ▴ 40


I took the gene IDs (e.g. ENSG00000010404) and used Ensembl to get the coordinates of these genes in GRCh38. I did the same with the transcripts. Then I merged the datasets to have my own LOEUF dataset in GRCh38.

This should logically work but I'm not sure if it's better than a liftOver operation. Are you using a gnomAD dataset for LOEUF? If so, can you look at their v2 liftover callset if that has the information you need?

Can you provide an example where your LOEUF annotation gives an ENSG/ENST ID different from the one assigned by VEP?

23 months ago by Ram 43k


I am not sure what you mean by "gnomAD dataset for LOEUF". I am using the 11_full_dataset provided in their paper.

Where is the v2 liftover callset you mention?

This is an example:

[image not preserved]

The transcript ID in column D is taken from the LOEUF dataset. The transcript ID in column G is taken from the VEP hgvs plugin.

23 months ago by Manuel ▴ 40


gnomAD has various dataset versions: https://gnomad.broadinstitute.org/downloads

LoF curation is available under v2 but not under v2 liftover or v3. Their paper mentions gnomAD a bunch and the PI seems to be Daniel MacArthur, so I assumed the dataset would be available on gnomAD's downloads page.

I don't see anything inconsistent from VEP. Your annotation gives you multiple transcript IDs (D) per gene (B), but VEP's HGVSg is maintained consistently with the canonical transcript. I think you're annotating without any canonical transcript constraint in your custom LOEUF approach while using the canonical transcript with VEP.
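A minimal sketch of applying that canonical constraint on the LOEUF side, assuming the table's "Canonical?" column has already been parsed into a boolean. The row layout and IDs below are hypothetical. It keeps one canonical transcript per gene and flags genes where that choice is not unique.

```python
from collections import defaultdict

# Hypothetical rows: IDs and the 'canonical' flag values are made up.
ROWS = [
    {"gene_id": "ENSG00000000001", "transcript_id": "ENST00000000001", "canonical": True},
    {"gene_id": "ENSG00000000001", "transcript_id": "ENST00000000002", "canonical": False},
    {"gene_id": "ENSG00000000002", "transcript_id": "ENST00000000003", "canonical": True},
]


def canonical_by_gene(rows):
    """Return ({gene_id: transcript_id}, {gene_id: [transcript_ids]}).

    The first dict holds genes with exactly one canonical transcript; the
    second holds genes with several canonical rows, which need manual review.
    Genes with no canonical row appear in neither dict.
    """
    per_gene = defaultdict(list)
    for row in rows:
        if row["canonical"]:
            per_gene[row["gene_id"]].append(row["transcript_id"])
    unique = {gene: ids[0] for gene, ids in per_gene.items() if len(ids) == 1}
    ambiguous = {gene: ids for gene, ids in per_gene.items() if len(ids) > 1}
    return unique, ambiguous


unique, ambiguous = canonical_by_gene(ROWS)
print(unique)
print(ambiguous)
```

With the LOEUF rows reduced this way, each gene contributes a single transcript ID that can be compared directly with the transcript VEP reports.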

23 months ago by Ram 43k