Question

If the MANE transcript isn't available, should you use the canonical transcript?

8 weeks ago

amy__ ▴ 160

Hello,

I am looking at whether some genes contain homozygous LoF variants in gnomAD. What is the common practice if the MANE transcript is not reported in gnomAD? Should I then be looking at whether these LoF variants occur in the canonical transcript?

Ideally I would keep to using the MANE transcripts but when these are not available what would you suggest?

E.g. TPRX1: https://gnomad.broadinstitute.org/gene/ENSG00000178928?dataset=gnomad_r3

Thanks! Amy

ensembl mane gnomad canonical • 572 views

ADD COMMENT • link 7 weeks ago by amy__ ▴ 160

No one in the world except for you can answer this question, because no one in the world except for you knows what your goal(s) are.

Therefore, we are not in a position to create a preferred order of transcripts to draw from, but you are.

Stated slightly differently, this is really a biology question and the only way to know the answer to it is to understand why you are interested in TPRX1 in the first place... which, again, we don't know but you (presumably) do.

ADD REPLY • link 8 weeks ago by LauferVA 4.2k

Hi @LauferVA,

I understand your response. I have a list of autosomal recessive genes in which we have identified homozygous or compound het variants in our cohort that are rare in gnomAD and have a predicted loss-of-function impact from Ensembl VEP (e.g. frameshift, nonsense etc.). I am trying to filter out genes which are tolerant to biallelic loss of function in gnomAD (using v3.1.2) by looking at whether homozygous LoF variants are distributed across the gene and occur at a high incidence.

However, this gene does not have a MANE transcript, so I assume I should just use the canonical one, but I would like some guidance if possible. Obviously I know that in certain rare diseases, transcripts other than the MANE transcript may be expressed.
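
To make the fallback described above concrete, here is a minimal Python sketch, not an established gnomAD or VEP workflow. It prefers a transcript flagged as MANE Select, falls back to the canonical one, and then tallies predicted LoF variants on that transcript that have at least one homozygote. The dictionary keys (`mane_select`, `canonical`, `hom_count`, and so on), the set of LoF consequence terms, and the records themselves are assumptions made up for illustration; they are not real gnomAD data or field names.

```python
# Consequence terms treated as predicted loss of function (an assumption;
# adjust this set to match your own annotation settings).
LOF_TERMS = {
    "frameshift_variant",
    "stop_gained",
    "splice_acceptor_variant",
    "splice_donor_variant",
}


def pick_transcript(transcripts):
    """Return the ID of the MANE Select transcript if one is flagged,
    otherwise the canonical transcript, otherwise None.

    `transcripts` is a list of dicts with the hypothetical keys
    'id', 'mane_select' and 'canonical'.
    """
    for flag in ("mane_select", "canonical"):
        for t in transcripts:
            if t.get(flag):
                return t["id"]
    return None


def homozygous_lof_summary(variants, transcript_id):
    """Count LoF variants on `transcript_id` that have at least one
    homozygote, and sum their homozygote counts."""
    n_variants = 0
    n_homozygotes = 0
    for v in variants:
        if v["transcript"] != transcript_id:
            continue
        if v["consequence"] not in LOF_TERMS:
            continue
        if v["hom_count"] > 0:
            n_variants += 1
            n_homozygotes += v["hom_count"]
    return {
        "lof_variants_with_homozygotes": n_variants,
        "total_homozygotes": n_homozygotes,
    }


# Made-up records purely for illustration (not real gnomAD data).
transcripts = [
    {"id": "TX_A", "mane_select": False, "canonical": True},
    {"id": "TX_B", "mane_select": False, "canonical": False},
]
variants = [
    {"transcript": "TX_A", "consequence": "frameshift_variant", "hom_count": 3},
    {"transcript": "TX_A", "consequence": "stop_gained", "hom_count": 0},
    {"transcript": "TX_A", "consequence": "missense_variant", "hom_count": 10},
    {"transcript": "TX_B", "consequence": "stop_gained", "hom_count": 5},
]

chosen = pick_transcript(transcripts)
summary = homozygous_lof_summary(variants, chosen)
print(chosen, summary)
```

Keeping the transcript choice in its own function means the preference order (MANE Select, then canonical) is stated in one place and can be changed if the biology of a particular gene argues for a different isoform.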

I am not sure if that is enough detail to help you understand what I am trying to do.

Thanks! Amy

ADD REPLY • link 8 weeks ago by amy__ ▴ 160

score 2 · Accepted Answer · 2024-03-02

amy__ - I'm not sure if there is a definitive answer to this question, but I do think there are various places that you could start.

One resource I might recommend that you scour in detail is ClinGen. ClinGen curates estimates of the effects of variants that are found (for example) in gnomAD, and in many cases will also provide a transcript isoform, depending on how you look.

ClinGen has a standard operating procedure for gene evaluation that includes information on copy number variation and its pathogenicity. I don't know off the top of my head whether this SOP discusses, specifically, how transcript isoforms are selected, but it's possible that such a document describes the algorithm ClinGen uses to do it.
ClinGen has a dosage sensitivity map, too, which could help you.
Even if it does not, ClinGen also has a Sequence Variant Interpretation Working Group and a Dosage Sensitivity Curation Working Group. Both have published articles that might address this.

Even if none of these resources gives you what you need, I'd still argue that the information you need is on that website, because if all else fails you could reach out to the chairs of one of those WGs as a last resort - they will very likely be able to guide you.

OK, now, the above links together form the basis of ClinGen's Clinical Genome Resource, a curated resource of gene- and variant-level data on what causes human phenotypes. This page shows you all the variant information for all curated genes, and how each was classified. Because HGVS nomenclature requires a transcript ID, most of the variants in resources like this should have an assigned transcript; at least, I strongly suspect so.

Returning to the dosage sensitivity page again, look at the grey bar at the top:

Gene/Region | GRCh37 | HI Score | TS Score | OMIM | Morbid | %HI | pLI | LOEUF | Last Eval.

I think you may find the %HI and pLI fields to be helpful - though to be honest, you might end up liking some of the other tools more, I don't know. Perhaps what you really need is to know about OMIM, for example. pLI is the probability that a gene is intolerant to loss-of-function variation; note that it is calibrated on intolerance to heterozygous LoF, so check how well that matches a question about biallelic loss. I think this is pretty close to what you are after. But even then, the important part from your perspective (I think) is what the source of this estimate is and how it is made. pLI is derived from gnomAD data, and there is a paper about pLI - this is the kind of thing that you need to track down and read - it might be exactly what you need based on your post.

Other tools provide estimates of how likely a gene is to be haploinsufficient (%HI) or triplosensitive (pTriplo).
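
As a rough illustration of how fields like pLI and LOEUF might be used once you have them in hand, here is a small Python sketch. The cut-offs (pLI of at least 0.9, LOEUF below 0.35) are commonly quoted thresholds but are treated here as adjustable assumptions, the row keys `pLI` and `LOEUF` are hypothetical names, and the gene rows are invented placeholders rather than real values. Both metrics mainly reflect intolerance to heterozygous LoF, so this is a starting point for triage and not an answer to the biallelic question.

```python
def flag_lof_intolerance(row, pli_min=0.9, loeuf_max=0.35):
    """Classify one gene row by its constraint metrics.

    `row` is a dict that may contain 'pLI' and 'LOEUF' (hypothetical key
    names); either value may be missing or None when no estimate exists.
    """
    pli = row.get("pLI")
    loeuf = row.get("LOEUF")

    # Many genes simply have no estimate; report that explicitly.
    if pli is None and loeuf is None:
        return "no data"

    high_pli = pli is not None and pli >= pli_min
    low_loeuf = loeuf is not None and loeuf < loeuf_max
    if high_pli or low_loeuf:
        return "likely LoF-intolerant"
    return "no strong constraint signal"


# Placeholder rows invented for illustration (not real gene metrics).
rows = [
    {"gene": "GENE_A", "pLI": 0.99, "LOEUF": 0.12},
    {"gene": "GENE_B", "pLI": 0.00, "LOEUF": 1.40},
    {"gene": "GENE_C", "pLI": None, "LOEUF": None},
]

flags = {row["gene"]: flag_lof_intolerance(row) for row in rows}
for gene, flag in flags.items():
    print(gene, flag)
```

Returning "no data" as its own category keeps genes without an estimate from being silently treated as tolerant.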

Finally, a word on usage:

The role of ClinGen is not to establish these metrics or explain their use - that is done in the manuscripts for each of the tools themselves. What ClinGen provides, though, is estimates for each gene based on curated evidence. For your case, I think it would probably make sense to start by finding appropriate tools using a resource like ClinGen (the exact links provided in this post are designed to help), then reading about the tools. Once you understand them and you think you have the right tools, THEN you could go to something like ClinGen's API and download the information.

But, even then amy__, keep in mind that not every gene is going to have info. in ClinGen for each of these metrics. Why? Well, because in many cases, we simply don't know.

Does that help?