Question

HGVS Nomenclature of Multiple Indels found in Cis

1

8 months ago

LauferVA 4.4k

I'd like to ask for guidance on how to name somatic variants according to HGVS nomenclature in certain more complex scenarios. While this question may seem simple at first glance, the goal is an approach accurate enough to allow a high degree of automation. That goal is hampered by cases in which one and the same gene harbors multiple variants that may alter its function. This special case is problematic because it complicates correct variant naming at both the DNA and protein level for every variant after the first.

Specifically, if the presence of the first variant impacts the effect of the second variant on the protein, then the standard assignment of the p. nomenclature might be misleading. In such a case, we would either need to know the phase between the two, or we would have to list multiple (speculative) options for the functional effect of the second variant!

Now, HGVS publishes its own variant naming software, and additional third-party software is available. I have not implemented functionality like this recently, but I recall that software such as Mutalyzer, VariantValidator, and biocommons/hgvs further improved on HGVS's software. However, this may no longer be true, or there may be more up-to-date or better tools out there now.

I do not know of any tool that handles naming perfectly for the more complex scenarios described above; at least, not one accurate enough to effectively automate the process for many types of variants or variant pairs. Do others have accumulated experience in this area?

--- clarifying comment / example ---

Ram - you're right, I should clarify this. Thank you for encouraging me to do that. Within HGVS itself, the name of one variant does not change depending on the presence or absence of another, except with respect to allelic annotations. For instance, suppose a protein has F43* on one allele, and a second variant further downstream that would produce I270Y on the same allele. HGVS still calls this p.I270Y, even if the protein translated from the transcript in question has no 270th position because of the nonsense variant at 43.

However, the result is in effect an inaccurate designation with respect to biological truth (assuming transcript, phase, and variants are all correctly ascertained). Not surprisingly, this can prove problematic for downstream analyses. For example, suppose that one is generating neo-epitopes based on a VCF containing both of these variants. Software that reads the VCF naively will generate a score for I270Y, even though this epitope does not exist because of the 5' nonsense variant.

Thus, what I'm looking for is software that names variants correctly according to HGVS, except when the HGVS name refers to a state that does not reflect reality for a given transcript.

Thank you!!

Nomenclature HGVS • 1.3k views

ADD COMMENT • link updated 3 months ago by Ram 44k • written 8 months ago by LauferVA 4.4k

0

if the presence of the first variant impacts the effect of the second variant on the protein, then the standard assignment of the p. nomenclature might be misleading.

Can you give an example? I don't see how nomenclature and effects are related.

ADD REPLY • link 8 months ago by Ram 44k

0

Thank you for the clarification-edit, LauferVA !

If I understand you correctly, the second variant is dependent on the first - so in effect, the reference protein (ENSP/NP_) that would precede the second variant would not exist as it is a variant/mutant form of an existing ENSP/NP_ - am I getting this right?

ADD REPLY • link 8 months ago by Ram 44k

0

I think so. What I mean is IF all the following conditions are met:

there is a truncating mutation
there is a second mutation 3' from the first
the transcript isoform of interest would normally include both variants
both variants can be shown to be on the same allele

THEN the second variant will not alter protein function (because the protein is truncated before that amino acid). Despite this, HGVS names every variant in isolation. As a result, any tool that (correctly) assigns every variant an HGVS name will attribute to the second variant a functional effect that, in reality, does not exist. Right?
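The four conditions above can be written down as a simple check. The following is only a minimal sketch: the `ProteinVariant` record, the `haplotype` label, and the set of consequence terms treated as truncating are all assumptions made for illustration, not part of HGVS or of any existing tool. It also ignores complications such as nonsense-mediated decay or a frameshift that is later restored.

```python
from dataclasses import dataclass

# Consequence labels treated as truncating (an assumption for this sketch).
TRUNCATING = {"stop_gained", "frameshift"}


@dataclass(frozen=True)
class ProteinVariant:
    name: str         # p. name as reported, e.g. "p.F43*"
    transcript: str   # transcript isoform the name refers to
    aa_pos: int       # first affected amino-acid position
    consequence: str  # e.g. "stop_gained", "missense"
    haplotype: str    # hypothetical phase label; same label means same allele


def flag_masked(variants):
    """Return names of variants lying 3' of a truncating variant in cis."""
    masked = []
    for v in variants:
        for t in variants:
            if (t is not v
                    and t.consequence in TRUNCATING
                    and t.transcript == v.transcript
                    and t.haplotype == v.haplotype
                    and t.aa_pos < v.aa_pos):
                masked.append(v.name)
                break
    return masked


calls = [
    ProteinVariant("p.F43*", "TX1", 43, "stop_gained", "hapA"),
    ProteinVariant("p.I270Y", "TX1", 270, "missense", "hapA"),
    ProteinVariant("p.R100K", "TX1", 100, "missense", "hapB"),
]
print(flag_masked(calls))  # ['p.I270Y']
```

Only p.I270Y is flagged: p.R100K sits on the other allele, so its name still describes a protein that can exist.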

ADD REPLY • link 8 months ago by LauferVA 4.4k

0

I honestly think you're expecting too much of a variant nomenclature system. HGVS does consider each variant in isolation. You're right, it is biologically impossible to have both a truncating and a downstream non-truncating variant in the same isoform, but there could be two different mutant molecules of the same isoform in the same cell - not probable, but possible. Unless the changes happen in tandem (such as an indel or MNV), one cannot expect HGVS to specify a name that maintains that relationship. It is for this reason that most annotation tools have an option that picks the most severe effect per transcript (or per gene).

ADD REPLY • link 8 months ago by Ram 44k

0

I'm not necessarily placing any expectation on HGVS.

I'm asking for an additional tool that does that, given correctly named variants.

ADD REPLY • link 8 months ago by LauferVA 4.4k

0

I don't think that tool exists - even the official Python package only maps between reference sequence types (g. to c., c. to p., etc.); it cannot apply a variant and return a mutant sequence.

You need a custom tool that retrieves the reference sequence, parses a list of variants, applies them to that sequence, and errors out when it is unable to apply one. The official module can help you validate the first variant you apply (since it will be applied to the unmodified reference sequence), but you'll need to validate any subsequent variants yourself.
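A minimal sketch of such a custom tool, assuming the variants are already phased onto one allele and expressed as hypothetical `(cds_position, ref, alt)` tuples with 1-based coding-sequence coordinates (VCF-style anchored alleles for indels). The toy reference sequence and the function names are made up for illustration; retrieving a real reference and parsing HGVS strings are left out.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def apply_variants(ref_seq, variants):
    """Apply (pos, ref, alt) variants to one allele; raise if one cannot be applied."""
    seq = ref_seq
    next_start = len(ref_seq)  # 0-based start of the previously applied variant
    # Work from 3' to 5' so earlier coordinates are not shifted by indels.
    for pos, ref, alt in sorted(variants, reverse=True):
        start, end = pos - 1, pos - 1 + len(ref)
        if end > next_start:
            raise ValueError(f"variant at {pos} overlaps another variant")
        if ref_seq[start:end] != ref:
            raise ValueError(f"reference allele mismatch at {pos}")
        seq = seq[:start] + alt + seq[end:]
        next_start = start
    return seq


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


REF = "ATGTATGCAATCGGGTAA"  # toy CDS encoding M-Y-A-I-G
nonsense = (6, "T", "A")    # TAT -> TAA, i.e. Y2*
missense = (10, "A", "T")   # ATC -> TTC, i.e. I4F

print(translate(REF))                                        # MYAIG
print(translate(apply_variants(REF, [missense])))            # MYAFG
print(translate(apply_variants(REF, [nonsense, missense])))  # M
```

With both variants on the same allele the translated product is just `M`, so a name describing position 4 would refer to a residue that is never made. Comparing the mutant translation against the reference translation is one way to decide which p. names remain meaningful.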

If you end up creating this tool, it would be extremely useful for all that need a mutant sequence from a reference sequence and a variant.

ADD REPLY • link 8 months ago by Ram 44k

1

We agree on what is needed!!

ADD REPLY • link 8 months ago by LauferVA 4.4k

0

What you want is something called a consensus sequence: basically, the sequence generated after considering all variation. It relies on variant phase to reconstruct the sequence. Maybe you can search with that keyword and find a suitable tool. I was trying to do what you want here, but I was writing my own code.
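The dependence on phase can be made concrete with a small check on VCF genotype fields. This is a rough sketch that assumes each variant carries a `GT` string and an optional `PS` (phase set) value; the function names are hypothetical, and treating two absent `PS` values as the same phase set is an assumption of the sketch.

```python
def alt_haplotypes(gt):
    """Return indices of haplotypes carrying a non-reference allele, or None if unphased."""
    if "|" not in gt:
        return None
    return {i for i, allele in enumerate(gt.split("|")) if allele not in ("0", ".")}


def in_cis(gt_a, ps_a, gt_b, ps_b):
    """True/False if phase is known for both variants, None if it cannot be decided."""
    haps_a, haps_b = alt_haplotypes(gt_a), alt_haplotypes(gt_b)
    if haps_a is None or haps_b is None or ps_a != ps_b:
        return None
    return bool(haps_a & haps_b)


print(in_cis("0|1", "1001", "0|1", "1001"))  # True
print(in_cis("0|1", "1001", "1|0", "1001"))  # False
print(in_cis("0/1", "1001", "0|1", "1001"))  # None
```

A `None` result is the case in which a tool would have to report several speculative outcomes rather than a single one.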

ADD REPLY • link 6 months ago by bharata1803 ▴ 560

0

How has this code progressed? Is it publicly available? Would you want to work together?

ADD REPLY • link 3 months ago by LauferVA 4.4k

0

They're not addressing the actual question, just saying something tangentially related. It looks like their understanding of both consensus sequences and your problem statement is incomplete, which is why I did not respond to them.

ADD REPLY • link 3 months ago by Ram 44k