Question: What do "left-aligned" and "parsimonious" mean (VCF normalisation)?
2.4 years ago, pinninti1991reddy wrote:

Hello

I'm very confused. Can anyone briefly explain what "left-aligned" and "parsimonious" mean, with a simple example?

Thanks!

Tags: snp, sequence, next-gen, genome

On this wiki page (possibly by the same people), they say of their example: "green variant is not left aligned as you can prefix an A nucleotide on the left side of the variant's alleles and truncate the C on the right side of the variant's alleles."

https://genome.sph.umich.edu/wiki/Variant_Normalization

What exactly do they mean by "you can prefix an A nucleotide on the left side of the variant's alleles and truncate the C on the right side of the variant's alleles"?

(comment by bjwiley, 3 months ago)

This should be a new question, not an answer on an existing question.

The comments refer to the image that follows them, where REF is CAC and ALT is C: a change from CAC to C, i.e. a 2-base deletion that removes one CA/AC unit of the repeat. Because this deletion happens in a repeat region, it can be written at several equivalent positions, and the left-aligned form is the most 5' (left-most) one, which this is not. An ACA > A change one base to the left has the same effect but is placed further 5' than the current notation. Thus, the variant cannot be written with fewer bases (it is parsimonious), but it can be written further 5' on the sequence (so it is not left-aligned).
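To make the parsimonious versus left-aligned distinction concrete, here is a minimal Python sketch of a parsimony test for a single REF/ALT pair. It assumes "parsimonious" means that no base can be dropped from the same end of both alleles without leaving an empty allele; `is_parsimonious` is a hypothetical helper, not part of any library. CAC>C passes because it is already as short as it can be; only its position is in question.

```python
def is_parsimonious(ref: str, alt: str) -> bool:
    """Return True if no base can be trimmed from the same end of both alleles.

    Assumption: a base can only be dropped if both alleles would stay
    non-empty, i.e. both are at least 2 bases long.
    """
    can_trim = min(len(ref), len(alt)) >= 2
    trim_right = can_trim and ref[-1] == alt[-1]  # shared last base
    trim_left = can_trim and ref[0] == alt[0]     # shared first base
    return not (trim_right or trim_left)


for ref, alt in [("CAC", "C"), ("ACA", "A"), ("ACAC", "AC"), ("CTA", "GTA")]:
    print(f"{ref}>{alt}: parsimonious={is_parsimonious(ref, alt)}")
# CAC>C: parsimonious=True
# ACA>A: parsimonious=True
# ACAC>AC: parsimonious=False
# CTA>GTA: parsimonious=False
```

ACAC>AC and CTA>GTA fail because both alleles share a final base and are at least 2 bases long, so that base can be trimmed. CAC>C also shares its final base, but trimming it would leave ALT empty.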

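The shown change is CAC>C at position 6, whereas shifting one base to the left gives ACA>A at position 5. Both remove one CA unit from the reference sequence. The shift can be repeated for as long as REF and ALT still end with the same base; the left-aligned form is where it stops.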
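The prefix-then-truncate step the wiki describes can be sketched in Python as below. The reference `GGGCACACAG` is hypothetical, chosen only so that positions match the ones mentioned in this thread (a G at position 3 followed by a CA repeat); the real figure's sequence may differ. `shift_left_once` and `left_align` are illustrative names, not a library API.

```python
def shift_left_once(seq: str, pos: int, ref: str, alt: str):
    """One 'prefix, then truncate' step on a VCF-style entry.

    pos is 1-based. Returns the shifted (pos, ref, alt), or None when REF and
    ALT end with different bases (shifting would change the variant) or
    there is no base to the left of pos.
    """
    if ref[-1] != alt[-1] or pos == 1:
        return None
    prev_base = seq[pos - 2]  # reference base at 1-based position pos - 1
    # Prefix prev_base to both alleles, then drop their shared last base.
    return pos - 1, prev_base + ref[:-1], prev_base + alt[:-1]


def left_align(seq: str, pos: int, ref: str, alt: str):
    """Apply shift_left_once until it no longer applies; return every step."""
    assert seq[pos - 1 : pos - 1 + len(ref)] == ref, "REF must match the reference"
    trace = [(pos, ref, alt)]
    step = shift_left_once(seq, pos, ref, alt)
    while step is not None:
        trace.append(step)
        step = shift_left_once(seq, *step)
    return trace


ref_seq = "GGGCACACAG"  # hypothetical reference, 1-based positions 1..10
for p, r, a in left_align(ref_seq, 6, "CAC", "C"):
    print(p, f"{r}>{a}")
# 6 CAC>C
# 5 ACA>A
# 4 CAC>C
# 3 GCA>G
```

On this assumed sequence the step applies three times before REF and ALT end with different bases (A vs G), so the left-aligned entry is GCA>G at position 3.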

(reply by RamRS, 3 months ago)

Ah, OK. So when they say to add a 1-base prefix and then remove a 1-base suffix, they essentially mean shifting both REF and ALT one position to the left. Thanks. As for posting a new question: if my question is related, like this one, should I just add a comment? I wouldn't want to post this as a completely new thread, correct?

(reply by bjwiley, 3 months ago)

You could add a comment, or open a new question and reference this post there. We want discussion, but not extensive offshoots. In your case, adding a comment would have been better, as you only wanted clarification and didn't really have a separate question.

(reply by RamRS, 3 months ago)

Answer: 2.4 years ago, dariober (WCIP | Glasgow | UK) wrote:

Either this paper, "Unified representation of genetic variants", or this wiki page from the same authors explains normalization very nicely.
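For reference, the normalization procedure described in that paper combines both ideas: repeatedly truncate a shared last base and, when an allele becomes empty, extend both alleles one base to the left; then trim shared first bases. The Python below is a minimal sketch that roughly follows that procedure for one biallelic entry, not the authors' implementation; `normalize` and the reference string `GGGCACACAG` are hypothetical.

```python
def normalize(seq: str, pos: int, ref: str, alt: str):
    """Sketch of left-alignment plus parsimony trimming for one biallelic entry.

    seq is the reference sequence and pos is 1-based. A deletion at
    position 1 is not handled (the VCF spec pads on the right there).
    """
    assert seq[pos - 1 : pos - 1 + len(ref)] == ref, "REF must match the reference"
    while True:
        changed = False
        # If both alleles end with the same base, truncate it.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele is now empty, extend both alleles one base to the left.
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
        if not changed:
            break
    # Trim a shared first base while both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


ref_seq = "GGGCACACAG"  # hypothetical reference, 1-based positions 1..10
print(normalize(ref_seq, 6, "CAC", "C"))    # (3, 'GCA', 'G')
print(normalize(ref_seq, 5, "ACAC", "AC"))  # (3, 'GCA', 'G')
print(normalize(ref_seq, 4, "CAC", "CTC"))  # (5, 'A', 'T')
```

The first entry is parsimonious but not left-aligned, the second is neither, and the third is a padded SNP that only needs trimming. For real data, an established tool such as `bcftools norm` (given the reference FASTA) is the safer choice.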


I read it and ran my analysis, but I still don't understand the left-alignment concept.

(reply by pinninti1991reddy, 2.4 years ago)

The paper is clear:

A VCF entry is left aligned if and only if its base position is smallest among all potential VCF entries having the same allele length and representing the same variant

In Fig 1, compare A and D. Both entries have the same allele lengths (a 2-base deletion) and produce the same effect: deleting the CA at positions 4-5, which joins the G at position 3 to the C at position 6. However, A places the variant at position 6, whereas D anchors it on the G to the left, at position 3. Both describe the same deletion, but D is left-aligned: no equivalent entry with the same allele lengths has a smaller position.
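The definition quoted from the paper can be checked by brute force. The Python sketch below assumes a hypothetical reference `GGGCACACAG` (chosen to match the positions in the comment) and assumed allele strings for A (CAC>C at 6) and D (GCA>G at 3), since the figure's exact alleles are not reproduced here. It confirms that both give the same haplotype, then lists every entry with a 3-base REF and a 1-base ALT that does too; `apply_variant` is a hypothetical helper.

```python
def apply_variant(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return the haplotype made by replacing REF at 1-based pos with ALT."""
    i = pos - 1
    assert seq[i : i + len(ref)] == ref, f"REF {ref} does not match at {pos}"
    return seq[:i] + alt + seq[i + len(ref):]


ref_seq = "GGGCACACAG"      # hypothetical reference, 1-based positions 1..10
entry_a = (6, "CAC", "C")   # assumed alleles for entry A
entry_d = (3, "GCA", "G")   # assumed alleles for entry D

hap_a = apply_variant(ref_seq, *entry_a)
hap_d = apply_variant(ref_seq, *entry_d)
print(hap_a, hap_d, hap_a == hap_d)  # GGGCACAG GGGCACAG True

# Every entry with the same allele lengths (3-base REF, 1-base ALT) that
# produces the same haplotype; the left-aligned one has the smallest position.
equivalent_positions = sorted({
    pos
    for pos in range(1, len(ref_seq) - 1)   # every 3-base window
    for base in "ACGT"                      # every possible 1-base ALT
    if apply_variant(ref_seq, pos, ref_seq[pos - 1 : pos + 2], base) == hap_d
})
print(equivalent_positions, min(equivalent_positions))  # [3, 4, 5, 6, 7, 8] 3
```

Position 8 shows up because CAG>G, anchored on the base to the right, also removes one CA; the smallest position, 3, is D's.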

(reply by RamRS, 2.4 years ago)
Powered by Biostar version 2.3.0