Quick reality check - I have been normalizing VCFs and annotation files according to this methodology:

An example implementation would be this one:

A consequence of this is that all duplications are converted to insertions after normalization.

For example, ref: C, alt: CC would be normalized to ref: A, alt: AC (assuming A is the base preceding the ref position), and ref: CAC, alt: CACCAC would be normalized to ref: G, alt: GCAC (assuming G is the base preceding the ref position).
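To make the two examples above concrete, here is a minimal sketch of a trim-and-left-align normalization in Python. It assumes the commonly used approach (trim shared trailing bases, pad on the left from the reference when an allele becomes empty, then trim shared leading bases), which may differ in detail from the methodology linked above. The function name `normalize`, the toy reference strings, and the 1-based `pos` convention are all assumptions made for illustration.

```python
def normalize(pos, ref, alt, genome):
    """Left-align and trim a single biallelic variant.

    pos    -- 1-based position of the first base of `ref` (assumed convention)
    genome -- reference sequence as a plain string (toy stand-in for a FASTA)
    """
    if ref == alt:
        raise ValueError("ref and alt are identical; nothing to normalize")

    changed = True
    while changed:
        changed = False
        # Trim the rightmost base while both alleles end with the same base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both to the left by one reference base.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend left of the first reference base")
            pos -= 1
            base = genome[pos - 1]  # convert 1-based pos to 0-based index
            ref, alt = base + ref, base + alt
            changed = True

    # Trim shared leading bases, keeping at least one base in each allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1

    return pos, ref, alt


# Toy references: "A" precedes the C, and "G" precedes the CAC.
print(normalize(4, "C", "CC", "TTACGG"))          # (3, 'A', 'AC')
print(normalize(4, "CAC", "CACCAC", "TTGCACTT"))  # (3, 'G', 'GCAC')
```

In both cases the duplicated bases end up expressed as an insertion anchored on the preceding reference base, which is the behaviour described above.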

Does this make sense? Other than the label "insertion" vs. "duplication", should any importance be given, from a biological/clinical POV, to the fact that these variants were duplications before normalization?

The duplication vs. insertion distinction certainly has biological/clinical relevance, as with trinucleotide repeat expansion in Huntington's disease. Duplications are metastable and subject to copy number changes during replication, while non-duplicated insertions are not. Depending on their size and orientation, duplications are also prone to intra- and inter-molecular recombination, whereas non-duplicated insertions can actually suppress recombination.
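If you want to keep track of this distinction after normalization, one option is to flag normalized insertions whose inserted sequence matches the reference immediately downstream of the anchor base. The sketch below is only an illustration under stated assumptions: the record is already left-aligned, `pos` is the 1-based position of the anchor base, the reference is available as a plain string, and `is_tandem_duplication` is a hypothetical helper name, not part of any existing tool.

```python
def is_tandem_duplication(pos, ref, alt, genome):
    """Return True if a left-aligned insertion duplicates adjacent reference sequence.

    pos    -- 1-based position of the anchor base (assumed convention)
    genome -- reference sequence as a plain string (toy stand-in for a FASTA)
    """
    # Only simple insertions of the form ref = X, alt = X + inserted are considered.
    if len(ref) != 1 or len(alt) < 2 or not alt.startswith(ref):
        return False
    inserted = alt[1:]
    # With a 1-based anchor at `pos`, the 0-based index `pos` is the next base.
    downstream = genome[pos:pos + len(inserted)]
    return inserted == downstream


print(is_tandem_duplication(3, "A", "AC", "TTACGG"))      # True: C duplicates the next C
print(is_tandem_duplication(3, "G", "GCAC", "TTGCACTT"))  # True: CAC duplicates the next CAC
print(is_tandem_duplication(3, "A", "AG", "TTACGG"))      # False: G is not adjacent reference
```

The result could be stored as an extra annotation alongside the normalized record, so the "duplication" label is not lost.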

Wow - thanks. I was expecting this to be a silly question, but now I'm glad I asked.


