Reality check: insertion v duplication
5.0 years ago
andrewl ▴ 10

Quick reality check - I have been normalizing VCFs and annotation files according to this methodology:

Tan A, Abecasis GR, Kang HM. Unified representation of genetic variants.


An example implementation is here: https://github.com/ericminikel/minimal_representation/blob/master/normalize.py

A consequence of this is that all duplications get converted to insertions after normalization.

For example, ref: C, alt: CC would be normalized to ref: A, alt: AC (assuming A is the base preceding the ref position), and ref: CAC, alt: CACCAC would be normalized to ref: G, alt: GCAC (assuming G is the base preceding the ref position).
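To make the two examples concrete, here is a minimal sketch of the trimming procedure described by Tan et al. (right-trim shared bases, extend left when an allele becomes empty, then left-trim). It is not the linked normalize.py. The function name `normalize`, the toy reference strings, and the choice of a 1-based `pos` are assumptions for illustration, and it handles a single alt allele with ref != alt only.

```python
def normalize(pos, ref, alt, genome):
    """Left-align and trim one ref/alt pair.

    pos is 1-based; genome is the reference sequence as a plain string
    (an assumption for this sketch; real code would query a FASTA index).
    """
    changed = True
    while changed:
        changed = False
        # Right-trim: drop a trailing base shared by both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If either allele is now empty, extend both one base to the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend left of the first base")
            pos -= 1
            base = genome[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Left-trim: drop a shared leading base while both alleles keep >= 1 base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy references: the base before the ref position is A, then G.
print(normalize(4, "C", "CC", "GGACTT"))          # (3, 'A', 'AC')
print(normalize(4, "CAC", "CACCAC", "TTGCACTT"))  # (3, 'G', 'GCAC')
```

In both cases the duplicated sequence ends up as a plain insertion anchored on the preceding base, which is the behavior described above.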

Does this make sense? Other than the label "insertion" v "duplication", should there be any importance given to the fact that these variations were duplications before the normalization, from a biological/clinical POV?

normalization DNA • 1.9k views
5.0 years ago

The duplication vs. insertion distinction certainly has biological/clinical relevance, for example in the trinucleotide repeat expansion seen in Huntington's disease. Duplications are metastable and subject to copy number changes during replication, while non-duplicated insertions are not. Depending on their size and orientation, duplications are also prone to intra- and inter-molecular recombination, whereas non-duplicated insertions can actually suppress recombination.
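If the duplication label is worth keeping, it can be recovered after normalization rather than lost. The sketch below assumes a left-aligned insertion whose ref allele is a single anchor base, and flags it as a tandem duplication when the inserted bases match the reference immediately after the anchor. The function name `is_tandem_duplication` and the plain-string `genome` are hypothetical, and this only detects direct tandem copies, not inverted or dispersed duplications.

```python
def is_tandem_duplication(pos, ref, alt, genome):
    """Return True if a normalized insertion duplicates adjacent reference.

    pos is the 1-based position of the anchor base; genome is the
    reference as a plain string (an assumption for this sketch).
    """
    # Only simple insertions of the form anchor -> anchor + inserted bases.
    if len(ref) != 1 or len(alt) < 2 or not alt.startswith(ref):
        return False
    inserted = alt[1:]
    # genome[pos] (0-based) is the first base after the 1-based anchor.
    following = genome[pos:pos + len(inserted)]
    return inserted == following


print(is_tandem_duplication(3, "G", "GCAC", "TTGCACTT"))  # True
print(is_tandem_duplication(3, "G", "GTTT", "TTGCACTT"))  # False
```

A flag like this could be stored alongside the normalized record so the insertion keeps its "duplication" annotation for downstream interpretation.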


Wow, thanks. I was expecting this to be a silly question to ask; now I'm glad I did.