Question

Reality check: insertion v duplication

6.9 years ago

andrewl ▴ 10

Quick reality check - I have been normalizing VCFs and annotation files according to this methodology:

Tan A, Abecasis GR, Kang HM. Unified representation of genetic variants.

An example implementation is here: https://github.com/ericminikel/minimal_representation/blob/master/normalize.py

A consequence of this is that all duplications get converted to insertions post normalization.

For example, ref: C, alt: CC would be normalized to ref: A, alt: AC (assuming A is the base preceding the ref position). Likewise, ref: CAC, alt: CACCAC would be normalized to ref: G, alt: GCAC (assuming G is the base preceding the ref position).
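The two examples above can be reproduced with a minimal sketch of the trim-and-left-extend procedure described by Tan et al. This is not the code from the linked script; the function name `normalize`, the plain-string `reference` argument, and the toy sequences are assumptions made for illustration, and only a single alternate allele is handled.

```python
def normalize(pos, ref, alt, reference):
    """Left-align and trim one biallelic variant.

    pos is the 1-based position of the first base of ref;
    reference is the chromosome sequence as a plain string (assumed input).
    Returns the normalized (pos, ref, alt).
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt are identical; nothing to normalize")

    changed = True
    while changed:
        changed = False
        # Drop the rightmost base while both alleles end with the same base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend left of the first base")
            pos -= 1
            base = reference[pos - 1].upper()
            ref, alt = base + ref, base + alt
            changed = True

    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference where A precedes the C at position 4.
print(normalize(4, "C", "CC", "TTACGT"))        # (3, 'A', 'AC')
# Toy reference where G precedes CAC at positions 3-5.
print(normalize(3, "CAC", "CACCAC", "TGCACTT"))  # (2, 'G', 'GCAC')
```

In both cases the duplicated copy is trimmed away from the right, leaving an empty ref allele, so the preceding reference base is pulled in as the anchor and the record ends up as a plain insertion one position to the left.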

Does this make sense? Other than the label "insertion" v "duplication", should there be any importance given to the fact that these variations were duplications before the normalization, from a biological/clinical POV?

normalization DNA • 2.6k views

score 2 · Accepted Answer · 2017-06-13

6.9 years ago

harold.smith.tarheel ★ 4.9k

The duplication vs. insertion distinction certainly has biological/clinical relevance; trinucleotide repeat expansion in Huntington's disease is one example. Duplications are metastable and subject to copy number changes during replication, while non-duplicated insertions are not. Depending on their size and orientation, duplications are also prone to intra- and intermolecular recombination, whereas non-duplicated insertions can actually suppress recombination.
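Since the distinction matters, one option is to flag it after normalization rather than lose it. The sketch below is only an illustration under stated assumptions: the variant is already left-aligned with a single anchor base, `reference` is the chromosome sequence as a plain string, and the helper name `is_tandem_duplication` is hypothetical. It checks whether the inserted bases match the reference sequence immediately to the right of the anchor, which is where the original copy sits for a left-aligned tandem duplication.

```python
def is_tandem_duplication(pos, ref, alt, reference):
    """Return True if a left-aligned insertion duplicates adjacent reference.

    pos is the 1-based position of the anchor base (ref);
    alt is the anchor base followed by the inserted sequence.
    """
    ref, alt = ref.upper(), alt.upper()
    # Only simple insertions with a one-base anchor are considered here.
    if len(ref) != 1 or len(alt) < 2 or alt[0] != ref:
        return False
    inserted = alt[1:]
    # With 1-based pos, the 0-based index pos is the base right after the anchor.
    flank = reference[pos:pos + len(inserted)].upper()
    return flank == inserted


print(is_tandem_duplication(3, "A", "AC", "TTACGT"))     # True
print(is_tandem_duplication(2, "G", "GCAC", "TGCACTT"))  # True
print(is_tandem_duplication(3, "A", "AG", "TTACGT"))     # False
```

A flag like this could be carried in an annotation column so that the normalized record keeps the "insertion" representation while the duplication status stays available for interpretation.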

Wow, thanks. I was expecting this to be a silly question; now I'm glad I asked.

Reply • 6.9 years ago by andrewl ▴ 10