Question: Store Structural Variants in VCF
Tomáš Beluský (Brno) wrote, 5.3 years ago:

Hi, following my earlier question "Represent Precise Deletion In Vcf", I have some more questions about other structural variants in VCF, so I'll try to put them all into this post.

  • Duplication

                         123         456
    reference genome -----[           ]-------------------------------------------
                         123         456              789         1122
    sample genome    -----[           ]----------------[           ]--------------
    

    Here is an example of a duplication, but I don't know how to set POS and END. Would END be 456 in this example, or 1122? And what about POS?

    I think that with breakends it would look like this (assuming the duplication occurs on chromosome 1):

    #CHROM  POS ID REF ALT      QUAL FILTER INFO
    1       788 .  .   .[1:123[ .    .      SVTYPE=BND;EVENT=DUP0
    1       789 .  .   ]1:456]. .    .      SVTYPE=BND;EVENT=DUP0
    

    But I would also like to know a simpler way to represent it.

  • Translocation

                         123         456
    reference genome -----[           ]-------------------------------------------
                                                      789         1122
    sample genome    ----------------------------------[           ]--------------
    

    I think I can combine a deletion entry with the same breakend entries as for the duplication above:

    #CHROM  POS ID REF ALT      QUAL FILTER INFO
    1       123 .  .   .<DEL>   .    .      SVTYPE=DEL;END=456;SVLEN=-333;EVENT=TRANS0
    1       788 .  .   .[1:123[ .    .      SVTYPE=BND;EVENT=TRANS0
    1       789 .  .   ]1:456]. .    .      SVTYPE=BND;EVENT=TRANS0
    

    But maybe there is another way to store this.

  • Insertion

    What if I don't know the precise inserted sequence? I know I have to put <INS> in the ALT column, but where does the sequence go? The first thing that comes to mind is to define a new meta-information line, something like this:

    ##INFO=<ID=ISEQ,Number=1,Type=String,Description="Imprecise inserted sequence">
    

    Then I can store it in the INFO column and perhaps define further meta-information lines describing how confident I am about the beginning and end of this sequence:

    ##INFO=<ID=CINSBEGIN,Number=1,Type=Integer,Description="Confidence begin of inserted sequence">
    ##INFO=<ID=CINSEND,Number=1,Type=Integer,Description="Confidence end of inserted sequence">
    

    Example:

    #CHROM  POS ID REF ALT      QUAL FILTER INFO
    1       123 .  .   .<INS>   .    .      SVTYPE=INS;END=123;ISEQ=ATTCGATCA;CINSBEGIN=2;CINSEND=1
    

    I would interpret this as an insertion of one of these possible sequences: ATTCGATCA, TTCGATCA, TCGATCA, ATTCGATC, TTCGATC, TCGATC. So I am sure that TCGATC is inserted, but it may be preceded by the prefix T or AT and followed by the suffix A. I hope that is clear.
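
The interpretation above can be checked mechanically. Below is a minimal sketch, assuming the custom INFO keys ISEQ, CINSBEGIN and CINSEND proposed here (they are not standard VCF fields); the function names `parse_info`, `candidate_insertions` and `certain_core` are hypothetical. It parses the INFO column and enumerates every candidate inserted sequence.

```python
def parse_info(info):
    """Split a VCF INFO column into a dict.

    KEY=VALUE pairs keep their value as a string; flag fields
    (a key with no '=') map to True.
    """
    fields = {}
    for field in info.split(";"):
        key, sep, value = field.partition("=")
        fields[key] = value if sep else True
    return fields


def candidate_insertions(iseq, cinsbegin, cinsend):
    """Return every sequence obtained by trimming 0..cinsbegin bases from
    the start and 0..cinsend bases from the end of iseq (hypothetical
    semantics of the proposed CINSBEGIN/CINSEND keys)."""
    if cinsbegin + cinsend > len(iseq):
        raise ValueError("confidence margins are longer than ISEQ")
    return [
        iseq[begin_trim:len(iseq) - end_trim]
        for end_trim in range(cinsend + 1)
        for begin_trim in range(cinsbegin + 1)
    ]


def certain_core(iseq, cinsbegin, cinsend):
    """The part of iseq that is present in every candidate."""
    return iseq[cinsbegin:len(iseq) - cinsend]


if __name__ == "__main__":
    info = parse_info("SVTYPE=INS;END=123;ISEQ=ATTCGATCA;CINSBEGIN=2;CINSEND=1")
    begin, end = int(info["CINSBEGIN"]), int(info["CINSEND"])
    print(candidate_insertions(info["ISEQ"], begin, end))
    # ['ATTCGATCA', 'TTCGATCA', 'TCGATCA', 'ATTCGATC', 'TTCGATC', 'TCGATC']
    print(certain_core(info["ISEQ"], begin, end))  # TCGATC
```

Under this reading there are (CINSBEGIN + 1) × (CINSEND + 1) candidates, here 3 × 2 = 6, and the sequence that is certain is ISEQ with exactly CINSBEGIN bases dropped on the left and CINSEND on the right.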

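The duplication and translocation records above use the breakend bracket notation of the VCF 4.1 specification, which a small helper can build. This is a minimal sketch, not a reference implementation: the function names are hypothetical and `N` stands in for the real reference base. As I read the spec, the base written next to the brackets (t) is normally the reference base at POS, with the same base in the REF column; a bare `.` in that place instead marks a single breakend. Likewise, for symbolic alleles such as `<DEL>` the ALT column normally holds only the symbol, with the padding base in REF.

```python
def breakend_alt(ref_base, mate_chrom, mate_pos, mate_extends_right, join_after_ref):
    """Build a VCF breakend ALT string in bracket notation.

    mate_extends_right: True if the joined piece starts at mate_pos and runs
        to the right ('[' brackets), False if it runs to the left (']').
    join_after_ref: True for t[p[ / t]p] (piece joined after ref_base),
        False for [p[t / ]p]t (piece joined before ref_base).
    """
    bracket = "[" if mate_extends_right else "]"
    mate = f"{bracket}{mate_chrom}:{mate_pos}{bracket}"
    return ref_base + mate if join_after_ref else mate + ref_base


def deletion_info(pos, end, event):
    """INFO for a symbolic <DEL>: POS is the padding base before the deleted
    bases, END is the last deleted base, SVLEN is negative for deletions."""
    return f"SVTYPE=DEL;END={end};SVLEN={-(end - pos)};EVENT={event}"


def vcf_line(chrom, pos, ref, alt, info):
    """One tab-separated data line; ID, QUAL and FILTER left missing ('.')."""
    return "\t".join([chrom, str(pos), ".", ref, alt, ".", ".", info])


if __name__ == "__main__":
    # 'N' is a placeholder for the real reference base at each POS.
    # Duplication: a copy of 1:123-456 is inserted between 788 and 789.
    print(vcf_line("1", 788, "N", breakend_alt("N", "1", 123, True, True),
                   "SVTYPE=BND;EVENT=DUP0"))
    print(vcf_line("1", 789, "N", breakend_alt("N", "1", 456, False, False),
                   "SVTYPE=BND;EVENT=DUP0"))
    # Translocation: the same segment is also deleted at its original place.
    print(vcf_line("1", 123, "N", "<DEL>", deletion_info(123, 456, "TRANS0")))
```

The two breakend lines reproduce the ALT strings from the question (`N[1:123[` and `]1:456]N`), and the deletion line gives END=456 and SVLEN=-333 for POS=123.
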
Thanks for all your help.


This is not a real answer, but you might find this blog post quite useful.
http://core-genomics.blogspot.com/2011/07/understanding-mutation-nomenclature.html

Reply by PoGibas, 5.3 years ago.

The terminology and definitions are unexpectedly complicated; it is quite surprising how many corner cases and ambiguities exist.

Reply by Istvan Albert, 5.3 years ago.