Question: How do I interpret the alternative allele <CN2> for structural variants in the 1000 Genomes VCF?
Posted 9 months ago by Mr. Dave (United States):

I'm trying to interpret copy number as encoded in the 1000 Genomes Project's phase 3 integrated call set. Here are some relevant lines from the VCF header:

##fileformat=VCFv4.1
##contig=<ID=1,assembly=b37,length=249250621>
##ALT=<ID=CNV,Description="Copy Number Polymorphism">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INS:ME:ALU,Description="Insertion of ALU element">
##ALT=<ID=INS:ME:LINE1,Description="Insertion of LINE1 element">
##ALT=<ID=INS:ME:SVA,Description="Insertion of SVA element">
##ALT=<ID=INS:MT,Description="Nuclear Mitochondrial Insertion">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=CN0,Description="Copy number allele: 0 copies">
##ALT=<ID=CN1,Description="Copy number allele: 1 copy">
##ALT=<ID=CN2,Description="Copy number allele: 2 copies">
##ALT=<ID=CN3,Description="Copy number allele: 3 copies">
##ALT=<ID=CN4,Description="Copy number allele: 4 copies">
{...}
##ALT=<ID=CN124,Description="Copy number allele: 124 copies">
##INFO=<ID=CS,Number=1,Type=String,Description="Source call set.">
##INFO=<ID=END,Number=1,Type=Integer,Description="End coordinate of this variant">
##INFO=<ID=MC,Number=.,Type=String,Description="Merged calls.">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1)">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
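The symbolic alleles that appear in the ALT column are declared by the ##ALT lines above. A minimal Python sketch that reads them into an ID-to-Description mapping, so the stated meaning of a symbol such as <CN2> can be looked up while scanning records. `alt_descriptions` is a hypothetical helper, and the regular expression assumes every ##ALT line has exactly the `ID` then `Description` layout shown in this header.

```python
# Minimal sketch: map symbolic ALT IDs (e.g. "CN2") to their header Description.
# Assumes ##ALT lines look exactly like: ##ALT=<ID=...,Description="...">
import re
from typing import Dict, Iterable

ALT_LINE = re.compile(r'^##ALT=<ID=([^,>]+),Description="([^"]*)">$')


def alt_descriptions(header_lines: Iterable[str]) -> Dict[str, str]:
    """Return {ALT ID: Description} for every matching ##ALT header line."""
    mapping: Dict[str, str] = {}
    for line in header_lines:
        match = ALT_LINE.match(line.strip())
        if match:  # other meta lines (##INFO, ##contig, ...) are ignored
            mapping[match.group(1)] = match.group(2)
    return mapping


if __name__ == "__main__":
    header = [
        "##fileformat=VCFv4.1",
        '##ALT=<ID=DUP,Description="Duplication">',
        '##ALT=<ID=CN2,Description="Copy number allele: 2 copies">',
    ]
    print(alt_descriptions(header)["CN2"])
```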

I have filtered the data to keep only variants whose INFO/SVTYPE is DUP, DEL, or CNV.
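A minimal Python sketch of this filtering step, assuming an uncompressed, tab-separated VCF read line by line; the function names are hypothetical, and a dedicated VCF tool would normally do the same job. The demo uses one record from the examples below plus a made-up non-SV record.

```python
# Minimal sketch: keep VCF data lines whose INFO/SVTYPE is DEL, DUP or CNV.
# Assumes an uncompressed, tab-separated VCF; header lines start with "#".
from typing import Dict, Iterable, Iterator

WANTED_SVTYPES = {"DEL", "DUP", "CNV"}


def parse_info(info: str) -> Dict[str, str]:
    """Split an INFO column such as 'AC=1;SVTYPE=DEL' into a dict.

    Flag entries without '=' are stored with an empty string value.
    """
    fields: Dict[str, str] = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, _, value = entry.partition("=")
        fields[key] = value
    return fields


def filter_by_svtype(lines: Iterable[str]) -> Iterator[str]:
    """Yield data lines whose INFO/SVTYPE is one of WANTED_SVTYPES."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and column-header lines
        columns = line.rstrip("\n").split("\t")
        info = parse_info(columns[7])  # INFO is the 8th VCF column
        if info.get("SVTYPE") in WANTED_SVTYPES:
            yield line


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.1\n",
        "1\t738570\tesv3584979\tG\t<CN0>\t100\tPASS\tAC=1;AN=5008;END=742020;SVTYPE=DEL;VT=SV\n",
        "1\t10000\t.\tA\tAC\t100\tPASS\tVT=INDEL\n",  # made-up record without SVTYPE
    ]
    for kept in filter_by_svtype(example):
        print(kept, end="")
```

Only the SVTYPE=DEL record is printed; the header line and the record without SVTYPE are skipped.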

  • Records with INFO/SVTYPE=DEL generally have ALT=<CN0>, but occasionally show ALT=<CN0>,<CN2>. In these cases, no sample carries <CN2> (its AC is 0).
  • Records with INFO/SVTYPE=DUP generally have ALT=<CN2>, but occasionally show ALT=<CN0>,<CN2>. In these cases, no sample carries <CN0> (its AC is 0).
  • Records with INFO/SVTYPE=CNV show a variety of combinations.
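One way to check the "no sample carries it" observation is to pair each ALT allele with its entry in INFO/AC, which the header declares as Number=A (one value per ALT allele). A minimal Python sketch; the helper names are hypothetical and the demo inputs are copied from the example records below.

```python
# Minimal sketch: report ALT alleles whose INFO/AC is 0, i.e. alleles listed in
# the ALT column but carried by no called haplotype.
from typing import Dict, List


def parse_info(info: str) -> Dict[str, str]:
    """Split an INFO column ('KEY=VALUE;FLAG;...') into a dict."""
    return {k: v for k, _, v in (e.partition("=") for e in info.split(";") if e)}


def unused_alt_alleles(alt: str, info: str) -> List[str]:
    """Return the ALT alleles whose per-allele AC (Number=A) equals 0."""
    alleles = alt.split(",")
    counts = [int(c) for c in parse_info(info)["AC"].split(",")]
    if len(counts) != len(alleles):
        raise ValueError("AC must have one value per ALT allele (Number=A)")
    return [a for a, c in zip(alleles, counts) if c == 0]


if __name__ == "__main__":
    # ALT and truncated INFO from the 2:50182899 (DEL) and 1:16037975 (DUP) records.
    print(unused_alt_alleles("<CN0>,<CN2>", "AC=3,0;AN=5008;SVTYPE=DEL"))  # ['<CN2>']
    print(unused_alt_alleles("<CN0>,<CN2>", "AC=0,11;AN=5008;SVTYPE=DUP"))  # ['<CN0>']
```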

Here is a summary of the variants in the file above, filtered down to these three SVTYPEs and counted by ALT:

    N  SVTYPE  ALT
 6026  DUP     <CN2>
  100  DUP     <CN0>,<CN2>
    3  DUP     <CN2>,<CN3>

33329  DEL     <CN0>
    7  DEL     <CN0>,<CN2>
    6  DEL     G
    3  DEL     A
    2  DEL     C
    2  DEL     T
    1  DEL     TGGTTCATTGATATTCTGCTGTGGCAC{..Truncated..},T

 2716  CNV     <CN0>,<CN2>
  136  CNV     <CN2>,<CN3>
   90  CNV     <CN0>,<CN2>,<CN3>
   50  CNV     <CN2>
   35  CNV     <CN0>,<CN2>,<CN3>,<CN4>
   23  CNV     <CN0>,<CN2>,<CN3>,<CN4>,<CN5>
   23  CNV     <CN2>,<CN3>,<CN4>
   12  CNV     <CN0>
    9  CNV     <CN0>,<CN2>,<CN3>,<CN4>,<CN5>,<CN6>
    8  CNV     <CN2>,<CN3>,<CN4>,<CN5>
    4  CNV     <CN3>,<CN4>
    3  CNV     <CN0>,<CN2>,<CN3>,<CN4>,<CN5>,<CN6>,<CN7>
    3  CNV     <CN2>,<CN3>,<CN4>,<CN5>,<CN6>,<CN7>
    2  CNV     <CN0>,<CN1>,<CN3>,<CN4>
    2  CNV     <CN1>,<CN3>
    2  CNV     <CN1>,<CN3>,<CN4>,<CN5>
    1  CNV     <CN1>,<CN3>,<CN4>
    1  CNV     <CN1>,<CN3>,<CN4>,<CN5>,<CN6>
    1  CNV     <CN2>,<CN3>,<CN4>,<CN5>,<CN6>
    1  CNV     <CN2>,<CN3>,<CN4>,<CN5>,<CN6>,<CN7>,<CN8>
    1  CNV     <CN2>,<CN3>,<CN4>,<CN5>,<CN6>,<CN7>,<CN8>,<CN9>
    1  CNV     <CN3>
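A summary like this can be produced by counting records per (SVTYPE, ALT) pair. A minimal Python sketch, assuming the same tab-separated layout as above; the function names are hypothetical, and the output is grouped alphabetically by SVTYPE rather than in the DUP/DEL/CNV order used in the table.

```python
# Minimal sketch: count VCF records per (INFO/SVTYPE, ALT) pair and print them
# in a layout similar to the summary table.
from collections import Counter
from typing import Iterable


def tally_svtype_alt(lines: Iterable[str]) -> Counter:
    """Return a Counter keyed by (SVTYPE, ALT) tuples."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        cols = line.rstrip("\n").split("\t")
        # partition("=")[::2] keeps (key, value) and drops the "=" separator
        info = dict(e.partition("=")[::2] for e in cols[7].split(";") if e)
        counts[(info.get("SVTYPE", "."), cols[4])] += 1  # ALT is the 5th column
    return counts


def format_summary(counts: Counter) -> str:
    """One row per pair: count, SVTYPE, ALT; grouped by SVTYPE, largest count first."""
    rows = sorted(counts.items(), key=lambda kv: (kv[0][0], -kv[1], kv[0][1]))
    return "\n".join(f"{n:>5}  {svtype:<6}  {alt}" for (svtype, alt), n in rows)


if __name__ == "__main__":
    demo = [  # made-up minimal records
        "1\t100\t.\tG\t<CN0>\t100\tPASS\tSVTYPE=DEL\n",
        "1\t200\t.\tG\t<CN0>\t100\tPASS\tSVTYPE=DEL\n",
        "1\t300\t.\tT\t<CN2>\t100\tPASS\tSVTYPE=DUP\n",
    ]
    print(format_summary(tally_svtype_alt(demo)))
```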

So how should I interpret <CN2> when INFO/SVTYPE is DUP or CNV? Despite the header's description of <CN2> ("Copy number allele: 2 copies"), it seems that it should describe a biallelic duplication when INFO/SVTYPE=DUP, and that reading is consistent with the article. Does the header description apply only when INFO/SVTYPE=CNV?
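If the header is read literally, each <CNk> allele would name the number of copies of the region on one haplotype, and the REF allele would stand for the reference haplotype's copy number. Under that assumption (which is what the question asks to confirm, not something stated here for every SVTYPE), a sample's total copy number is the sum over its haplotypes: a 0|1 call at an SVTYPE=DUP site with ALT=<CN2> would mean 1 + 2 = 3 copies. A minimal Python sketch; `ref_copies` is a hypothetical parameter defaulting to 1, and the CNV rows that list <CN1> but not <CN2> hint that the reference copy number may not always be 1.

```python
# Minimal sketch, ASSUMING each <CNk> ALT allele means "k copies on this haplotype"
# and the REF allele means `ref_copies` copies (hypothetical parameter, default 1).
import re
from typing import List, Optional

CN_ALLELE = re.compile(r"<CN(\d+)>")


def allele_copies(alt: str, ref_copies: int = 1) -> List[int]:
    """Copy count per allele index: index 0 is REF, then one entry per ALT allele."""
    copies = [ref_copies]
    for allele in alt.split(","):
        match = CN_ALLELE.fullmatch(allele)
        if match is None:
            raise ValueError(f"not a copy-number allele: {allele}")
        copies.append(int(match.group(1)))
    return copies


def total_copy_number(gt: str, alt: str, ref_copies: int = 1) -> Optional[int]:
    """Sum per-haplotype copies for a GT such as '0|1', '1/2' or haploid '1'.

    Returns None if any haplotype is missing ('.').
    """
    copies = allele_copies(alt, ref_copies)
    indices = re.split(r"[|/]", gt)
    if "." in indices:
        return None
    return sum(copies[int(i)] for i in indices)


if __name__ == "__main__":
    print(total_copy_number("0|1", "<CN2>"))              # 3 under the assumption
    print(total_copy_number("1|2", "<CN0>,<CN2>,<CN3>"))  # 0 + 2 = 2
```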

INFO/SVTYPE=DEL examples (INFO truncated):

1    738570       esv3584979      G    <CN0>          100    PASS    
    AC=1;AF=0.000199681;AN=5008;CS=DEL_union;END=742020;NS=2504;SVTYPE=DEL;VT=SV
1    766600       esv3584980      G    <CN0>          100    PASS    
    AC=188;AF=0.0375399;AN=5008;CS=DEL_union;END=769112;NS=2504;SVTYPE=DEL;VT=SV
2    50182899     esv3590712;.    A    <CN0>,<CN2>    100    PASS    
    AC=3,0;AF=0.000599042,0;AN=5008;CS=DUP_uwash;END=50192857;NS=2504;SVTYPE=DEL;VT=SV
3    138606780    esv3597927;.    T    <CN0>,<CN2>    100    PASS    
    AC=1,0;AF=0.000199681,0;AN=5008;CS=DUP_gs;END=138620917;NS=2504;SVTYPE=DEL;VT=SV
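The AC, AN and AF values in these records agree with AF = AC / AN for each ALT allele; for example 1 / 5008 ≈ 0.000199681 and 188 / 5008 ≈ 0.0375399, and the unused <CN2> allele has AC=0 and therefore AF=0. A small Python sketch that recomputes AF and compares it with the stored value; the function names and the tolerance are hypothetical choices.

```python
# Minimal sketch: recompute per-allele AF = AC / AN (AC and AF are Number=A)
# and compare with the stored INFO/AF values.
import math
from typing import List


def recompute_af(ac: str, an: str) -> List[float]:
    """AC is a comma-separated list with one count per ALT allele."""
    total = int(an)
    return [int(c) / total for c in ac.split(",")]


def af_matches(ac: str, an: str, af: str, rel_tol: float = 1e-5) -> bool:
    """True if every stored AF is within rel_tol of AC / AN."""
    stored = [float(x) for x in af.split(",")]
    computed = recompute_af(ac, an)
    return len(stored) == len(computed) and all(
        math.isclose(s, c, rel_tol=rel_tol, abs_tol=1e-12)
        for s, c in zip(stored, computed)
    )


if __name__ == "__main__":
    # (AC, AN, AF) copied from the SVTYPE=DEL examples above.
    for ac, an, af in [("1", "5008", "0.000199681"),
                       ("188", "5008", "0.0375399"),
                       ("3,0", "5008", "0.000599042,0")]:
        print(ac, an, af, af_matches(ac, an, af))
```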

INFO/SVTYPE=DUP examples:

1       668630      esv3584976      G    <CN2>          100    PASS    
    AC=64;AF=0.0127796;AN=5008;CS=DUP_delly;END=850204;NS=2504;SVTYPE=DUP;VT=SV
1       16013837    esv3585317      T    <CN2>          100    PASS    
    AC=11;AF=0.00219649;AN=5008;CS=DUP_delly;END=16080976;MC=DUP_uwash_chr1_16012226_16082907;SVTYPE=DUP;VT=SV
1       16037975    .;esv3585319    G    <CN0>,<CN2>    100    PASS    
    AC=0,11;AF=0,0.00219649;AN=5008;CS=DUP_gs;END=16071850;SVTYPE=DUP;VT=SV
1       153682976   esv3587592;.    G    <CN2>,<CN3>    100    PASS    
    AC=194,0;AF=0.038738,0;AN=5008;CS=DUP_gs;END=153696281;SVTYPE=DUP;VT=SV

INFO/SVTYPE=CNV examples:

1    1609210      esv3585011;esv3585012              G       <CN0>,<CN2>          100    PASS    
    AC=17,26;AF=0.00339457,0.00519169;AN=5008;CS=DUP_gs;END=1615827;SVTYPE=CNV;VT=SV
1    143984622    esv3587386;esv3587387              A       <CN2>,<CN3>          100    PASS    
    AC=4791,41;AF=0.956669,0.0081869;AN=5008;CS=DUP_gs;END=144094733;NS=2504;SVTYPE=CNV;VT=SV
1    248619876    esv3589555;esv3589556;esv3589557   A       <CN0>,<CN2>,<CN3>    100    PASS    
    AC=19,859,2;AF=0.00379393,0.171526,0.000399361;AN=5008;CS=DUP_gs;END=248634579;SVTYPE=CNV;VT=SV
Y    28462363     CNV_Y_28462363_28740539            T       <CN2>                100    PASS    
    AC=5;AF=0.00408831;AN=1223;END=28740539;SVTYPE=CNV;VT=SV
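Because AC is given per ALT allele while AN counts all called alleles, the number of called haplotypes carrying the REF allele is AN minus the sum of AC. For the 1:143984622 record that is 5008 - (4791 + 41) = 176, so most called haplotypes there carry <CN2> rather than REF. A minimal Python sketch; `ref_allele_count` is a hypothetical helper and the inputs are the truncated INFO strings shown above.

```python
# Minimal sketch: number of called haplotypes carrying REF, computed as AN - sum(AC).
from typing import Dict


def ref_allele_count(info: str) -> int:
    # partition("=")[::2] keeps (key, value) and drops the "=" separator
    fields: Dict[str, str] = dict(e.partition("=")[::2] for e in info.split(";") if e)
    an = int(fields["AN"])
    alt_total = sum(int(c) for c in fields["AC"].split(","))  # AC is Number=A
    return an - alt_total


if __name__ == "__main__":
    print(ref_allele_count("AC=4791,41;AF=0.956669,0.0081869;AN=5008;CS=DUP_gs"))  # 176
    print(ref_allele_count("AC=19,859,2;AN=5008;CS=DUP_gs"))  # 4128
```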