Question: How do I interpret the alternative allele <CN2> for structural variants in the 1000 Genomes VCF?
Mr. Dave (United States) wrote, 18 months ago:

I'm trying to interpret copy number as described in the 1000 Genomes Project's phase 3 integrated call set. Here are some relevant lines from the VCF header:

##fileformat=VCFv4.1
##contig=<ID=1,assembly=b37,length=249250621>
##ALT=<ID=CNV,Description="Copy Number Polymorphism">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INS:ME:ALU,Description="Insertion of ALU element">
##ALT=<ID=INS:ME:LINE1,Description="Insertion of LINE1 element">
##ALT=<ID=INS:ME:SVA,Description="Insertion of SVA element">
##ALT=<ID=INS:MT,Description="Nuclear Mitochondrial Insertion">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=CN0,Description="Copy number allele: 0 copies">
##ALT=<ID=CN1,Description="Copy number allele: 1 copy">
##ALT=<ID=CN2,Description="Copy number allele: 2 copies">
##ALT=<ID=CN3,Description="Copy number allele: 3 copies">
##ALT=<ID=CN4,Description="Copy number allele: 4 copies">
{...}
##ALT=<ID=CN124,Description="Copy number allele: 124 copies">
##INFO=<ID=CS,Number=1,Type=String,Description="Source call set.">
##INFO=<ID=END,Number=1,Type=Integer,Description="End coordinate of this variant">
##INFO=<ID=MC,Number=.,Type=String,Description="Merged calls.">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1)">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">

I have filtered the data to only variants with an SVTYPE INFO tag set to DUP, DEL or CNV.

  • Records with INFO/SVTYPE=DEL generally have ALT=<CN0>, but occasionally show ALT=<CN0>,<CN2>. In these cases, there are no calls for <CN2>.
  • Records with INFO/SVTYPE=DUP generally have ALT=<CN2>, but occasionally show ALT=<CN0>,<CN2>. In these cases, there are no calls for <CN0>.
  • Records with INFO/SVTYPE=CNV show a variety of combinations.

Here is a summary of the variants in the above file, filtered down to the three SVTYPE values, with the number of records for each ALT combination:

#  N SVTYPE ALT
6026 DUP    <CN2>
 100 DUP    <CN0>,<CN2>
   3 DUP    <CN2>,<CN3>

33329 DEL   <CN0>
   7 DEL    <CN0>,<CN2>
   6 DEL    G
   3 DEL    A
   2 DEL    C
   2 DEL    T
   1 DEL    TGGTTCATTGATATTCTGCTGTGGCAC{..Truncated..},T

2716 CNV    <CN0>,<CN2>
 136 CNV    <CN2>,<CN3>
  90 CNV    <CN0>,<CN2>,<CN3>
  50 CNV    <CN2>
  35 CNV    <CN0>,<CN2>,<CN3>,<CN4>
  23 CNV    <CN0>,<CN2>,<CN3>,<CN4>,<CN5>
  23 CNV    <CN2>,<CN3>,<CN4>
  12 CNV    <CN0>
   9 CNV    <CN0>,<CN2>,<CN3>,<CN4>,<CN5>,<CN6>
   8 CNV    <CN2>,<CN3>,<CN4>,<CN5>
   4 CNV    <CN3>,<CN4>
   3 CNV    <CN0>,<CN2>,<CN3>,<CN4>,<CN5>,<CN6>,<CN7>
   3 CNV    <CN2>,<CN3>,<CN4>,<CN5>,<CN6>,<CN7>
   2 CNV    <CN0>,<CN1>,<CN3>,<CN4>
   2 CNV    <CN1>,<CN3>
   2 CNV    <CN1>,<CN3>,<CN4>,<CN5>
   1 CNV    <CN1>,<CN3>,<CN4>
   1 CNV    <CN1>,<CN3>,<CN4>,<CN5>,<CN6>
   1 CNV    <CN2>,<CN3>,<CN4>,<CN5>,<CN6>
   1 CNV    <CN2>,<CN3>,<CN4>,<CN5>,<CN6>,<CN7>,<CN8>
   1 CNV    <CN2>,<CN3>,<CN4>,<CN5>,<CN6>,<CN7>,<CN8>,<CN9>
   1 CNV    <CN3>
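
For reference, a tally like the one above could be reproduced along these lines. This is only a sketch: the function name `tally_svtype_alt` is made up for illustration, and it assumes plain tab-separated VCF lines in which SVTYPE appears as a KEY=VALUE pair in INFO. The example records are a few of the ones quoted further down, with INFO shortened.

```python
from collections import Counter


def tally_svtype_alt(vcf_lines, keep=("DUP", "DEL", "CNV")):
    """Count records per (SVTYPE, ALT) pair, skipping header and blank lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        alt, info = fields[4], fields[7]
        # INFO is a ';'-separated list of KEY=VALUE pairs; flags have no '='.
        tags = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        svtype = tags.get("SVTYPE")
        if svtype in keep:
            counts[(svtype, alt)] += 1
    return counts


# A handful of records from this post, rebuilt as tab-separated lines.
example = [
    "##fileformat=VCFv4.1",
    "\t".join(["1", "738570", "esv3584979", "G", "<CN0>", "100", "PASS",
               "AC=1;AN=5008;END=742020;SVTYPE=DEL;VT=SV"]),
    "\t".join(["1", "766600", "esv3584980", "G", "<CN0>", "100", "PASS",
               "AC=188;AN=5008;END=769112;SVTYPE=DEL;VT=SV"]),
    "\t".join(["2", "50182899", "esv3590712;.", "A", "<CN0>,<CN2>", "100", "PASS",
               "AC=3,0;AN=5008;END=50192857;SVTYPE=DEL;VT=SV"]),
    "\t".join(["1", "668630", "esv3584976", "G", "<CN2>", "100", "PASS",
               "AC=64;AN=5008;END=850204;SVTYPE=DUP;VT=SV"]),
]

counts = tally_svtype_alt(example)
for (svtype, alt), n in counts.most_common():
    print(n, svtype, alt)
```

To run it on a real file, pass an open file handle (for example from `gzip.open(path, "rt")`) instead of the `example` list.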

So, how do I interpret <CN2> when INFO/SVTYPE is DUP or CNV? Despite the header's description of <CN2>, it seems that it should describe a biallelic duplication when INFO/SVTYPE=DUP; that reading also makes sense in light of the article. Does the header description only apply when INFO/SVTYPE=CNV?

INFO/SVTYPE=DEL examples (INFO truncated):

1    738570       esv3584979      G    <CN0>          100    PASS    
    AC=1;AF=0.000199681;AN=5008;CS=DEL_union;END=742020;NS=2504;SVTYPE=DEL;VT=SV
1    766600       esv3584980      G    <CN0>          100    PASS    
    AC=188;AF=0.0375399;AN=5008;CS=DEL_union;END=769112;NS=2504;SVTYPE=DEL;VT=SV
2    50182899     esv3590712;.    A    <CN0>,<CN2>    100    PASS    
    AC=3,0;AF=0.000599042,0;AN=5008;CS=DUP_uwash;END=50192857;NS=2504;SVTYPE=DEL;VT=SV
3    138606780    esv3597927;.    T    <CN0>,<CN2>    100    PASS    
    AC=1,0;AF=0.000199681,0;AN=5008;CS=DUP_gs;END=138620917;NS=2504;SVTYPE=DEL;VT=SV

INFO/SVTYPE=DUP examples:

1       668630      esv3584976      G    <CN2>          100    PASS    
    AC=64;AF=0.0127796;AN=5008;CS=DUP_delly;END=850204;NS=2504;SVTYPE=DUP;VT=SV
1       16013837    esv3585317      T    <CN2>          100    PASS    
    AC=11;AF=0.00219649;AN=5008;CS=DUP_delly;END=16080976;MC=DUP_uwash_chr1_16012226_16082907;SVTYPE=DUP;VT=SV
1       16037975    .;esv3585319    G    <CN0>,<CN2>    100    PASS    
    AC=0,11;AF=0,0.00219649;AN=5008;CS=DUP_gs;END=16071850;SVTYPE=DUP;VT=SV
1       153682976   esv3587592;.    G    <CN2>,<CN3>    100    PASS    
    AC=194,0;AF=0.038738,0;AN=5008;CS=DUP_gs;END=153696281;SVTYPE=DUP;VT=SV

INFO/SVTYPE=CNV examples:

1    1609210      esv3585011;esv3585012              G       <CN0>,<CN2>          100    PASS    
    AC=17,26;AF=0.00339457,0.00519169;AN=5008;CS=DUP_gs;END=1615827;SVTYPE=CNV;VT=SV
1    143984622    esv3587386;esv3587387              A       <CN2>,<CN3>          100    PASS    
    AC=4791,41;AF=0.956669,0.0081869;AN=5008;CS=DUP_gs;END=144094733;NS=2504;SVTYPE=CNV;VT=SV
1    248619876    esv3589555;esv3589556;esv3589557   A       <CN0>,<CN2>,<CN3>    100    PASS    
    AC=19,859,2;AF=0.00379393,0.171526,0.000399361;AN=5008;CS=DUP_gs;END=248634579;SVTYPE=CNV;VT=SV
Y    28462363     CNV_Y_28462363_28740539            T       <CN2>                100    PASS    
    AC=5;AF=0.00408831;AN=1223;END=28740539;SVTYPE=CNV;VT=SV

QVINTVS_FABIVS_MAXIMVS (USA SoCal) wrote, 7 weeks ago:

It's like this

For CNVs the REF is <CN1>, i.e. one copy per haplotype.

Humans are diploid, thus 0/0 == CN1/CN1, or 2 copies in total.


Say the ALT is <CN0>

  • 0/0 = CN1/CN1 = 2 copies

  • 0/1 = CN1/CN0 = 1 copy

  • 1/1 = CN0/CN0 = 0 copies


With multiallelic variants, the allele indices in the genotype follow the order of the alleles in ALT (1-based; index 0 is REF).

So when ALT is <CN0>,<CN2>, the possible allele indices are 0, 1, 2, corresponding to CN1, CN0, CN2:

  • 0/1 = CN1/CN0 = 1 copy

  • 1/2 = CN0/CN2 = 2 copies

  • 2/2 = CN2/CN2 = 4 copies
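
The same lookup can be written as a small sketch in Python. The function names are made up for illustration, and the code simply follows the convention above (REF counts as one copy per haplotype, and every ALT is a `<CNn>` allele); it is not taken from any 1000 Genomes tool.

```python
import re


def allele_copies(index, alts):
    """Per-haplotype copy number for a GT allele index (0 = REF, taken as CN1)."""
    if index == 0:
        return 1  # assumption from this answer: REF behaves like <CN1>
    match = re.fullmatch(r"<CN(\d+)>", alts[index - 1])
    if match is None:
        raise ValueError(f"not a <CNn> allele: {alts[index - 1]}")
    return int(match.group(1))


def total_copies(gt, alt_field):
    """Sum the per-haplotype copies for a GT such as '0|1', '1/2' or a haploid '1'."""
    alts = alt_field.split(",")
    indices = re.split(r"[/|]", gt)
    if "." in indices:
        return None  # missing call
    return sum(allele_copies(int(i), alts) for i in indices)


for gt in ("0/0", "0/1", "1/2", "2/2"):
    print(gt, total_copies(gt, "<CN0>,<CN2>"))
# 0/0 2
# 0/1 1
# 1/2 2
# 2/2 4
```

Phased genotypes (`0|1`) and haploid calls, such as the chrY record in the question, go through the same function.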


So if the ALT is <CN0>,<CN2>,<CN3>,<CN4>

The allele indices are 0 (REF), 1, 2, 3, 4, in the same order as the alleles in ALT:

1/3 = CN0/CN3 = 3 copies.
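
The INFO fields can be read the same way. As a rough sketch (the function name is invented here, and it assumes AC and AN are both present and that REF is the implicit CN1 allele), the count of each copy-number allele is AC for the ALT alleles and AN minus the sum of AC for REF. Alleles listed in ALT with AC=0, like the <CN2> in the DEL records from the question, are simply never called.

```python
def allele_counts(alt_field, info_field):
    """Map each copy-number allele, including the implicit REF, to its allele count."""
    tags = dict(item.split("=", 1) for item in info_field.split(";") if "=" in item)
    alts = alt_field.split(",")
    acs = [int(x) for x in tags["AC"].split(",")]  # one count per ALT allele
    an = int(tags["AN"])                           # all called alleles
    result = {"REF (CN1)": an - sum(acs)}
    result.update(zip(alts, acs))
    return result


# INFO from the chr1:248619876 CNV record quoted in the question.
info = ("AC=19,859,2;AF=0.00379393,0.171526,0.000399361;AN=5008;"
        "CS=DUP_gs;END=248634579;SVTYPE=CNV;VT=SV")
cn_counts = allele_counts("<CN0>,<CN2>,<CN3>", info)
print(cn_counts)
```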
