Number of allelic depths is larger than number of inferred alleles in VCF file
12 months ago
Alexandros • 0

Hello,

I performed variant calling and genotyping of my samples using GATK 4.2.6.1. In the VCF file, although each genotype contains only 2 alleles separated by "/" or "|", the number of allelic depths (ADs) is larger than the number of called alleles (3 ADs for 2 alleles in the first example below, and 4 ADs for 1 distinct allele in the second). What is the reason for this?

1|2:11,70,65:146:99:0|1:2857836_C_*:5031,2213,2478,2377,0,2643:2857836
3|3:0,0,0,4:4:13:1|1:1933943_TTAAGGTAG_T:139,139,139,139,139,139,13,13,13,0:1933943

I thought that in this case only the depths for alleles 1 and 2 (or for allele 3 in the second example) should be given. Do the remaining ADs correspond to other alleles, for example the reference or alleles from other samples?

cheers

VCF GATK Genotyping Variant-calling • 867 views

Show us the whole line of this VCF, at least CHROM POS ID REF ALT QUAL FILTER INFO FORMAT.

##source=CombineGVCFs

##source=SelectVariants

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Bc_ref_BXQ_D-illumina_reads Bc_ref_BXQ_E-illumina_reads

contig_17   2857838 .   C   *,A 2203.81 .   AC=1,1;AF=0.500,0.500;AN=2;BaseQRankSum=0.628;DP=397;ExcessHet=0.0000;FS=6.717;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.73;MQRankSum=3.53;QD=15.09;ReadPosRankSum=3.22;SOR=0.185   GT:AD:DP:GQ:PGT:PID:PL:PS   1|2:11,70,65:146:99:0|1:2857836_C_*:5031,2213,2478,2377,0,2643:2857836  ./.:232,0,0:232:0:.:.:0,0,0,0,0,0

And for the second genotype above

##source=CombineGVCFs

##source=SelectVariants

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Bc_ref_BXQ_D-illumina_reads Bc_ref_BXQ_E-illumina_reads

contig_9    1933948 .   G   T,C,*   234.21  .   AC=1,1,2;AF=0.250,0.250,0.500;AN=4;DP=17;ExcessHet=0.0000;FS=0.000;MLEAC=1,1,2;MLEAF=0.250,0.250,0.500;MQ=45.91;QD=23.42;SOR=3.258  GT:AD:DP:GQ:PGT:PID:PL:PS   1/2:0,2,4,0:6:72:.:.:252,168,162,84,0,72,252,168,84,252 3|3:0,0,0,4:4:13:1|1:1933943_TTAAGGTAG_T:139,139,139,139,139,139,13,13,13,0:1933943

I agree with @leipzig. At this position there are 4 alleles in total: 1 reference (G) and 3 alternates (T, C and *, where * denotes a spanning deletion, i.e. a deletion that starts upstream and overlaps this site). The AD annotation gives the number of unfiltered reads supporting each allele, in the order REF first, then the ALT alleles as listed.

Here, in sample Bc_ref_BXQ_D-illumina_reads at position contig_9:1933948, you have 0 reads supporting G, 2 reads supporting T, 4 reads supporting C, and 0 reads supporting the spanning deletion (*).
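To make the ordering concrete, here is a minimal Python sketch that pairs each AD value with its allele. The function name `ad_by_allele` is made up for illustration, and it assumes the AD field contains only integers (no "." missing values).

```python
def ad_by_allele(ref, alt, ad):
    """Pair each AD value with its allele: REF first, then ALTs in listed order.

    Hypothetical helper; assumes every AD entry is an integer.
    """
    alleles = [ref] + alt.split(",")
    depths = [int(x) for x in ad.split(",")]
    if len(alleles) != len(depths):
        raise ValueError(
            f"expected {len(alleles)} AD values, got {len(depths)}"
        )
    return dict(zip(alleles, depths))


# Sample Bc_ref_BXQ_D-illumina_reads at contig_9:1933948 (REF=G, ALT=T,C,*)
print(ad_by_allele("G", "T,C,*", "0,2,4,0"))
# Sample Bc_ref_BXQ_D-illumina_reads at contig_17:2857838 (REF=C, ALT=*,A)
print(ad_by_allele("C", "*,A", "11,70,65"))
```

For the first record this gives G=0, T=2, C=4, *=0, matching the counts described above.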

12 months ago

So * is the spanning (overlapping) deletion allele that GATK emits. It's just another allele here, counted like any other ALT, so if you have

G -> T / C / *

I would expect 4 ADs (1 REF + 3 ALT), and for

C -> * / A

I would expect 3 ADs (1 REF + 2 ALT).
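If you only want the depths of the alleles actually called in the genotype, you can index into AD with the GT allele numbers. The sketch below is illustrative only: the helper names are hypothetical, and it assumes integer AD values and skips no-call (".") alleles.

```python
import re


def expected_ad_count(alt):
    """Number of AD values for a site: 1 for REF plus one per ALT allele."""
    return 1 + len(alt.split(","))


def called_allele_depths(gt, ad):
    """Return (allele_index, depth) for each allele in GT.

    GT indices are 0 for REF and 1..n for the ALT alleles, which is the
    same order AD uses. Hypothetical helper; assumes integer AD values.
    """
    depths = ad.split(",")
    result = []
    for idx in re.split(r"[/|]", gt):
        if idx == ".":  # no-call allele, nothing to look up
            continue
        i = int(idx)
        result.append((i, int(depths[i])))
    return result


print(expected_ad_count("T,C,*"))   # G -> T / C / *
print(expected_ad_count("*,A"))     # C -> * / A
print(called_allele_depths("1|2", "11,70,65"))
print(called_allele_depths("3|3", "0,0,0,4"))
```

For the two genotypes in the question, 1|2 picks the second and third AD values (70 and 65), and 3|3 picks the fourth (4) twice.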
