I ran GATK4 on a single WES BAM file from The Cancer Genome Atlas (TCGA) program.
This is how an example SNP call looks like:
chr22 10685693 . A G,<NON_REF> 307.60 . BaseQRankSum=1.267;DP=52;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-4.582;RAW_MQandDP=91525,52;ReadPosRankSum=0.279 GT:AD:DP:GQ:PL:SB 0/1:34,18,0:52:99:315,0,697,414,751,1165:18,16,1,17
As you can see the PL field features six comma-separated integers: 315,0,697,414,751,1165. My understanding derived from the answer here is that this should correspond to the following (phred-adjusted) genotype likelihoods (N corresponds to NON_REF):
AA: 315 AG: 0 GG: 697 AN: 414 GN: 751 NN: 1165
It appears we have overwhelming evidence of a heterozygous SNP A/G: 34 calls for A, 18 calls for G and 0 calls for anything else. This data makes sense if you follow the formula for computing PL as shown here.
Briefly, probability of genotype given data is going to be very high (almost 1) for AG, and $-10* log(1) = 0$. Conversely it will be very low for CC, CT or TT (NON_REF), in the order of $10^-100$, and $-10 * log(10^-100) ~ 1165$. Given allelic imbalance of A vs G, it also makes sense that AA is more likely than GG.
As pointed out in the comments, gVCF file shouldn't be used in downstream analysis, one can combine (multiple) gVCF files(s) into finished genotypes using the
GenotypeGVCFs command of
gatk. This gets rid of NON_REF SNPs, with the result, which looks like this:
chr22 10685693 . A G 307.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.27;DP=52;ExcessHet=3.0103;FS=31.510;MLEAC=1;MLEAF=0.500;MQ=41.95;MQRankSum=-4.582e+00;QD=5.92;ReadPosRankSum=0.279;SOR=4.04 GT:AD:DP:GQ:PL 0/1:34,18:52:99:315,0,697
With the PL field corresponding to:
AA: 315 AG: 0 GG: 697