For example, in the following record I don't understand some of the GT:AD:DP:GQ:PL information:
chr1 897723 rs6696911 C T 453.42 PASS AC=1;AF=0.50;AN=2;BaseQRankSum=-0.479;DB;DP=36;Dels=0.00;FS=1.480;HRun=2;HaplotypeScore=0.0000;MQ=41.37;MQ0=0;MQRankSum=-0.578;QD=12.60;ReadPosRankSum=0.842
GT = 1/1: I'm pretty sure this means both alleles are T. Whereas 1/0 would mean heterozygous, with one reference allele and one SNP allele?
AD = 19,17: I can't find an explanation of what AD means.
DP = 36: easy to understand.
GQ = 87.16: why are there two values in this field?
PL = 483,0,532: I'm a bit baffled by this field.
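A minimal Python sketch of how the record above can be read. It splits the INFO column into key/value pairs and maps GT allele indices onto the REF and ALT bases (0 is REF, 1 is the first ALT). The helpers `parse_info` and `decode_gt` are hypothetical names, not from any library, and the FORMAT/sample columns are not parsed because the post does not include them. Note that AC=1 with AN=2 means only one of the two called alleles is T, which is what a 0/1 genotype would give.

```python
# Data line from the question: CHROM POS ID REF ALT QUAL FILTER INFO.
# The FORMAT and sample columns were not included in the post.
RECORD = (
    "chr1\t897723\trs6696911\tC\tT\t453.42\tPASS\t"
    "AC=1;AF=0.50;AN=2;BaseQRankSum=-0.479;DB;DP=36;Dels=0.00;FS=1.480;"
    "HRun=2;HaplotypeScore=0.0000;MQ=41.37;MQ0=0;MQRankSum=-0.578;"
    "QD=12.60;ReadPosRankSum=0.842"
)


def parse_info(info):
    """Hypothetical helper: split INFO into a dict; flags such as DB become True."""
    fields = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def decode_gt(gt, ref, alt):
    """Hypothetical helper: turn allele indices (0 = REF, 1 = first ALT) into bases.

    Handles '/' (unphased) and '|' (phased) separators; no-calls ('.') are not handled.
    """
    alleles = [ref] + alt.split(",")
    return [alleles[int(i)] for i in gt.replace("|", "/").split("/")]


chrom, pos, var_id, ref, alt, qual, filt, info_col = RECORD.split()
info = parse_info(info_col)

# Here AF (0.50) equals AC / AN: one T among the two called alleles
af = int(info["AC"]) / int(info["AN"])
print(af)                          # 0.5
print(decode_gt("1/1", ref, alt))  # ['T', 'T']
print(decode_gt("0/1", ref, alt))  # ['C', 'T']
```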
For a biallelic site, PL has three numbers. The first is the Phred-scaled likelihood that the sample is homozygous reference, the second that it is heterozygous, and the third that it is homozygous for the alternate allele. The values are normalized so that the most likely genotype gets 0, and the higher a number, the less likely it is that your sample has that genotype. So if your PL is 483,0,532, the software is quite sure that your sample is neither homozygous reference nor homozygous alternate: it's heterozygous. And the GT shows that by being 0/1. If the first and last numbers had been lower, the genotype call would be less confident and the quality of the SNP poorer.
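To make this concrete, here is a small Python sketch, assuming a diploid biallelic site whose PL list is ordered 0/0, 0/1, 1/1. It converts PL back to relative likelihoods (likelihood = 10^(-PL/10)), picks the genotype with the lowest PL, and reports that genotype's share of the total likelihood under a flat prior (an assumption; callers may use other priors). `gq_from_pl` follows the convention commonly attributed to GATK, where GQ is the gap between the lowest and second-lowest PL, capped at 99; treat that as an assumption rather than a documented guarantee. The second PL vector is made up to show what lower first and last numbers do to confidence.

```python
# Diploid, biallelic order of genotypes in a PL list.
GENOTYPES = ["0/0", "0/1", "1/1"]


def call_from_pl(pl):
    """Return (most likely genotype, its probability under a flat prior)."""
    # PL = -10 * log10(likelihood), so likelihood = 10 ** (-PL / 10)
    likelihoods = [10 ** (-p / 10) for p in pl]
    total = sum(likelihoods)
    best = min(range(len(pl)), key=lambda i: pl[i])
    return GENOTYPES[best], likelihoods[best] / total


def gq_from_pl(pl, cap=99):
    """Assumed GATK-style GQ: second-lowest PL minus lowest PL, capped."""
    lowest, second = sorted(pl)[:2]
    return min(second - lowest, cap)


for pl in ([483, 0, 532], [20, 0, 40]):  # second vector is made up
    gt, prob = call_from_pl(pl)
    print(pl, gt, f"{prob:.3f}", gq_from_pl(pl))
# [483, 0, 532] 0/1 1.000 99
# [20, 0, 40] 0/1 0.990 20
```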
From the header of a VCF file generated by GATK's UnifiedGenotyper:
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
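The AD and PL definitions above can be illustrated with a short Python sketch; the function names are hypothetical. AD lists read counts for REF first and then each ALT, so 19,17 means 19 reads support C and 17 support T, about 47% alt, close to the half you would expect from a heterozygote. Here the two counts add up to the DP of 36, though that need not hold in general. Number=G means the field carries one value per possible genotype: for a diploid sample with n alleles that is n(n+1)/2 values, ordered by the VCF specification's formula index(j/k) = k(k+1)/2 + j for j <= k.

```python
from math import comb


def alt_fractions(ad):
    """Fraction of reads supporting each ALT allele, given an AD value like '19,17'."""
    depths = [int(x) for x in ad.split(",")]
    total = sum(depths)
    return [d / total for d in depths[1:]]


def n_genotypes(n_alleles, ploidy=2):
    """How many values a Number=G field such as PL holds."""
    return comb(n_alleles + ploidy - 1, ploidy)


def genotype_index(j, k):
    """Position of diploid genotype j/k (j <= k) in a Number=G list."""
    return k * (k + 1) // 2 + j


print([round(f, 2) for f in alt_fractions("19,17")])  # [0.47]
print(n_genotypes(2), n_genotypes(3))                 # 3 6

# Genotype order for a site with two ALT alleles (three alleles in total)
order = sorted(((j, k) for k in range(3) for j in range(k + 1)),
               key=lambda g: genotype_index(*g))
print([f"{j}/{k}" for j, k in order])  # ['0/0', '0/1', '1/1', '0/2', '1/2', '2/2']
```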
AD is not described in the VCF v4.1 format specification, although the specification does mention: "Additional Genotype fields can be defined in the meta-information. However, software support for such fields is not guaranteed."
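Because extra genotype fields are defined in the file's own header, a parser can look up what AD (or any other custom key) means from the VCF itself. A minimal Python sketch, assuming each ##FORMAT line uses the fixed ID, Number, Type, Description layout shown above; real files may carry additional attributes, which this regular expression would not accept. The function name `parse_format_lines` is hypothetical.

```python
import re

# Matches ##FORMAT=<ID=..,Number=..,Type=..,Description="..."> in that order only.
FORMAT_RE = re.compile(
    r'^##FORMAT=<ID=([^,]+),Number=([^,]+),Type=([^,]+),Description="(.*)">$'
)

HEADER = [
    '##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">',
    '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">',
    '##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">',
]


def parse_format_lines(lines):
    """Build {ID: {Number, Type, Description}} from ##FORMAT meta-information lines."""
    fields = {}
    for line in lines:
        m = FORMAT_RE.match(line.strip())
        if m:
            fid, number, ftype, desc = m.groups()
            fields[fid] = {"Number": number, "Type": ftype, "Description": desc}
    return fields


formats = parse_format_lines(HEADER)
print(formats["AD"]["Description"])
# Allelic depths for the ref and alt alleles in the order listed
print(formats["PL"]["Number"])  # G
```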