Question: Understanding VCF File Format
8
6.7 years ago by
Tryingtogetthere150 wrote:

Let me start by saying I've spent hours at http://vcftools.sourceforge.net/specs.html, but I still don't understand some of the VCF fields.

For example, in the following record I don't understand some of the GT:AD:DP:GQ:PL information:

chr1 897723 rs6696911 C T 453.42 PASS AC=1;AF=0.50;AN=2;BaseQRankSum=-0.479;DB;DP=36;Dels=0.00;FS=1.480;HRun=2;HaplotypeScore=0.0000;MQ=41.37;MQ0=0;MQRankSum=-0.578;QD=12.60;ReadPosRankSum=0.842 GT:AD:DP:GQ:PL 0/1:19,17:36:87.16:483,0,532

GT = 1/1: I'm pretty sure this means both alleles are T, whereas 1/0 would mean heterozygous for the ref and the SNP?
AD = 19,17: I can't find an explanation of what AD means.
DP = 36: easy to understand.
GQ = 87.16: why are there two values in this field?
PL = 483,0,532: I'm a bit baffled by this field.

Thanks for your help,
Trying to get there

vcf format • 22k views
modified 6.4 years ago by Jorge Amigo11k • written 6.7 years ago by Tryingtogetthere150

Thanks for the link, but I have been there too. Thanks for the AD definition. I understand that PL is the Phred-scaled genotype likelihood, but why are there three values? Thanks.

written 6.7 years ago by Tryingtogetthere150

Also, why are there two genotype quality values? Thanks.

written 6.7 years ago by Tryingtogetthere150

There is only one GQ value. It is 87.16 (87 and 16/100)

written 6.7 years ago by Rlong340
15
6.7 years ago by
Swbarnes21.4k wrote:

For a biallelic site, the PL has three numbers, one per possible genotype. They are Phred-scaled likelihoods rather than plain probabilities: the first is for homozygous reference (0/0), the second for heterozygous (0/1), and the third for homozygous alternate (1/1). The values are normalized so that the most likely genotype gets 0, and the higher the number, the less likely it is that your sample has that genotype. So if your PL is 483,0,532, the software is quite sure that your sample is neither homozygous reference nor homozygous alternate; it's heterozygous. The GT shows that by being 0/1. If the first and last numbers had been lower, the quality of the SNP would be poorer and the genotype call less confident.
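To make the scale concrete, here is a minimal Python sketch. It assumes the usual Phred convention (PL = -10 * log10 of the genotype likelihood, shifted so the best genotype is 0); the function name and the genotype labels are made up for illustration and are not part of any VCF library.

```python
def pl_to_relative_likelihoods(pl):
    """Convert Phred-scaled likelihoods to likelihoods relative to the best genotype.

    Assumes PL = -10 * log10(likelihood), normalized so the best genotype has PL 0.
    """
    return [10 ** (-p / 10) for p in pl]


# Biallelic diploid order: hom-ref, het, hom-alt
genotypes = ["0/0", "0/1", "1/1"]
pl = [483, 0, 532]  # PL field from the record in the question

relative = pl_to_relative_likelihoods(pl)
# The genotype with the smallest PL is the most likely one
best_genotype = genotypes[pl.index(min(pl))]

for gt, p, rel in zip(genotypes, pl, relative):
    print(f"{gt}: PL={p} relative likelihood={rel:.3g}")
print("most likely genotype:", best_genotype)
```

A PL of 483 therefore corresponds to a likelihood roughly 10^48 times smaller than that of the heterozygous call, which is why the caller reports 0/1.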

written 6.7 years ago by Swbarnes21.4k
2

I think that means you have 19 reads showing the reference allele and 17 reads showing the alternate allele. Those do add up to 36, which is your total depth.

written 6.4 years ago by Swbarnes21.4k

Why does the AD have two values?
GT:0/1 AD:19,17 DP:36 GQ:87.16 PL:483,0,532

written 6.4 years ago by jvijai1.1k
2
6.4 years ago by
Jorge Amigo11k (Santiago de Compostela, Spain) wrote:

From a VCF file generated by GATK's UnifiedGenotyper:

##fileformat=VCFv4.1
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">

AD is not described in the VCF v4.1 format specs, although they do mention "Additional Genotype fields can be defined in the meta-information. However, software support for such fields is not guaranteed."
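The Number=G on the PL header line is what explains the three values asked about earlier: G means one value per possible genotype. As a rough sketch, assuming diploid samples and the genotype ordering given in the VCF spec (genotype j/k with j <= k sits at index k*(k+1)/2 + j), the genotypes can be enumerated like this. The function name is hypothetical.

```python
def genotype_order(n_alt):
    """List diploid genotypes in VCF order for a site with n_alt ALT alleles.

    Allele 0 is REF; alleles 1..n_alt are the ALT alleles.
    Genotype j/k (j <= k) ends up at index k*(k+1)//2 + j.
    """
    order = []
    for k in range(n_alt + 1):
        for j in range(k + 1):
            order.append(f"{j}/{k}")
    return order


biallelic = genotype_order(1)   # one ALT allele, as in the record above
triallelic = genotype_order(2)  # two ALT alleles

print(biallelic)
print(triallelic)
```

With one ALT allele there are three genotypes, so PL has three values; with two ALT alleles there would be six.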

written 6.4 years ago by Jorge Amigo11k
0
6.7 years ago by
Rlong340 (US) wrote:

For another reference, try the 1000 Genomes site. AD stands for allele depth. GQ is genotype quality, and it is a single float value. PL is the Phred-scaled genotype likelihood.

written 6.7 years ago by Rlong340
0
6.4 years ago by
Nick H30 wrote:

AD = Allelic Depth, which is the number of reads that have the reference vs non reference base. In this case 19 ref, 17 alternate.

These two values will usually, but not always, sum to the DP value. Reads that are not used for calling are not counted in the DP measure, but are included in AD.
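For anyone who wants to check this on their own records, here is a small Python sketch that pairs the FORMAT keys with the sample values from the question and compares the AD sum with DP. It uses only plain string splitting; the function name is made up for illustration, and it assumes every field is present (no "." missing values).

```python
def parse_sample(format_field, sample_field):
    """Pair colon-separated FORMAT keys with the matching sample values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


sample = parse_sample("GT:AD:DP:GQ:PL", "0/1:19,17:36:87.16:483,0,532")

ad = [int(x) for x in sample["AD"].split(",")]  # reads per allele: ref, alt
dp = int(sample["DP"])                          # depth used for calling
gq = float(sample["GQ"])                        # a single float value
pl = [int(x) for x in sample["PL"].split(",")]  # one value per genotype

print("GT:", sample["GT"])
print("AD:", ad, "sum:", sum(ad), "DP:", dp)
print("AD sum matches DP:", sum(ad) == dp)
```

In this record the two AD values happen to add up to DP, but as noted above that is not guaranteed.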

written 6.4 years ago by Nick H30