Question: VCF file PL values for more than one alternative allele
3.8 years ago, bharata1803 wrote:


In a VCF file, the GT:PL field holds the genotype and its likelihood values. If two alleles are possible (the reference allele and one alternative allele), the PL part contains three values, for example 56,0,80.


The score 56 corresponds to homozygous reference, 0 to heterozygous, and 80 to homozygous alternative.

If there are more than two alleles (say 0 for the reference and 1 and 2 for the alternate alleles), there will be six scores, corresponding to:

  1. reference homozygous (0/0)
  2. alt 1 homozygous (1/1)
  3. alt 2 homozygous (2/2)
  4. ref and alt 1 heterozygous (0/1)
  5. ref and alt 2 heterozygous (0/2)
  6. alt 1 and alt 2 heterozygous (1/2)
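As a quick check on the count of six: a diploid genotype is an unordered pair of alleles, so a site with n alleles has n(n+1)/2 possible genotypes. The minimal Python sketch below (the helper name `diploid_genotypes` is hypothetical) enumerates them with `itertools.combinations_with_replacement`. It lists them in plain lexicographic order, which matches the VCF order for two alleles but not for three or more.

```python
from itertools import combinations_with_replacement


def diploid_genotypes(n_alleles):
    """Return every unordered diploid genotype for alleles 0..n_alleles-1.

    The pairs come out in lexicographic order, NOT in VCF PL/GL order.
    """
    return list(combinations_with_replacement(range(n_alleles), 2))


for n in (2, 3, 4):
    gts = diploid_genotypes(n)
    # The number of genotypes is always n*(n+1)/2
    assert len(gts) == n * (n + 1) // 2
    print(n, "alleles:", len(gts), "genotypes:", ["/".join(map(str, g)) for g in gts])
```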

My question is: what is the order in the actual VCF file? I don't know the order of the scores or what each one means. Below is an actual line from my VCF data.

1 226548932 . ACGGCGGCGGCGGCGGCGGCGGTGGCGGCGGCGG ACGGCGGCGGCGGTGGCGGCGGCGG,ACGGCGGCGGCGGCGGCGGTGGCGGCGGCGG 39.049 . INDEL;IDV=1;IMF=1;DP=9;VDB=0.0225004;SGB=-1.15236;MQSB=0.900802;MQ0F=0;ICB=0.153846;HOB=0.0555556;AC=1,1;AN=12;DP4=4,2,1,1;MQ=60 GT:PL ./.:0,0,0,0,0,0 0/0:0,3,60,3,60,60 0/0:0,3,60,3,60,60 ./.:0,0,0,0,0,0 0/1:60,3,0,60,3,60 0/0:0,3,60,3,60,60 0/0:0,3,60,3,60,60 0/2:50,56,132,0,81,78

Here are the GT:PL values for each of my 8 samples:

  1. Sample 1: ./.:0,0,0,0,0,0
  2. Sample 2: 0/0:0,3,60,3,60,60
  3. Sample 3: 0/0:0,3,60,3,60,60
  4. Sample 4: ./.:0,0,0,0,0,0
  5. Sample 5: 0/1:60,3,0,60,3,60
  6. Sample 6: 0/0:0,3,60,3,60,60
  7. Sample 7: 0/0:0,3,60,3,60,60
  8. Sample 8: 0/2:50,56,132,0,81,78

Here are some more interesting results:

  1. Sample 1: 1/1:26,12,9,26,12,26
  2. Sample 2: 0/1:0,3,5,3,5,5
  3. Sample 3: 1/1:26,12,9,26,12,26
  4. Sample 4: 1/2:45,45,45,6,6,0
  5. Sample 5: 1/1:20,3,0,20,3,20
  6. Sample 6: ./.:0,0,0,0,0,0
  7. Sample 7: ./.:0,0,0,0,0,0
  8. Sample 8: 1/1:26,12,9,26,12,26

So, if anyone knows how to interpret these scores, please explain, and if possible, explain the general concept too. I tried reading the VCF documentation, but I don't think it is covered there.

3.8 years ago, Santosh Anand wrote:

"My question is what is the order in the actual VCF file?"

This information is in the VCF specification (Section 1.4.2), though it is not easy to find:

PL : the phred-scaled genotype likelihoods rounded to the closest integer (and otherwise defined precisely as the GL field) (Integers)

GL : genotype likelihoods comprised of comma separated floating point log10-scaled likelihoods for all possible genotypes given the set of alleles defined in the REF and ALT fields. In presence of the GT field the same ploidy is expected and the canonical order is used; without GT field, diploidy is assumed. If A is the allele in REF and B,C,... are the alleles as ordered in ALT, the ordering of genotypes for the likelihoods is given by: F(j/k) = (k*(k+1)/2)+j. In other words, for biallelic sites the ordering is: AA,AB,BB; for triallelic sites the ordering is: AA,AB,BB,AC,BC,CC, etc. For example: GT:GL 0/1:-323.03,-99.29,-802.53 (Floats)
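The formula can be checked directly. The sketch below (the helpers `pl_index` and `vcf_genotype_order` are hypothetical names, not part of any library) assumes diploid genotypes written with j ≤ k, as in the specification's formula, and confirms that listing genotypes by increasing k, then increasing j, puts each genotype j/k at position F(j/k).

```python
def pl_index(j, k):
    """Position of diploid genotype j/k in the PL/GL list: F(j/k) = k*(k+1)/2 + j."""
    if j > k:
        j, k = k, j  # genotypes are unordered, so normalise to j <= k
    return k * (k + 1) // 2 + j


def vcf_genotype_order(n_alleles):
    """Diploid genotypes in the order their PL/GL values appear in VCF."""
    return [f"{j}/{k}" for k in range(n_alleles) for j in range(k + 1)]


print(vcf_genotype_order(2))  # ['0/0', '0/1', '1/1']
print(vcf_genotype_order(3))  # ['0/0', '0/1', '1/1', '0/2', '1/2', '2/2']

# Each genotype's position in the list agrees with F(j/k)
for pos, gt in enumerate(vcf_genotype_order(4)):
    j, k = map(int, gt.split("/"))
    assert pl_index(j, k) == pos
```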

So the order of PL values is the same as for GL, which for tri-allelic sites is AA,AB,BB,AC,BC,CC, i.e. 0/0, 0/1, 1/1, 0/2, 1/2, 2/2 with REF as allele 0.
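Applying this order to the data in the question: the minimal sketch below pairs each PL value of a `GT:PL` sample string with its genotype. It assumes a diploid sample whose FORMAT is exactly `GT:PL`, as in the question's VCF line; the helper names `vcf_genotype_order` and `label_pl` are hypothetical. It only labels the values and does not re-call genotypes.

```python
def vcf_genotype_order(n_alleles):
    # Diploid genotypes in VCF PL/GL order: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ...
    return [f"{j}/{k}" for k in range(n_alleles) for j in range(k + 1)]


def label_pl(sample_field, n_alleles):
    """Split a 'GT:PL' sample string and map each genotype to its PL value."""
    gt, pl = sample_field.split(":")
    pls = [int(x) for x in pl.split(",")]
    order = vcf_genotype_order(n_alleles)
    if len(pls) != len(order):
        raise ValueError(f"expected {len(order)} PL values, got {len(pls)}")
    return gt, dict(zip(order, pls))


# Sample 8: REF plus two ALT alleles, so n_alleles = 3
gt, pls = label_pl("0/2:50,56,132,0,81,78", n_alleles=3)
print(gt, pls)
# 0/2 {'0/0': 50, '0/1': 56, '1/1': 132, '0/2': 0, '1/2': 81, '2/2': 78}
```

For sample 8 the PL of 0 falls on 0/2, matching its GT call of 0/2.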

"So, if anyone knows how to interpret the score, please teach me and if it is possible, maybe you can explain the general concept."

This concept is explained very well in the GATK documentation.

If you understand the Phred scale, it should be easy to follow. If you have any difficulty, let us know.
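As a rough illustration of the Phred scale: since PL = -10·log10(L), a genotype's likelihood is proportional to 10^(-PL/10), so a difference of 10 in PL is a 10-fold difference in likelihood. The minimal sketch below (hypothetical helper name `pl_to_likelihoods`) converts the biallelic example 56,0,80 from the question into likelihoods relative to the best genotype. The extra normalization to weights that sum to 1 assumes a flat prior over genotypes; that is an assumption for illustration, not something the PL field encodes.

```python
def pl_to_likelihoods(pls):
    """Convert Phred-scaled likelihoods into relative likelihoods and weights.

    PL = -10 * log10(L), so L is proportional to 10 ** (-PL / 10).
    Values are scaled relative to the best (lowest-PL) genotype.
    """
    best = min(pls)
    ratios = [10 ** (-(p - best) / 10) for p in pls]
    total = sum(ratios)
    # ASSUMPTION: normalizing to sum to 1 treats all genotypes as equally
    # likely a priori; PL itself carries no prior.
    weights = [r / total for r in ratios]
    return ratios, weights


# Biallelic example: 0/0, 0/1, 1/1 in VCF order
ratios, weights = pl_to_likelihoods([56, 0, 80])
for gt, r, w in zip(["0/0", "0/1", "1/1"], ratios, weights):
    print(f"{gt}: relative likelihood {r:.2e}, weight {w:.6f}")
```

Here 0/1 is the most likely genotype, 0/0 is about 10^5.6 times less likely, and 1/1 is 10^8 times less likely.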
