What is the ExcessHet key in the INFO field of a VCF file?
2.2 years ago
kmkdesilva ▴ 90

Hi,

Below is a single line from a VCF file I have (there are about 90 samples, so I omitted the sample columns):

chr1    3463    .       C       T       59.40   .    AC=2;AF=0.143;AN=14;DP=13;ExcessHet=0.1703;FS=0.000;MLEAC=2;MLEAF=0.143;MQ=26.38;QD=29.70;SOR=2.303      GT:AD:DP:GQ:PGT:PID:PL

1. I would like to know what information the ExcessHet key in the INFO field of a VCF file gives.
2. How can I use this key to filter out positions with excess heterozygotes? Is there a cutoff value that I need to use for this?
I would like to know what information is given by the ExcessHet key under INFO field in a VCF file.

It should be defined in the VCF header.
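As a minimal sketch of that lookup, the snippet below scans header lines for the ##INFO entry with a given ID and returns its Description. The function name `info_description` and the two-line sample header are made up for illustration; the Number and Type values in the sample are assumptions, so check them against your own file.

```python
import re

def info_description(header_lines, key):
    """Return the Description of the ##INFO line whose ID equals key, or None."""
    pattern = re.compile(
        r'^##INFO=<ID=' + re.escape(key) + r',.*Description="([^"]*)"'
    )
    for line in header_lines:
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None

# Illustrative header only; read the real one from the lines of your VCF
# that start with "##".
header = [
    '##fileformat=VCFv4.2',
    '##INFO=<ID=ExcessHet,Number=1,Type=Float,'
    'Description="Phred-scaled p-value for exact test of excess heterozygosity">',
]

print(info_description(header, "ExcessHet"))
```

A key that is missing from the header returns None, which is a hint that the annotation was never written to the file.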

2.2 years ago
Dave Carlson ▴ 640

I just checked a VCF file, and its header defines the ExcessHet field as:

Phred-scaled p-value for exact test of excess heterozygosity

This VCF was created using GATK's HaplotypeCaller, so you can find more information about the ExcessHet statistic here.

Since the ExcessHet score is Phred-scaled, a low score (e.g. zero) indicates a high p-value, while a large score indicates a low p-value. This GATK forum post (where I got the above information) has some nice suggestions for options you could use when filtering on ExcessHet.
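To make the Phred relationship concrete, here is a small sketch that pulls ExcessHet out of an INFO string and converts it back to a p-value with the standard Phred formula p = 10^(-score/10). The helper names `phred_to_p` and `excess_het_from_info` are made up for this example; the INFO string is the one from the question.

```python
def phred_to_p(score):
    """Convert a Phred-scaled score back to a p-value: p = 10^(-score/10)."""
    return 10 ** (-score / 10)

def excess_het_from_info(info):
    """Return the ExcessHet value from a semicolon-separated INFO string, or None."""
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "ExcessHet":
            return float(value)
    return None

# INFO column of the record shown in the question.
info = ("AC=2;AF=0.143;AN=14;DP=13;ExcessHet=0.1703;FS=0.000;"
        "MLEAC=2;MLEAF=0.143;MQ=26.38;QD=29.70;SOR=2.303")

score = excess_het_from_info(info)
p_value = phred_to_p(score)
print(score, round(p_value, 4))
```

For the record in the question, the score of 0.1703 corresponds to a p-value of roughly 0.96, so that site shows no evidence of excess heterozygosity.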

Thank you. My understanding is that I need to look at the distribution of the ExcessHet values in my VCF file and then choose a z-score. Based on that z-score, I can find the ExcessHet cutoff value. Please let me know if I am wrong.

Assuming for the moment that you do want to filter based on ExcessHet, then yes, that sounds about right to me.
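The z-score step can be sketched as follows, assuming a one-sided upper-tail p-value from a standard normal distribution. The function name `z_to_excess_het_cutoff` is hypothetical, and z = 4.5 is only an illustrative choice, not a recommendation made in this thread.

```python
import math

def z_to_excess_het_cutoff(z):
    """Map a z-score to a Phred-scaled cutoff via the one-sided normal tail."""
    p = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) for a standard normal
    return -10 * math.log10(p)

def fails_excess_het(score, cutoff):
    """A site is flagged when its ExcessHet score exceeds the cutoff."""
    return score > cutoff

cutoff = z_to_excess_het_cutoff(4.5)   # illustrative z-score
print(round(cutoff, 2))
print(fails_excess_het(0.1703, cutoff))  # record from the question
```

With this assumption, z = 4.5 gives a cutoff of about 54.69, and the record from the question (ExcessHet=0.1703) is far below it.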

ExcessHet filtering applies only to callsets with a large number of samples, e.g. hundreds of unrelated samples. Small cohorts should not trigger ExcessHet filtering, as values should remain small. Note that cohorts of consanguineous samples will inflate ExcessHet; for such cohorts, it is possible to limit the annotation to founders by providing a pedigree file during variant calling.