Question

What is the ExcessHet key in the INFO field of a VCF file?

0

5.8 years ago

Kash ▴ 110

Hi,

The following is a single line from a VCF file I have (there are about 90 samples, so I omitted the per-sample columns):

chr1    3463    .       C       T       59.40   .    AC=2;AF=0.143;AN=14;DP=13;ExcessHet=0.1703;FS=0.000;MLEAC=2;MLEAF=0.143;MQ=26.38;QD=29.70;SOR=2.303      GT:AD:DP:GQ:PGT:PID:PL

I would like to know what information is given by the ExcessHet key under INFO field in a VCF file.
How can I use this key to filter out positions with excess heterozygotes? Is there a cutoff value that I need to use for this?

SNP next-gen sequencing • 4.7k views

ADD COMMENT • link updated 5.8 years ago by Dave Carlson ★ 2.1k • written 5.8 years ago by Kash ▴ 110

0

I would like to know what information is given by the ExcessHet key under INFO field in a VCF file.

It should be defined in the VCF header.
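As a rough illustration of looking this up, here is a minimal Python sketch that scans header lines for the `##INFO` entry of a given key and returns its description. The helper name `info_description` and the sample header are hypothetical; the `Number=1,Type=Float` attributes in the sample line are assumptions, so check your own header for the exact definition.

```python
import re

def info_description(header_lines, key):
    """Return the Description text of the ##INFO entry for `key`, or None."""
    pattern = re.compile(
        r'^##INFO=<ID=' + re.escape(key) + r',.*Description="([^"]*)"'
    )
    for line in header_lines:
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None

# Illustrative header lines (assumed, not copied from the poster's file).
header = [
    '##fileformat=VCFv4.2',
    '##INFO=<ID=ExcessHet,Number=1,Type=Float,'
    'Description="Phred-scaled p-value for exact test of excess heterozygosity">',
]

print(info_description(header, "ExcessHet"))
```

On a real file you would pass in the lines that start with `##` instead of the hand-written list.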

ADD REPLY • link 5.8 years ago by Pierre Lindenbaum 166k

score 1 · Answer 1 · 2019-09-11

1

5.8 years ago

Dave Carlson ★ 2.1k

I just checked a VCF file, and the header indicates that the ExcessHet field is the

Phred-scaled p-value for exact test of excess heterozygosity

This VCF was created using GATK's HaplotypeCaller, so you can find more information about the ExcessHet statistic here.

Since the ExcessHet score is Phred-scaled, a low score (e.g. zero) indicates a high p-value, while a large score indicates a low p-value. This GATK forum post (where I got the above information) has some nice suggestions for options you could use when filtering on ExcessHet.
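To make the Phred relationship concrete, here is a small Python sketch using the standard Phred convention, score = -10 * log10(p). The function names are just illustrative. Applied to the value from the question (ExcessHet=0.1703), it gives a p-value of roughly 0.96, i.e. no evidence of excess heterozygosity at that site.

```python
import math

def phred_to_p(phred):
    """Convert a Phred-scaled score back to a p-value."""
    return 10 ** (-phred / 10)

def p_to_phred(p):
    """Convert a p-value (0 < p <= 1) to a Phred-scaled score."""
    return -10 * math.log10(p)

# ExcessHet value from the example line in the question.
print(phred_to_p(0.1703))  # roughly 0.96
```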

ADD COMMENT • link 5.8 years ago by Dave Carlson ★ 2.1k

0

Thank you. My final understanding is that I need to look at the distribution of the ExcessHet values in my VCF file and then determine a z-score. Based on that z-score, I can find the cutoff ExcessHet value. Please let me know if I am wrong.
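One way to go from a z-score to an ExcessHet cutoff is sketched below in Python: take the one-sided tail probability of a standard normal at the chosen z, then Phred-scale it. This is only a sketch under that assumption. As far as I recall, the GATK hard-filtering guide uses z = -4.5, which lands near 54.69, but please check the guide itself before relying on that number. The function name `excesshet_cutoff` is hypothetical.

```python
import math
from statistics import NormalDist

def excesshet_cutoff(z):
    """Phred-scaled cutoff for a (negative) z-score on a standard normal.

    The lower-tail probability P(Z <= z) is treated as the p-value,
    which is then Phred-scaled as -10 * log10(p).
    """
    p = NormalDist().cdf(z)
    return -10 * math.log10(p)

# z = -4.5 is an assumed choice, see the note above.
print(round(excesshet_cutoff(-4.5), 2))
```

A less extreme z-score gives a lower cutoff and therefore filters more sites.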

ADD REPLY • link 5.8 years ago by Kash ▴ 110

0

Assuming for the moment that you do want to filter based on ExcessHet, then yes, that sounds about right to me.

Here is some more information from the Broad that might be helpful:

https://gatkforums.broadinstitute.org/gatk/discussion/23216/how-to-filter-variants-either-with-vqsr-or-by-hard-filtering

Notably:

ExcessHet filtering applies only to callsets with a large number of samples, e.g. hundreds of unrelated samples. Small cohorts should not trigger ExcessHet filtering as values should remain small. Note cohorts of consanguineous samples will inflate ExcessHet, and it is possible to limit the annotation to founders for such cohorts by providing a pedigree file during variant calling.
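If you do end up filtering, a minimal Python sketch of the check on a single INFO string might look like the following. The helper names and the 54.69 cutoff are assumptions for illustration. The INFO string is the one from the question. For real work, a dedicated VCF tool is likely a better choice than hand-parsing.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields

def fails_excesshet(info, cutoff):
    """True if ExcessHet is present with a value above the cutoff."""
    value = parse_info(info).get("ExcessHet")
    if value is None or value is True:
        return False
    return float(value) > cutoff

# INFO column from the example line in the question.
info = ("AC=2;AF=0.143;AN=14;DP=13;ExcessHet=0.1703;FS=0.000;"
        "MLEAC=2;MLEAF=0.143;MQ=26.38;QD=29.70;SOR=2.303")

# 54.69 is an assumed cutoff, not a value taken from this thread.
print(fails_excesshet(info, 54.69))
```

With only about 90 samples, the quoted guidance suggests very few sites, if any, should exceed such a cutoff.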

ADD REPLY • link 5.8 years ago by Dave Carlson ★ 2.1k