Question: What is the ExcessHet key in the INFO field of a VCF file?
kmkdesilva (United States) wrote, 8 months ago:

Hi,

The following is a single line from a VCF file I have (there are about 90 samples, so I omitted the per-sample columns):

chr1    3463    .       C       T       59.40   .    AC=2;AF=0.143;AN=14;DP=13;ExcessHet=0.1703;FS=0.000;MLEAC=2;MLEAF=0.143;MQ=26.38;QD=29.70;SOR=2.303      GT:AD:DP:GQ:PGT:PID:PL
  1. I would like to know what information is given by the ExcessHet key in the INFO field of a VCF file.
  2. How can I use this key to filter out positions with excess heterozygotes? Is there a cutoff value that I should use for this?
sequencing snp next-gen • 402 views

I would like to know what information is given by the ExcessHet key under INFO field in a VCF file.

It should be defined in the VCF header.
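As a minimal sketch of looking that definition up programmatically: the helper name `info_description` and the three-line example header below are illustrative assumptions, not taken from the poster's file. The function scans the `##` meta lines for the matching `##INFO=<ID=...>` entry and returns its `Description`.

```python
import re


def info_description(header_lines, key):
    """Return the Description text of an INFO key from VCF header lines, or None."""
    pattern = re.compile(
        r'^##INFO=<ID=' + re.escape(key) + r',.*Description="([^"]*)"'
    )
    for line in header_lines:
        # Meta-information lines start with "##"; stop at the "#CHROM" column line.
        if not line.startswith("##"):
            break
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


# Illustrative header (hypothetical; a real file will have many more lines).
header = [
    '##fileformat=VCFv4.2',
    '##INFO=<ID=ExcessHet,Number=1,Type=Float,'
    'Description="Phred-scaled p-value for exact test of excess heterozygosity">',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO',
]

print(info_description(header, "ExcessHet"))
```

For a real file, the same function can be fed the lines of the opened VCF (via `gzip.open(..., "rt")` if it is compressed).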

(Reply written 8 months ago by Pierre Lindenbaum)
Dave Carlson (Stony Brook University, NY) wrote, 8 months ago:

I just checked a VCF file, and the header describes the ExcessHet field as:

Phred-scaled p-value for exact test of excess heterozygosity

This VCF was created using GATK's HaplotypeCaller, so you can find more information about the ExcessHet statistic here.

Since the ExcessHet score is Phred-scaled (score = -10 * log10(p)), a low score (e.g. zero) indicates a high p-value, i.e. no evidence of excess heterozygosity, while a large score indicates a low p-value. This GATK forum post (where I got the above information) has some nice suggestions for options you could use when filtering on ExcessHet.
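To make the Phred scaling concrete, here is a small sketch that pulls ExcessHet out of the INFO string from the question and converts it back to a p-value with p = 10^(-score/10). The helper names `parse_info` and `phred_to_pvalue` are made up for this illustration.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def phred_to_pvalue(score):
    """Invert the Phred scaling: score = -10 * log10(p)."""
    return 10 ** (-score / 10)


# INFO column copied from the record in the question.
info = ("AC=2;AF=0.143;AN=14;DP=13;ExcessHet=0.1703;FS=0.000;"
        "MLEAC=2;MLEAF=0.143;MQ=26.38;QD=29.70;SOR=2.303")

score = float(parse_info(info)["ExcessHet"])
print(f"ExcessHet={score} -> p-value={phred_to_pvalue(score):.4f}")
```

For this record the p-value comes out at roughly 0.96, so there is no sign of excess heterozygosity at this site.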


Thank you. My understanding is that I need to look at the distribution of the ExcessHet values in my VCF file and then determine a z-score. Based on that z-score, I can find the cutoff ExcessHet value. Please let me know if I am wrong.

(Reply written 8 months ago by kmkdesilva)

Assuming for the moment that you do want to filter based on ExcessHet, then yes, that sounds about right to me.

Here is some more information from the Broad that might be helpful:

https://gatkforums.broadinstitute.org/gatk/discussion/23216/how-to-filter-variants-either-with-vqsr-or-by-hard-filtering

Notably:

ExcessHet filtering applies only to callsets with a large number of samples, e.g. hundreds of unrelated samples. Small cohorts should not trigger ExcessHet filtering as values should remain small. Note cohorts of consanguinous samples will inflate ExcessHet, and it is possible to limit the annotation to founders for such cohorts by providing a pedigree file during variant calling.
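The z-score approach discussed above can be sketched as follows: pick a z-score, turn it into a one-sided p-value under a standard normal, and Phred-scale that p-value to get the ExcessHet cutoff. The default z = -4.5 is an assumption for illustration (as far as I can tell it matches the 54.69 hard-filter threshold suggested in the GATK article), and the function names are hypothetical.

```python
import math


def zscore_to_phred(z):
    """Phred-scaled one-sided (lower-tail) p-value for a standard normal z-score."""
    p = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z
    return -10 * math.log10(p)


def fails_excess_het(excess_het, z=-4.5):
    """True if a site's ExcessHet score is above the cutoff implied by z."""
    return excess_het > zscore_to_phred(z)


cutoff = zscore_to_phred(-4.5)
print(f"cutoff for z=-4.5: {cutoff:.2f}")
print(fails_excess_het(0.1703))  # the site from the question
```

With a cutoff in hand, the filter itself is just "flag sites where ExcessHet is greater than the cutoff". As the quoted passage says, this only makes sense for large cohorts of unrelated samples.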

(Reply written 8 months ago by Dave Carlson)