Question: What is the ExcessHet key in the INFO field of a VCF file?
kmkdesilva (United States) wrote 10 days ago:

Hi,

The following is a single line from a VCF file I have (there are about 90 samples, so I omitted the sample columns):

chr1    3463    .       C       T       59.40   .    AC=2;AF=0.143;AN=14;DP=13;ExcessHet=0.1703;FS=0.000;MLEAC=2;MLEAF=0.143;MQ=26.38;QD=29.70;SOR=2.303      GT:AD:DP:GQ:PGT:PID:PL
  1. I would like to know what information is given by the ExcessHet key in the INFO field of a VCF file.
  2. How can I use this key to filter out positions with excess heterozygotes? Is there a cutoff value that I need to use for this?
sequencing snp next-gen • 96 views

I would like to know what information is given by the ExcessHet key under INFO field in a VCF file.

It should be defined in the VCF header.
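As a rough sketch of how to look that definition up programmatically, the snippet below scans header lines for the `##INFO` entry with a given ID and returns its Description. The function name `info_description` and the sample header line are illustrative assumptions (the `Number` and `Type` values in particular are made up for the example); check the header of your own file.

```python
import re

def info_description(header_lines, key):
    """Return the Description of the ##INFO line whose ID is `key`, or None."""
    pattern = re.compile(
        r'^##INFO=<ID=' + re.escape(key) + r',.*Description="([^"]*)"'
    )
    for line in header_lines:
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None

# Hypothetical header lines, for illustration only.
header = [
    '##fileformat=VCFv4.2',
    '##INFO=<ID=ExcessHet,Number=1,Type=Float,'
    'Description="Phred-scaled p-value for exact test of excess heterozygosity">',
]

print(info_description(header, "ExcessHet"))
```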

written 10 days ago by Pierre Lindenbaum
Dave Carlson (Stony Brook University, NY) wrote 10 days ago:

Just checked a vcf file and the header indicates that the ExcessHet field is the

Phred-scaled p-value for exact test of excess heterozygosity

This VCF was created using GATK's HaplotypeCaller, so you can find more information about the ExcessHet statistic here.

Since the ExcessHet score is Phred-scaled, a low score (e.g. zero) would indicate a high p-value, while a large score would indicate a low p-value. This GATK forum post (where I got the above information) has some nice suggestions for options you could use when filtering on ExcessHet.
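To make the Phred relationship concrete, here is a minimal sketch using the standard Phred definition, p = 10^(-Q/10). The helper name `phred_to_p` is my own, and I am assuming ExcessHet follows the usual Phred convention without any extra transformation.

```python
def phred_to_p(q):
    """Convert a Phred-scaled score Q to a p-value: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

# ExcessHet value from the record in the question.
p = phred_to_p(0.1703)
print(round(p, 3))  # about 0.962, i.e. no evidence of excess heterozygosity

# A score of zero corresponds to p = 1; larger scores give smaller p-values.
print(phred_to_p(0.0))
print(phred_to_p(30.0))
```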


Thank you. My understanding is that I need to look at the distribution of the ExcessHet values in my VCF file and then determine a z-score. Based on that z-score I can find the ExcessHet cutoff value. Please let me know if I am wrong.

written 9 days ago by kmkdesilva

Assuming for the moment that you do want to filter based on ExcessHet, then yes, that sounds about right to me.
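One way to go from a z-score to an ExcessHet cutoff is sketched below: take the lower-tail p-value of a standard normal at the chosen z-score and Phred-scale it. The function name `excesshet_cutoff` is hypothetical, and the z-score of -4.5 is only an example value (it gives roughly the 54.69 threshold I have seen in the GATK hard-filtering material); pick whatever suits your data.

```python
from math import log10
from statistics import NormalDist

def excesshet_cutoff(z):
    """Phred-scale the lower-tail p-value of a standard normal at z."""
    p = NormalDist().cdf(z)   # one-sided p-value for the chosen z-score
    return -10 * log10(p)     # Phred scaling: Q = -10 * log10(p)

# Example z-score only; this is an assumption, not a recommendation.
print(round(excesshet_cutoff(-4.5), 2))
```

Sites whose ExcessHet exceeds the resulting value would then be the candidates for filtering.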

Here is some more information from the Broad that might be helpful:

https://gatkforums.broadinstitute.org/gatk/discussion/23216/how-to-filter-variants-either-with-vqsr-or-by-hard-filtering

Notably:

ExcessHet filtering applies only to callsets with a large number of samples, e.g. hundreds of unrelated samples. Small cohorts should not trigger ExcessHet filtering as values should remain small. Note cohorts of consanguineous samples will inflate ExcessHet, and it is possible to limit the annotation to founders for such cohorts by providing a pedigree file during variant calling.
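If you do end up filtering, a minimal sketch of the check on a single record could look like the following. It parses the INFO column by hand and compares ExcessHet to a cutoff; the helper names `parse_info` and `has_excess_het` and the cutoff of 54.69 are assumptions for illustration, and the record is the one from the question with the sample columns omitted.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag keys (no '=') map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out

def has_excess_het(vcf_line, cutoff):
    """Return True if the record's ExcessHet value is above `cutoff`."""
    info = parse_info(vcf_line.rstrip("\n").split("\t")[7])  # INFO is column 8
    value = info.get("ExcessHet")
    if value is None or value is True:
        return False  # annotation missing or not a key=value pair
    return float(value) > cutoff

record = "\t".join([
    "chr1", "3463", ".", "C", "T", "59.40", ".",
    "AC=2;AF=0.143;AN=14;DP=13;ExcessHet=0.1703;FS=0.000;MLEAC=2;"
    "MLEAF=0.143;MQ=26.38;QD=29.70;SOR=2.303",
    "GT:AD:DP:GQ:PGT:PID:PL",
])

# Hypothetical cutoff; with ExcessHet=0.1703 this record is not flagged.
print(has_excess_het(record, 54.69))
```

For a whole file you would probably use a dedicated tool rather than hand-rolled parsing, but the comparison itself is the same.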

written 9 days ago by Dave Carlson