Question: (Closed) Quality Control For Exome Sequencing
Omid wrote (7.8 years ago):

To filter exome sequencing variant calls and remove false positives, I know that read depth and Phred-scaled quality scores are routinely applied.
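For reference, the read-depth and quality filtering mentioned above can be pictured as a per-site hard filter. This is a minimal Python sketch, assuming the standard tab-separated VCF layout (QUAL in column 6, INFO in column 8) and a site-level depth stored in INFO as `DP`. The function names and the thresholds `MIN_QUAL = 30` and `MIN_DP = 10` are hypothetical placeholders, not recommended values.

```python
# Hypothetical thresholds for illustration only; these are NOT recommended cutoffs.
MIN_QUAL = 30.0
MIN_DP = 10


def parse_info(info: str) -> dict:
    """Parse a VCF INFO column (e.g. 'DP=35;MQ=60.0;DB') into a dict."""
    fields = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flags such as DB carry no value
    return fields


def passes_hard_filter(vcf_line: str, min_qual: float = MIN_QUAL, min_dp: int = MIN_DP) -> bool:
    """Return True if the site's QUAL and INFO/DP both meet the (hypothetical) thresholds."""
    cols = vcf_line.rstrip("\n").split("\t")
    qual_str, info = cols[5], cols[7]
    if qual_str == ".":
        return False  # missing QUAL: treat as failing
    dp = parse_info(info).get("DP")
    if dp is None or dp is True:
        return False  # no usable depth value
    return float(qual_str) >= min_qual and int(dp) >= min_dp


if __name__ == "__main__":
    good = "chr1\t12345\t.\tA\tG\t120.5\t.\tDP=42;MQ=59.8\tGT\t0/1"
    low_depth = "chr1\t22345\t.\tC\tT\t95.0\t.\tDP=4;MQ=60.0\tGT\t0/1"
    print(passes_hard_filter(good))       # True
    print(passes_hard_filter(low_depth))  # False
```

Each annotation in the list below could be bolted on the same way, which is exactly what raises the question of where each cutoff should sit.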

There are also the following quality-related annotations. Is there a recommended threshold/cutoff for each of them, and at which step of the filtering strategy should I apply them (at the beginning or at the end)?

GC = GC content within ±20 bp of the variant

HRun = Largest contiguous homopolymer run of the variant allele in either direction

HW = Phred-scaled p-value for Hardy-Weinberg violation. Extreme deviations in the frequency of heterozygous calls suggest false-positive calls.


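MQ0Fraction = Fraction of reads at the site with mapping quality zero (MQ0 / total reads). This is distinct from MQ, the RMS (Root Mean Square, also known as quadratic mean) mapping quality. Regions of excessively low mapping quality are ambiguously mapped, and variants called within them are suspicious.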

SB = Strand Bias (whether the alternate allele is supported disproportionately by reads from one strand)

BaseQualityRankSumTest = The u-based z-approximation from the Mann-Whitney Rank Sum Test for base qualities (reference bases vs. bases of the alternate allele).
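To make the definitions above concrete, here is a rough Python sketch of simplified versions of GC, HRun, HW and the base-quality rank-sum z. It assumes a 0-based variant position on a reference string and per-read base qualities that have already been extracted. These are illustrative approximations, not GATK's code: the window bounds, the run-counting convention (longer of the left and right runs), the chi-square approximation used for HW (GATK may use an exact test), and the absence of tie correction in the rank-sum z are all assumptions, and every function name is hypothetical.

```python
import math


def gc_content(ref: str, pos: int, flank: int = 20) -> float:
    """Fraction of G/C among A/C/G/T bases within +/- flank bp of a 0-based position."""
    window = ref[max(0, pos - flank): pos + flank + 1].upper()
    acgt = [b for b in window if b in "ACGT"]  # ignore N and other IUPAC codes
    if not acgt:
        return float("nan")
    return sum(b in "GC" for b in acgt) / len(acgt)


def homopolymer_run(ref: str, pos: int, alt_base: str) -> int:
    """Longer of the left/right runs of reference bases equal to the alt base (assumed convention)."""
    alt_base = alt_base.upper()
    left, i = 0, pos - 1
    while i >= 0 and ref[i].upper() == alt_base:
        left += 1
        i -= 1
    right, i = 0, pos + 1
    while i < len(ref) and ref[i].upper() == alt_base:
        right += 1
        i += 1
    return max(left, right)


def hw_phred(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Phred-scaled chi-square (1 df) p-value for departure from Hardy-Weinberg proportions."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 0.0
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    if min(expected) == 0:
        return 0.0  # monomorphic site: nothing to test
    observed = [n_hom_ref, n_het, n_hom_alt]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return math.inf if p_value == 0 else max(0.0, -10 * math.log10(p_value))


def _average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_z(ref_quals, alt_quals):
    """Normal-approximation z of the Mann-Whitney U for alt vs ref qualities (no tie correction).

    Negative values mean alt-supporting bases tend to have lower quality than reference bases.
    """
    n_ref, n_alt = len(ref_quals), len(alt_quals)
    if n_ref == 0 or n_alt == 0:
        return None  # undefined unless both groups are present
    ranks = _average_ranks(list(ref_quals) + list(alt_quals))
    u_alt = sum(ranks[n_ref:]) - n_alt * (n_alt + 1) / 2
    mean = n_ref * n_alt / 2
    sd = math.sqrt(n_ref * n_alt * (n_ref + n_alt + 1) / 12)
    return (u_alt - mean) / sd
```

None of these functions implies a cutoff; they only show what each annotation measures.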

Tags: exome, read, quality

Please stop duplicating posts - you have already asked this here:

and on SeqAnswers as well.

Reply by Daniel Swan, 7.8 years ago
Sean Davis (National Institutes of Health, Bethesda, MD) wrote (7.8 years ago):

I would suggest taking a look at variant quality score recalibration.

Johan wrote (7.8 years ago):

To add to Sean Davis's answer, the Broad Institute has a nice description of their "best practices" for variant detection here: (I'm guessing you will want to do something like that). This approach is of particular interest if you are planning to use one of their variant callers, the UnifiedGenotyper or the new HaplotypeCaller.


Thanks, Johan. I have already found some thresholds on that website, but unfortunately I could not find cutoffs for GC, HRun, MQ0, MQ0Fraction, SB and BaseQualityRankSumTest.

Reply by Omid, 7.8 years ago

I'm not sure that you need a hard threshold - the VQSR should create a model from the data. Quoting from the link above: "The tool used here is the Variant quality score recalibrator which builds an adaptive error model using known variant sites and then applies this model to estimate the probability that each variant in the callset is a true genetic variant or a machine/alignment artifact. All filtering criteria are learned from the data itself."

Reply by Johan, 7.8 years ago

VQSR allows you to avoid arbitrary cutoffs like these by modeling errors in the data. There is no need (nor should you want) to define cutoffs based on single parameters.

Reply by Sean Davis, 7.8 years ago
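To illustrate what "all filtering criteria are learned from the data" means, here is a deliberately simplified toy in Python. It is not VQSR: GATK's VariantRecalibrator fits a Gaussian mixture model jointly over several annotations, trained on known variant sites, and reports tranches by truth sensitivity. This sketch instead fits one independent Gaussian per annotation on a set of assumed-true sites and picks the score threshold that retains roughly 99% of them, so no per-annotation cutoff is ever written down. The function names, the annotation keys (`MQ`, `BaseQRankSum`), the 99% target, and all numbers are hypothetical, invented for illustration.

```python
import math
import statistics


def fit_diagonal_gaussian(training):
    """Estimate a mean and standard deviation per annotation from assumed-true sites."""
    params = {}
    for name in training[0]:
        values = [site[name] for site in training]
        sd = statistics.stdev(values) or 1e-6  # guard against zero spread
        params[name] = (statistics.mean(values), sd)
    return params


def log_likelihood(site, params):
    """Sum of independent per-annotation Gaussian log-densities (higher = more 'true-like')."""
    total = 0.0
    for name, (mu, sd) in params.items():
        z = (site[name] - mu) / sd
        total += -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))
    return total


def sensitivity_cutoff(training, params, sensitivity=0.99):
    """Score threshold that keeps roughly `sensitivity` of the training sites."""
    scores = sorted(log_likelihood(site, params) for site in training)
    index = min(len(scores) - 1, int(math.floor((1 - sensitivity) * len(scores))))
    return scores[index]


# Invented toy values, for illustration only.
training = [
    {"MQ": 60.0, "BaseQRankSum": 0.3},
    {"MQ": 59.5, "BaseQRankSum": -0.4},
    {"MQ": 58.9, "BaseQRankSum": 0.8},
    {"MQ": 60.0, "BaseQRankSum": -0.1},
    {"MQ": 59.2, "BaseQRankSum": 0.1},
]
params = fit_diagonal_gaussian(training)
cutoff = sensitivity_cutoff(training, params)

candidates = [
    {"MQ": 59.6, "BaseQRankSum": 0.2},   # resembles the training sites
    {"MQ": 35.0, "BaseQRankSum": -4.0},  # low mapping quality, low alt base quality
]
kept = [c for c in candidates if log_likelihood(c, params) >= cutoff]
```

The design point is that the threshold is expressed as a sensitivity on trusted sites rather than as a value for MQ or BaseQRankSum; the real VQSR does this with a far richer model.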