
Filtering Strategy In Exome Sequencing And Quality Control



11.8 years ago

Omid ▴ 580

To filter exome sequencing data and remove false positives, I know that read depth and Phred-scaled quality scores are routinely applied.
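
A minimal Python sketch of this basic first-pass filter, assuming an uncompressed, tab-delimited VCF data line with QUAL in column 6 and the total read depth stored under the DP key of the INFO column. The helper names and the cutoffs (QUAL >= 30, DP >= 10) are hypothetical illustrations, not recommended values:

```python
# Sketch: first-pass filter on QUAL (Phred-scaled) and INFO/DP (read depth).
# MIN_QUAL and MIN_DEPTH are hypothetical example cutoffs, not published values.
MIN_QUAL = 30.0  # Phred 30 corresponds to a 1-in-1000 error probability
MIN_DEPTH = 10   # minimum total read depth at the site


def parse_info(info_field):
    """Turn 'DP=25;QD=3.1;DB' into {'DP': '25', 'QD': '3.1', 'DB': True}."""
    info = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
        elif item and item != ".":
            info[item] = True  # flag-type INFO entry such as DB
    return info


def passes_depth_and_quality(vcf_line, min_qual=MIN_QUAL, min_depth=MIN_DEPTH):
    """Return True if a tab-delimited VCF data line meets both cutoffs."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]                         # QUAL column
    depth = parse_info(fields[7]).get("DP")  # DP from the INFO column
    if qual == "." or depth in (None, True, "."):
        return False  # missing QUAL or DP: treat the site as failing
    return float(qual) >= min_qual and int(depth) >= min_depth
```

For example, the line "chr1\t12345\t.\tA\tG\t48.7\t.\tDP=25;QD=1.9\tGT\t0/1" passes, while the same line with DP=5 does not. Treating a missing value as a failure is a conservative choice; you could equally let such sites through and flag them.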

But there are the following quality-related annotations, and I would like to know: is there any threshold/cutoff for them? And at which step of the filtering strategy should I apply them?

GC = GC content within 20 bp +/- the variant
FS = Phred-scaled p-value using Fisher's exact test to detect strand bias. If the reference-carrying reads are balanced between forward and reverse strands, then the alternate-carrying reads should be as well
HRun = Largest Contiguous Homopolymer Run of Variant Allele In Either Direction
HW = Phred-scaled p-value for Hardy-Weinberg violation. Extreme variations on heterozygous calls indicate a false positive call
HaplotypeScore = Consistency of the site with at most two segregating haplotypes (Probability that the reads in a window around the variant can be explained by at most two haplotypes)
MQ0Fraction = Fraction of reads at the site with mapping quality zero (the RMS, i.e. root mean square or quadratic mean, mapping quality is the separate MQ annotation). Regions of excessively low mapping quality are ambiguously mapped and variants called within are suspicious
MQRankSum = Z-score from Wilcoxon rank sum test of Alt vs. Ref read mapping qualities. If the alternate bases are more likely to be found on reads with lower MQ than reference bases then the site is likely mismapped
QD = Variant confidence/quality by depth
ReadPosRankSum = Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias. If the alternate bases are biased towards the beginning or end of the reads then the site is likely a mapping artifact
SB = Strand Bias
BaseQualityRankSumTest = The u-based z-approximation from the Mann-Whitney Rank Sum Test for base qualities (ref bases vs. bases of the alternate allele).
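
To make the FS definition above concrete, here is a hedged Python sketch (3.8+ for math.comb) that Phred-scales a textbook two-sided Fisher's exact test p-value computed from strand counts. The function names and the 2x2 layout (rows: ref/alt reads, columns: forward/reverse strand) are assumptions for illustration; GATK's own FS implementation may differ in details, so don't expect identical numbers:

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same row and
    column totals that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = math.comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def fisher_strand_score(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Phred-scaled strand-bias p-value: FS = -10 * log10(p)."""
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return -10.0 * math.log10(max(p, 1e-300))  # guard against log10(0)
```

For example, fisher_strand_score(10, 10, 5, 5) (alt reads split evenly across strands, like the ref reads) is essentially 0, while fisher_strand_score(10, 10, 10, 0) (every alt read on the forward strand) is about 19.6. A larger FS therefore means stronger evidence of strand bias, which is why a filter on this annotation is an upper cutoff.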


Written 11.8 years ago by Omid; updated 11.8 years ago by User 59

score 2 · Answer 1 · 2012-07-20



11.8 years ago

User 59 13k

The first place to look, assuming those fields come from GATK, would be the Broad's own guidelines on filtering exome data:

http://www.broadinstitute.org/gsa/wiki/index.php/Best_Practice_Variant_Detection_with_the_GATK_v3
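
As a sketch of what such guidelines look like in practice, the Python below applies per-annotation hard-filter cutoffs to one variant's INFO annotations. The numbers are the SNP hard-filter values commonly cited from GATK's documentation (QD < 2.0, FS > 60.0, MQ < 40.0, HaplotypeScore > 13.0, MQRankSum < -12.5, ReadPosRankSum < -8.0); treat them as assumptions to check against the current Best Practices page, and note that the function and dictionary names are hypothetical:

```python
# Sketch of GATK-style hard filtering on INFO annotations.
# Cutoffs are the commonly cited GATK SNP hard-filter values (assumption:
# verify against current documentation before relying on them).
SNP_HARD_FILTERS = {
    "QD": ("<", 2.0),
    "FS": (">", 60.0),
    "MQ": ("<", 40.0),
    "HaplotypeScore": (">", 13.0),
    "MQRankSum": ("<", -12.5),
    "ReadPosRankSum": ("<", -8.0),
}


def failed_filters(info, filters=SNP_HARD_FILTERS):
    """Return the names of annotations whose values trip a cutoff.

    `info` maps annotation names to values (strings or numbers). Annotations
    that are absent, such as rank-sum tests at homozygous-alt sites, are skipped.
    """
    failed = []
    for name, (op, cutoff) in filters.items():
        if name not in info:
            continue
        value = float(info[name])
        if (op == "<" and value < cutoff) or (op == ">" and value > cutoff):
            failed.append(name)
    return failed
```

A site is filtered if the returned list is non-empty; for example, failed_filters({"QD": "1.5", "FS": "3.2", "MQ": "58.0"}) returns ["QD"]. Returning the names rather than a single True/False makes it easy to write them into the FILTER column and see which annotation removed each call.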
