Filtering Strategy In Exome Sequencing And Quality Control
11.8 years ago
Omid ▴ 580

I know that read depth and Phred score are routinely used to filter exome sequencing data and remove false positives.

However, there are several other quality-related annotations, listed below. Is there an established threshold or cut-off for each of them, and at which step of the filtering strategy should they be applied?

  1. GC = GC content within 20 bp +/- the variant

  2. FS = Phred-scaled p-value using Fisher's exact test to detect strand bias. If the reference-carrying reads are balanced between forward and reverse strands, then the alternate-carrying reads should be as well

  3. HRun = Largest Contiguous Homopolymer Run of Variant Allele In Either Direction

  4. HW = Phred-scaled p-value for Hardy-Weinberg violation. Extreme variations on heterozygous calls indicate a false positive call

  5. HaplotypeScore = Consistency of the site with at most two segregating haplotypes (probability that the reads in a window around the variant can be explained by at most two haplotypes)

  6. MQ0Fraction = Fraction of reads at the site with mapping quality zero (the related MQ annotation is the RMS, i.e. root mean square or quadratic mean, mapping quality). Regions of excessively low mapping quality are ambiguously mapped and variants called within them are suspicious

  7. MQRankSum = Z-score from Wilcoxon rank sum test of Alt vs. Ref read mapping qualities. If the alternate bases are more likely to be found on reads with lower MQ than reference bases then the site is likely mismapped

  8. QD = Variant confidence/quality by depth

  9. ReadPosRankSum = Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias. If the alternate bases are biased towards the beginning or end of the reads then the site is likely a mapping artifact

  10. SB = Strand Bias

  11. BaseQualityRankSumTest = The u-based z-approximation from the Mann-Whitney Rank Sum Test for base qualities (ref bases vs. bases of the alternate allele)
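
To make the FS definition in item 2 concrete, here is a minimal sketch of a Phred-scaled Fisher's exact test on a 2x2 table of strand counts (ref forward, ref reverse, alt forward, alt reverse). This is a simplified illustration and not necessarily how GATK computes the annotation; the function names `fisher_exact_two_sided` and `phred_scale` and the example counts are my own assumptions.

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Here a, b = reference reads on forward/reverse strand and
    c, d = alternate reads on forward/reverse strand (hypothetical layout).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def phred_scale(p, floor=1e-300):
    """Convert a p-value to the Phred scale: -10 * log10(p)."""
    return -10.0 * log10(max(p, floor))


# Balanced strands for both alleles: no evidence of strand bias.
fs_balanced = phred_scale(fisher_exact_two_sided(10, 10, 10, 10))

# Alternate allele seen only on the forward strand: strong strand bias.
fs_biased = phred_scale(fisher_exact_two_sided(10, 10, 20, 0))

print(f"balanced: {fs_balanced:.1f}, biased: {fs_biased:.1f}")
```

A higher value means stronger evidence of strand bias, which is why FS is usually filtered with an upper cut-off rather than a lower one.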

exome • 10k views
11.8 years ago
User 59 13k

The first place to look, assuming those fields come from GATK, would be the Broad's own guidelines on filtering exome data:

http://www.broadinstitute.org/gsa/wiki/index.php/Best_Practice_Variant_Detection_with_the_GATK_v3
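
Once you have chosen cut-offs, applying them is mostly a matter of reading the INFO column of each VCF record. The sketch below shows one way to do that in plain Python. The threshold values are assumptions on my part (commonly quoted starting points for hard-filtering SNPs), not values taken from this thread, so check them against the guidelines linked above before using them. The names `SNP_FILTERS`, `parse_info` and `failed_filters` are hypothetical.

```python
# Illustrative cut-offs only (assumed, not authoritative): each entry maps an
# INFO annotation to a test that returns True when the value looks suspicious.
SNP_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
    "HaplotypeScore": lambda v: v > 13.0,
}


def parse_info(info):
    """Split a VCF INFO string such as 'DP=40;QD=1.5' into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def failed_filters(info, filters=SNP_FILTERS):
    """Return the names of the annotations that fail their cut-off."""
    fields = parse_info(info)
    failed = []
    for name, is_bad in filters.items():
        if name not in fields:
            continue  # annotation missing at this site: skip rather than fail
        try:
            value = float(fields[name])
        except ValueError:
            continue  # non-numeric value: skip
        if is_bad(value):
            failed.append(name)
    return failed


flagged = failed_filters("DP=40;QD=1.5;FS=72.3;MQRankSum=-0.4")
clean = failed_filters("DP=55;QD=12.0;FS=3.1")
print(flagged, clean)
```

Skipping missing annotations is a design choice: some annotations may not be emitted for every site, and treating "absent" as "failed" would discard those calls by default.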
