Question: Filtering Strategy In Exome Sequencing And Quality Control
6.8 years ago by
Omid540
Netherlands

To filter exome sequencing data and remove false positives, I know that read depth and Phred score are routinely applied.

However, there are the following quality-related annotations. I would like to know whether there is a recommended threshold/cut-off for each of them, and at which step of the filtering strategy I should apply them:

  1. GC = GC content within 20 bp +/- the variant

  2. FS = Phred-scaled p-value using Fisher's exact test to detect strand bias. If the reference‐carrying reads are balanced between forward and reverse strands then the alternate‐carrying reads should be as well

  3. HRun = Largest Contiguous Homopolymer Run of Variant Allele In Either Direction

  4. HW = Phred-scaled p-value for Hardy-Weinberg violation. Extreme variations on heterozygous calls indicate a false positive call

  5. HaplotypeScore = Consistency of the site with at most two segregating haplotypes (probability that the reads in a window around the variant can be explained by at most two haplotypes)

  6. MQ0Fraction = Fraction of reads with mapping quality zero (the RMS, i.e. root mean square or quadratic mean, mapping quality is the separate MQ annotation). Regions of excessively low mapping quality are ambiguously mapped and variants called within them are suspicious

  7. MQRankSum = Z-score from Wilcoxon rank sum test of Alt vs. Ref read mapping qualities. If the alternate bases are more likely to be found on reads with lower MQ than reference bases then the site is likely mismapped

  8. QD = Variant confidence/quality by depth

  9. ReadPosRankSum = Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias. If the alternate bases are biased towards the beginning or end of the reads then the site is likely a mapping artifact

  10. SB = Strand Bias

  11. BaseQualityRankSumTest = The U-based z-approximation from the Mann-Whitney Rank Sum Test for base qualities (ref bases vs. bases of the alternate allele).
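
To make two of these definitions concrete, here is a minimal Python sketch of how an FS-style value (Phred-scaled two-sided Fisher's exact test on the ref/alt by forward/reverse read counts) and a QD-style value (variant quality divided by depth) could be computed. This is an illustration of the definitions above, not GATK's actual implementation, which may differ in details; the function names are hypothetical.

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def phred(p, floor=1e-300):
    """Phred-scale a p-value: -10 * log10(p), with a floor to avoid log(0)."""
    return -10.0 * log10(max(p, floor))


def strand_bias_fs(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """FS-style score: higher means stronger evidence of strand bias."""
    return phred(fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev))


def quality_by_depth(qual, depth):
    """QD-style score: variant confidence normalised by read depth."""
    return qual / depth if depth > 0 else 0.0
```

With balanced counts such as strand_bias_fs(10, 10, 10, 10) the score is essentially zero, while a site whose alternate reads all come from one strand, e.g. strand_bias_fs(20, 20, 20, 0), gets a high score.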

6.8 years ago by
Daniel Swan13k
Aberdeen, UK

The first place to look, assuming those fields come from GATK, would be the Broad's own guidelines on filtering exome data:

http://www.broadinstitute.org/gsa/wiki/index.php/Best_Practice_Variant_Detection_with_the_GATK_v3
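
As a rough illustration of how cut-offs from such guidelines are typically applied, the sketch below parses a VCF INFO string and reports which annotations fail a set of rules. The rule values are placeholders chosen for the example, not the Broad's recommendations, and the function names are hypothetical; substitute the thresholds from the guidelines for your own data.

```python
import operator

# Placeholder cut-offs for illustration only; NOT taken from the guidelines.
EXAMPLE_RULES = {
    "QD": (operator.lt, 5.0),    # fail if QD is below the cut-off
    "FS": (operator.gt, 60.0),   # fail if FS is above the cut-off
    "HRun": (operator.gt, 5.0),  # fail if the homopolymer run is longer
}


def parse_info(info):
    """Turn 'QD=12.3;FS=1.2;DB' into {'QD': '12.3', 'FS': '1.2', 'DB': True}."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flag fields have no value
    return fields


def failed_filters(info, rules=EXAMPLE_RULES):
    """Return the names of the annotations that fail their rule."""
    fields = parse_info(info)
    failed = []
    for key, (fails, cutoff) in rules.items():
        if key not in fields or fields[key] is True:
            continue  # a missing annotation is skipped, not failed
        try:
            value = float(fields[key])
        except ValueError:
            continue  # multi-valued or non-numeric entries are skipped
        if fails(value, cutoff):
            failed.append(key)
    return failed
```

For example, failed_filters("QD=2.0;FS=75.5;HRun=1;DB") reports QD and FS as failing under these placeholder rules, while a site with no failing annotation returns an empty list.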
