Question: How Do You Usually Filter Variant Calling Results?
newDNASeqer (630), United States, wrote 5.2 years ago:

I'm new to variant calling and would like to get an idea of how you guys usually filter the final variant calling results.

I am using GATK and ANNOVAR for variant calling and annotation. The ANNOVAR output includes PolyPhen-2 and SIFT scores, among other annotations. I am currently using the following criteria to filter the variant calling results:

  1. Number of reads: at least 10 reads in total (REF plus ALT alleles), with at least 5 reads supporting the mutant (ALT) allele.
  2. PolyPhen-2 score of at least 0.4

I am not quite sure how good this filter is. I want to minimize false positives without losing too many true positives. Could you guys shed some light on how to analyze variant calling results? Your reply is appreciated.
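For reference, the read-count criterion above could be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions: a single-sample VCF whose FORMAT column carries a GATK-style AD field (allelic depths, REF first), a made-up function name `passes_depth_filter`, and, at multi-allelic sites, the simplification of using the best-supported ALT allele.

```python
def passes_depth_filter(vcf_line, min_total=10, min_alt=5):
    """Apply the read-count criterion to the first sample of one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return False  # no FORMAT/sample columns to inspect
    # Pair FORMAT keys (e.g. GT:AD:DP) with the first sample's values.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    ad_values = sample.get("AD", "").split(",")
    if len(ad_values) < 2 or not all(v.isdigit() for v in ad_values):
        return False  # AD missing or incomplete
    ref_reads = int(ad_values[0])
    # Assumption: at multi-allelic sites, use the best-supported ALT allele.
    alt_reads = max(int(v) for v in ad_values[1:])
    return ref_reads + alt_reads >= min_total and alt_reads >= min_alt


# Hypothetical record: 6 REF reads and 5 ALT reads.
example = "chr1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:6,5:11"
print(passes_depth_filter(example))
```

Records without a usable AD field are rejected here; whether to drop or keep such sites is a design choice that depends on the study.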

modified 5.2 years ago by Alex Paciorkowski (3.3k) • written 5.2 years ago by newDNASeqer (630)

From what organism are your data?

written 5.2 years ago by Sean Davis (25k)

Guessing human, since @newDNASeqer is using PolyPhen-2 and SIFT, though it does help to be explicit here.

written 5.2 years ago by Chris Fields (2.1k)

For which organism? And what is the study system? I mean, are you looking for germline variants, somatic variants, or disease-specific mutations?

written 5.2 years ago by pirates.of.the.genome90
Alex Paciorkowski (3.3k), Rochester, NY, USA, wrote 5.2 years ago:

As with many questions you'll see posted here, the answer all depends upon your hypotheses and experimental design. There are many previous threads that address aspects of your question, and you might want to take a look at them:

Filtering Ngs Genomic Alignments

Variant Filtration By Exclusion Of Common Or Well-Known Variants

Filtering Vcf Variants Based On Sequencing Coverage

And last, I'm only assuming that you are working with human data, and perhaps with whole-exome data (I can't tell from your question), but if so, this thread has a lot of helpful information, plus many links to other sites with more: What Is The Best Pipeline For Human Whole Exome Sequencing?

Regarding your two specific points for filters:

  1. Number of reads: at least 10 reads (both REF and ALT alleles) with at least 5 reads of mutant allele.
  2. Polyphen2 score of at least 0.4

Assuming, again, that you are working with human data, that this is a whole-exome sequencing experiment, and that your experimental design is to identify the variant(s) causing the phenotype you are studying (a lot of assumptions), then those are reasonable filters, except that if your causative variant has poor read depth, you will filter it out. And remember that SIFT, PolyPhen, and similar tools only provide suggestions and guesses and are not based on actual in vivo biology, so I don't filter on those annotations straight off. We've all seen mutations that are predicted to be "benign" but are in fact pathologic because they happen to cause an amino acid substitution in a key turn in the protein's 3D structure. These points hold assuming you are looking for causative variants in a single gene; if that's not your experimental design, can you please clarify?
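One way to act on both caveats is to flag variants instead of discarding them, so that a low-depth or "benign"-predicted candidate stays visible for manual review. The following is a rough sketch, not a recommended pipeline: the function name `soft_filter_flags`, the flag labels, and the example records are all hypothetical, and the thresholds are simply the ones from the question.

```python
def soft_filter_flags(total_reads, alt_reads, polyphen2=None,
                      min_total=10, min_alt=5, min_polyphen2=0.4):
    """Return a list of warning flags for one variant instead of dropping it."""
    flags = []
    if total_reads < min_total:
        flags.append("LowDepth")
    if alt_reads < min_alt:
        flags.append("LowAltReads")
    # A missing prediction is not treated as evidence either way.
    if polyphen2 is not None and polyphen2 < min_polyphen2:
        flags.append("PredictedBenign")
    return flags


# Hypothetical variants: (label, total reads, ALT reads, PolyPhen-2 score).
variants = [
    ("varA", 40, 18, 0.95),
    ("varB", 8, 5, 0.10),
    ("varC", 25, 12, None),
]

# Rank by number of flags so the cleanest calls come first, but keep everything.
ranked = sorted(variants, key=lambda v: len(soft_filter_flags(v[1], v[2], v[3])))
for label, total, alt, score in ranked:
    print(label, soft_filter_flags(total, alt, score))
```

Keeping flagged variants in the table costs little and makes it possible to revisit a candidate once segregation or functional data become available.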

written 5.2 years ago by Alex Paciorkowski (3.3k)
Sean Davis (25k), National Institutes of Health, Bethesda, MD, wrote 5.2 years ago:

Assuming you are looking at human data, you might want to look at GATK's Variant Quality Score Recalibration (VQSR).

written 5.2 years ago by Sean Davis (25k)