How to identify de novo mutations in a child compared with the parents?
9.0 years ago
deepue ▴ 160

Hi,

I am new to NGS analysis and have been following this pipeline recommended in many of the posts in the forum.

I have 3 samples (1 child, 2 parents) and have completed the analysis up to the generation of VCF files. I couldn't clearly understand the VariantFiltration step from the GATK documentation. Could someone please give more information on it?

I would like to find de novo mutations in the child. Is it better to identify de novo mutations before or after annotation? Please advise me on how to proceed.

Thanks

next-gen exome-sequencing SNP • 4.1k views
9.0 years ago
iraun 6.2k

Well, filtering the variant calls is a crucial step if you want the most accurate call set. We have to weigh the probability that a SNP is a true genetic variant against the probability that it is a sequencing or data-processing artifact. In short, you filter in order to discard false-positive variants (increasing specificity) without losing true-positive variants (preserving sensitivity). GATK offers two approaches to filtering:

  • VariantFiltration tool (hard filtering): filters variants according to user-defined criteria such as depth (DP), quality (QUAL), etc.
  • VariantRecalibrator + ApplyRecalibration tools: the first program assigns a well-calibrated probability to each variant call in a call set. The second program applies the model parameters calculated by VariantRecalibrator to each variant in the input VCF files, producing a recalibrated VCF file in which each variant is annotated with its VQSLOD value. You can read more here: http://gatkforums.broadinstitute.org/discussion/39/variant-quality-score-recalibration-vqsr.
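
To make the hard-filtering idea concrete, here is a minimal Python sketch of what a threshold-based filter does to each record. It is not the VariantFiltration implementation: the thresholds (`min_qual`, `min_dp`) and the filter names (`LowQual`, `LowDP`) are illustrative assumptions rather than GATK recommendations, and it only looks at the QUAL column and the `DP` key in INFO.

```python
def hard_filter(vcf_lines, min_qual=30.0, min_dp=10):
    """Yield VCF lines with the FILTER column set from simple thresholds.

    min_qual and min_dp are illustrative values, not recommended settings.
    Header lines are passed through unchanged; record lines are returned
    without a trailing newline.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 6 is QUAL; a missing value (".") is treated as 0 here.
        qual = float(fields[5]) if fields[5] != "." else 0.0
        # Column 8 is INFO: semicolon-separated key=value pairs or bare flags.
        info = dict(
            item.split("=", 1) if "=" in item else (item, True)
            for item in fields[7].split(";")
        )
        # A record without DP is treated as depth 0 and therefore fails.
        depth = int(info.get("DP", 0))
        failed = []
        if qual < min_qual:
            failed.append("LowQual")  # hypothetical filter name
        if depth < min_dp:
            failed.append("LowDP")    # hypothetical filter name
        fields[6] = ";".join(failed) if failed else "PASS"
        yield "\t".join(fields)
```

Records are marked rather than dropped, so a variant that fails can still be inspected later, which is also how hard filtering is usually applied in practice.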

The first approach is the one recommended in the pipeline you're following, but in my opinion that pipeline is a bit out of date. You can try both approaches, compare the results, and choose. Furthermore, in that pipeline variant calling is performed with the UnifiedGenotyper tool, and GATK now has a more up-to-date caller, HaplotypeCaller: https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php

Hope it helps.

Thank you @iraun for the detailed answer. Could you also suggest an approach for identifying de novo mutations by comparing the child's variants with the parents' SNP information? Thanks!

9.0 years ago
Len Trigg ★ 1.6k

If you are new to NGS I would suggest RTG Core (free for non-commercial use) which incorporates pedigree (trios/quads/multi-generation) directly into the variant calling, with automatic flagging of de novo candidates in the output VCF. The pipeline is very streamlined and includes all the steps that are usually separate stages of other pipelines:

  1. rtg map each sample (results are pre-sorted and have calibration information determined)
  2. rtg family (applies mapping calibration, identifies duplicates, calls variants, including realignment for haplotype calling, phased according to inheritance, and applies variant recalibration)
  3. rtg vcffilter (isolates calls from the trio flagged as de novo, with any extra filtering you might want to apply)

Thank you @Len for suggesting a nice tool for the analysis. I already have 3 VCF files for the family and would like to complete the analysis with GATK. When I have samples from the next family, I will use RTG Core from the start, after reading its documentation.

Could you please tell me whether GATK, or the other packages I have used so far, offers similar functions for the remaining steps?

Thanks

I am not really familiar with the details of the GATK tools for this scenario. Another factor is that, even with GATK, you should ideally have called all three family members at the same time. Joint calling gives better-quality calls and helps ensure that variants are represented the same way in all three samples; particularly for complex variants involving indels or longer haplotypes, you can get alternative representations of what is actually the same variant. In the absence of this, you probably want to do something like:

  1. Run a normalization tool (e.g. vcflib vcfallelicprimitives) on each of the VCFs to help with the representation consistency issue
  2. Merge the three single-sample VCFs into one multi-sample VCF (e.g. rtg vcfmerge)
  3. Optionally run the multi-sample VCF through a Mendelian violation checker (e.g. rtg mendelian)
  4. Filter the resulting multi-sample VCF to select variants where both parents are REF and the child is HET (e.g. rtg vcffilter)
  5. Apply subsequent variant quality filtering (e.g. rtg vcffilter)
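
The selection in step 4 can be sketched in a few lines of Python, in case it helps to see the logic spelled out. This is a minimal illustration and not a replacement for the tools named above: it assumes an already merged and normalized multi-sample VCF, the function names and sample names are hypothetical, and it only inspects the GT field (no depth or genotype-quality checks, which step 5 would add).

```python
def genotype(sample_field, fmt_keys):
    """Return the list of allele indices in a sample's GT, or None if missing."""
    values = dict(zip(fmt_keys, sample_field.split(":")))
    alleles = values.get("GT", ".").replace("|", "/").split("/")
    return None if "." in alleles else alleles


def denovo_candidates(vcf_lines, child, mother, father):
    """Yield (chrom, pos, ref, alt) where both parents are REF and the child is HET."""
    idx = None
    for line in vcf_lines:
        if line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at the 10th column of the header line.
            samples = fields[9:]
            idx = {name: 9 + samples.index(name) for name in (child, mother, father)}
            continue
        if idx is None:
            raise ValueError("no #CHROM header line seen before the first record")
        fmt_keys = fields[8].split(":")
        gts = {name: genotype(fields[i], fmt_keys) for name, i in idx.items()}
        # A missing genotype is not evidence of REF, so skip such sites.
        if any(g is None for g in gts.values()):
            continue
        parents_ref = all(a == "0" for a in gts[mother] + gts[father])
        child_het = len(gts[child]) == 2 and gts[child].count("0") == 1
        if parents_ref and child_het:
            yield fields[0], int(fields[1]), fields[3], fields[4]
```

Usage would look like `list(denovo_candidates(open("trio.vcf"), "child", "mother", "father"))`, where the file name and sample names are placeholders for your own. Expect many of the raw candidates to be artifacts (for example, a parent called REF at low coverage), which is why the quality filtering in step 5 matters.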

Thank you @Len for the suggestions on how to proceed further.
