Question: How to identify de novo mutations in a child compared with the parents?
deepue110 wrote:

Hi,

I am new to NGS analysis and have been following this pipeline recommended in many of the posts in the forum. 

I have three samples (one child and two parents) and have completed the analysis up to generation of the VCF files. I could not clearly understand the VariantFiltration step from the GATK documentation. Could someone please explain it in more detail?

I would like to find de novo mutations in the child. Is it better to identify de novo mutations before or after annotation? Please advise me on how to proceed.

Thanks

snp next-gen exome sequencing • 2.7k views
ADD COMMENTlink modified 4.0 years ago by Len Trigg1.2k • written 4.0 years ago by deepue110
iraun3.5k wrote:

Well, filtering the variant calls is a crucial step if you want the most accurate call set. We have to weigh the probability that a SNP is a true genetic variant against the probability that it is a sequencing or data-processing artifact. In short, you filter in order to discard false-positive variants (increasing specificity) without losing true-positive variants (preserving sensitivity). GATK offers two approaches to filtering:

  • VariantFiltration tool (hard filtering): filters variants according to user-defined criteria such as depth (DP), quality (QUAL), and so on.
  • VariantRecalibrator + ApplyRecalibration tools: the first program assigns a well-calibrated probability to each variant call in a call set. The second program applies the model parameters calculated by VariantRecalibrator to each variant in the input VCF files, producing a recalibrated VCF file in which each variant is annotated with its VQSLOD value. You can read more here: http://gatkforums.broadinstitute.org/discussion/39/variant-quality-score-recalibration-vqsr.

The first approach is the one recommended in the pipeline you're following, but in my opinion that pipeline is a bit out of date. You can simply try both, compare the results, and choose. Furthermore, that pipeline performs variant calling with the UnifiedGenotyper tool, and GATK now has a more up-to-date tool, HaplotypeCaller: https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php
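To make the hard-filtering idea concrete, here is a minimal Python sketch of what filtering on DP and QUAL amounts to. It is not GATK's VariantFiltration implementation: the function name `hard_filter`, the thresholds, and the filter labels `LowQual` and `LowDP` are assumptions chosen for illustration, and real thresholds should come from your own data and the GATK recommendations.

```python
def hard_filter(vcf_lines, min_qual=30.0, min_depth=10):
    """Yield VCF lines with the FILTER column set to PASS or to failure labels.

    The thresholds and label names are illustrative assumptions, not GATK defaults.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 6 is QUAL; a missing value (".") is treated as 0 here.
        qual = float(fields[5]) if fields[5] != "." else 0.0
        # Column 8 is INFO: semicolon-separated KEY=VALUE pairs or bare flags.
        info = dict(
            item.split("=", 1) if "=" in item else (item, "")
            for item in fields[7].split(";")
        )
        depth = int(info.get("DP", 0))
        failed = []
        if qual < min_qual:
            failed.append("LowQual")
        if depth < min_depth:
            failed.append("LowDP")
        # Column 7 is FILTER.
        fields[6] = ";".join(failed) if failed else "PASS"
        yield "\t".join(fields)


records = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\t.\tDP=25",
    "1\t200\t.\tC\tT\t12\t.\tDP=4;DB",
]
filtered = list(hard_filter(records))
for record in filtered:
    print(record)
```

Variants are flagged rather than dropped, so you can still inspect what each threshold removed before committing to it.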

Hope it helps.

ADD COMMENTlink modified 4.0 years ago • written 4.0 years ago by iraun3.5k

Thank you @iraun for the detailed answer. Could you also please suggest an approach for identifying de novo mutations by comparison with the parents' SNP information? Thanks!

ADD REPLYlink written 4.0 years ago by deepue110
Len Trigg1.2k wrote:

If you are new to NGS I would suggest RTG Core (free for non-commercial use) which incorporates pedigree (trios/quads/multi-generation) directly into the variant calling, with automatic flagging of de novo candidates in the output VCF. The pipeline is very streamlined and includes all the steps that are usually separate stages of other pipelines:

  1. rtg map each sample (results are pre-sorted and have calibration information determined)
  2. rtg family (applies mapping calibration, identifies duplicates, calls variants, including realignment for haplotype calling, phased according to inheritance, and applies variant recalibration)
  3. rtg vcffilter (isolate calls from the trio flagged as de-novo, with any extra filtering you might want to apply)

ADD COMMENTlink modified 4.0 years ago • written 4.0 years ago by Len Trigg1.2k

Thank you @Len for suggesting a nice tool for the analysis. I already have three VCF files for the family and would like to complete this analysis with GATK. When I have samples from the next family, I will use RTG Core from the start, after reading its documentation.

Could you please suggest whether similar functions are available in GATK or the other packages used so far, for the remaining steps?

Thanks.

ADD REPLYlink written 4.0 years ago by deepue110

I am not really familiar with the details of the GATK tools for this scenario. Another factor is that, even with GATK, you should ideally have called all three family members at the same time. Joint calling both gives better-quality calls and helps ensure that variants are represented the same way in all three samples; particularly for complex variants involving indels or longer haplotypes, you can get alternative representations of what is actually the same variant. In the absence of this, you probably want to do something like:

  1. Run a normalization tool (e.g. vcflib vcfallelicprimitives) on each of the VCFs to help with the representation consistency issue
  2. Merge the three single-sample VCFs into one multisample VCF (e.g. rtg vcfmerge)
  3. Optionally run the multisample VCF through a Mendelian violation checker (e.g. rtg mendelian)
  4. Filter the resulting multisample VCF to select variants where both parents are REF and the child is HET (e.g. rtg vcffilter)
  5. Apply subsequent variant quality filtering (e.g. rtg vcffilter)
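
As a rough illustration of step 4, the selection of sites where both parents are homozygous REF and the child is heterozygous can be sketched in plain Python. This is not how rtg vcffilter works internally. It assumes a merged multisample VCF with a GT entry in the FORMAT column, and the sample column names `child`, `father`, and `mother` are hypothetical placeholders for whatever names appear in your header.

```python
def genotype(sample_field, format_keys):
    """Return the sorted allele list of a sample's GT, ignoring phasing."""
    values = dict(zip(format_keys, sample_field.split(":")))
    gt = values.get("GT", "./.")
    return sorted(gt.replace("|", "/").split("/"))


def de_novo_candidates(vcf_lines, child, father, mother):
    """Yield (chrom, pos, ref, alt) where both parents are 0/0 and the child is het."""
    columns = {}
    for line in vcf_lines:
        if line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Map each column name (including sample names) to its index.
            columns = {name: i for i, name in enumerate(fields)}
            continue
        keys = fields[8].split(":")  # FORMAT column
        c = genotype(fields[columns[child]], keys)
        f = genotype(fields[columns[father]], keys)
        m = genotype(fields[columns[mother]], keys)
        parents_ref = f == ["0", "0"] and m == ["0", "0"]
        # Heterozygous child: one REF allele and one called non-REF allele.
        child_het = len(c) == 2 and c[0] == "0" and c[1] not in ("0", ".")
        if parents_ref and child_het:
            yield fields[0], int(fields[1]), fields[3], fields[4]


trio_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild\tfather\tmother",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30\t0/0:28\t0/0:31",
    "1\t200\t.\tC\tT\t60\tPASS\t.\tGT:DP\t0/1:22\t0/1:25\t0/0:27",
    "1\t300\t.\tG\tA\t40\tPASS\t.\tGT:DP\t1|0:18\t0|0:20\t./.:0",
]
candidates = list(de_novo_candidates(trio_vcf, "child", "father", "mother"))
print(candidates)
```

Sites where a parent genotype is missing are deliberately excluded, since a no-call in a parent cannot confirm that the allele is absent. The candidates still need the quality filtering from step 5, because most apparent de novo calls are likely to be artifacts or undercalled parental variants.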

ADD REPLYlink written 4.0 years ago by Len Trigg1.2k

Thank you @Len for the suggestions on how to proceed further.

ADD REPLYlink modified 4.0 years ago • written 4.0 years ago by deepue110