Questions concerning GATK
14 days ago
balaena ▴ 10

Hi everyone,

I wanted to ask some questions concerning medium-sized exome cohort studies with GATK and would be grateful for an answer:

  1. Per-sample g.vcf calling, GenomicsDBImport and subsequent GenotypeGVCFs result in a VCF file that contains all samples. The per-sample columns include information about each individual sample (e.g. GT, GQ). How is the information in the INFO field obtained? Is it a result of the joint calling?
  2. Concerning VQSR and the tranches: in the official documentation the VariantRecalibrator command includes "-tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.5 [...]", and the resulting files are then used in the subsequent ApplyVQSR step. How do the tranches actually work? Is it fine to just keep the variants that are labelled PASS?
  3. De novo calling with a supplied .ped file and the PossibleDeNovo annotation enabled has been recommended on this forum. Is this GATK tool alone sufficient, or would you use other tools to identify de novo mutations?
  4. How is filtering for "real" de novo mutations in the joint cohort VCF usually performed? grep hiConfDeNovo returns every variant where any one person in the cohort carries a putative de novo mutation. With bcftools view, you can also output a VCF that includes only the affected trio. Is there a tool to do this more accurately?
  5. How would you filter for inherited homozygous recessive or heterozygous dominant variants? I could easily parse the VCF myself, but as this is generally not recommended, I would like to know how you would obtain these variants with an established tool. Or is it common to just do it with Python?
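
To make question 4 concrete, here is a minimal sketch of the hand-rolled parsing I have in mind. It rests on assumptions: that hiConfDeNovo in INFO holds a comma-separated list of child sample IDs, that GT is the first FORMAT key, and that requiring PASS plus hom-ref parents is a sensible first filter. The function names and the sample IDs kid, dad, mom and other are made up for illustration and are not part of GATK.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-style keys map to True."""
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def gt_alleles(sample_field):
    """Return the GT alleles of one sample column as a sorted list of strings.

    Assumes GT is the first FORMAT key (the VCF spec requires this if GT is present).
    """
    gt = sample_field.split(":")[0]
    return sorted(gt.replace("|", "/").split("/"))


def hi_conf_denovo(lines, child, father, mother):
    """Yield PASS sites where `child` is listed in hiConfDeNovo and both parents are 0/0.

    Unlike a plain grep, this ignores sites that are de novo in some other
    child of the cohort.
    """
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample IDs follow the nine fixed columns
            continue
        info = parse_info(cols[7])
        children = str(info.get("hiConfDeNovo", "")).split(",")
        if child not in children or cols[6] != "PASS":
            continue
        gts = {s: gt_alleles(f) for s, f in zip(samples, cols[9:])}
        if gts[father] == ["0", "0"] and gts[mother] == ["0", "0"]:
            yield cols[0], int(cols[1]), cols[3], cols[4], "/".join(gts[child])


# Toy records (hypothetical data, not from a real cohort).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tkid\tdad\tmom\tother",
    "1\t100\t.\tA\tG\t50\tPASS\tAC=1;hiConfDeNovo=kid\tGT:GQ\t0/1:99\t0/0:99\t0/0:99\t0/0:99",
    "1\t200\t.\tC\tT\t50\tPASS\tAC=1;hiConfDeNovo=other\tGT:GQ\t0/0:99\t0/0:99\t0/0:99\t0/1:99",
    "1\t300\t.\tG\tA\t50\tLowQual\tAC=1;hiConfDeNovo=kid\tGT:GQ\t0/1:99\t0/0:99\t0/0:99\t0/0:99",
]
hits = list(hi_conf_denovo(example, "kid", "dad", "mom"))
```

Only the first record survives here: the second is de novo in a different child, and the third does not have PASS in FILTER. The same genotype lookup could be extended to the recessive and dominant patterns in question 5, but I would rather use an established tool if one exists.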

I would really appreciate an answer!

WGS GATK WES • 84 views

