Questions concerning GATK

19 months ago

balaena ▴ 10

Hi everyone,

I wanted to ask some questions concerning medium-sized exome cohort studies with GATK and would be grateful for an answer:

GVCF calling, GenomicsDBImport and subsequent GenotypeGVCFs result in a VCF file that contains all samples. The per-sample columns include information about each individual sample (e.g. GT, GQ). How is the information in the INFO field obtained? Is it a result of the joint calling?
Concerning VQSR and the tranches: in the official documentation, the VariantRecalibrator command includes "-tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.5 [...]", and the resulting files are then used in the subsequent ApplyVQSR step. How do the tranches actually work? Is it fine to just keep the variants that are labelled PASS?
De novo calling with a supplied .ped file and the PossibleDeNovo flag set has been recommended on this forum. Is this GATK tool alone sufficient, or would you use other tools to obtain de novo mutations?
How is filtering for "real" de novo mutations in the joint cohort VCF usually performed? Running grep for hiConfDeNovo returns the variants where one person in the cohort exhibits a de novo mutation. With bcftools view, you can also output a VCF that includes only the affected trio. Is there a tool that does this more accurately?
How would you filter for inherited homozygous recessive or heterozygous dominant variants? I could easily parse the VCF myself, but as this is generally not recommended, I would like to know how you would obtain these variants with an established tool. Or is it common to just do it with Python?
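
For illustration, here is a minimal sketch of what hand-rolled Python parsing for the PASS and hiConfDeNovo questions might look like. It uses only the standard library and assumes that hiConfDeNovo appears in the INFO column as a comma-separated list of child sample IDs. The sample IDs (kid1, kid2, kid3), the example records and both function names are hypothetical. An established VCF library would handle edge cases that this sketch ignores.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'AC=1;DB;hiConfDeNovo=kid1' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        if not entry or entry == ".":
            continue
        key, _, value = entry.partition("=")
        info[key] = value  # flag entries without '=' are stored with an empty string
    return info


def hiconf_denovo_records(vcf_lines, child_id):
    """Yield PASS records whose hiConfDeNovo annotation lists child_id.

    Assumption: hiConfDeNovo holds a comma-separated list of sample IDs.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        if fields[6] != "PASS":
            continue  # drop variants that failed a filter, e.g. a VQSR tranche
        children = parse_info(fields[7]).get("hiConfDeNovo", "").split(",")
        if child_id in children:
            yield fields


# Hypothetical records, for demonstration only
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\tAC=1;hiConfDeNovo=kid1",
    "1\t200\t.\tC\tT\t50\tVQSRTrancheSNP99.90to100.00\tAC=1;hiConfDeNovo=kid1",
    "1\t300\t.\tG\tA\t50\tPASS\tAC=2;hiConfDeNovo=kid2,kid3",
    "1\t400\t.\tT\tC\t50\tPASS\tAC=1;DB",
]
hits = [(rec[0], rec[1]) for rec in hiconf_denovo_records(example, "kid1")]
print(hits)
```

Unlike a plain grep for hiConfDeNovo, this keeps only PASS records and only those where the annotation names the child of interest.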

I would really appreciate an answer!

WGS GATK WES • 278 views
