When to use .vcf or .gvcf files from GATK HaplotypeCaller?
23 months ago
Vitor1 ▴ 120

Hi everyone!

I'm currently following this tutorial for variant calling with GATK: https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/

It's very clear and straightforward; however, it uses the HaplotypeCaller function from GATK to generate output in .vcf format (step 4).

When I looked at the GATK Best Practices for germline variant calling, I saw that they use the same function (HaplotypeCaller) but with the output in .gvcf format, and only later consolidate the files and produce the final .vcf. (https://gatk.broadinstitute.org/hc/en-us/sections/360007226651-Best-Practices-Workflows)

I'm wondering which one I should use. I have blood WES data from approximately 30 patients and I'm looking for SNPs and indels in specific genes.

Thanks!

indel gatk calling snp variant • 2.5k views
23 months ago
Medhat 9.7k

"The key difference between a regular VCF and a GVCF is that the GVCF has records for all sites, whether there is a variant call there or not. The goal is to have every site represented in the file in order to do joint analysis of a cohort in subsequent steps. The records in a GVCF include an accurate estimation of how confident we are in the determination that the sites are homozygous-reference or not. This estimation is generated by the HaplotypeCaller's built-in reference model." More

So, in your case, if you want to analyze the 30 samples as a cohort, use the GVCF format. Additionally, you can convert a GVCF to a VCF (for example with bcftools convert --gvcf2vcf), but not the other way around.
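To make the cohort route concrete, here is a rough sketch of the three steps (per-sample GVCF, consolidation, joint genotyping). It only prints the commands rather than running them, so you can check them first. The reference name, the sample names, and all file paths are placeholders I made up, and I'm assuming CombineGVCFs for the consolidation step, which should be fine for a small cohort; the Best Practices workflow may use GenomicsDBImport there instead.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the GATK commands instead of executing them.
# REF, SAMPLES and every file name below are placeholders (assumptions).
REF="reference.fasta"
SAMPLES="patient01 patient02 patient03"

print_gvcf_workflow() {
    combine_args=""
    for s in $SAMPLES; do
        # Step 1: per-sample calling in GVCF mode; -ERC GVCF keeps the
        # reference-confidence records for non-variant sites.
        echo "gatk HaplotypeCaller -R $REF -I $s.bam -O $s.g.vcf.gz -ERC GVCF"
        combine_args="$combine_args -V $s.g.vcf.gz"
    done
    # Step 2: consolidate the per-sample GVCFs into one cohort GVCF.
    echo "gatk CombineGVCFs -R $REF$combine_args -O cohort.g.vcf.gz"
    # Step 3: joint genotyping, which produces the final multi-sample VCF.
    echo "gatk GenotypeGVCFs -R $REF -V cohort.g.vcf.gz -O cohort.vcf.gz"
}

print_gvcf_workflow
```

Once the printed commands look right for your data, you can run them directly or pipe the output to bash. For exome data you would normally also restrict calling to your capture targets with -L and an interval file.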

Thanks mate!
