I also posted this question to the GATK forum, but it doesn't seem to be very active. The question is:
How do I get GATK to write the sample-level annotations to my VCF file? Most VCF files I've worked with have a header that includes FORMAT and then the names of each of my samples. Then each record ends with the annotations for each sample, like:
GT:AD:DP:GQ:PL 0/1:18,15:33:99:393,0,480 1/1:0,30:30:89:913,89,0 etc.
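To make that layout concrete, here is a minimal Python sketch of how the FORMAT keys pair up with one sample's values. It is only an illustration of the record format, not part of GATK or Freebayes, and the function name `parse_sample` is hypothetical.

```python
def parse_sample(format_field, sample_field):
    """Pair the colon-separated FORMAT keys with one sample's values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # zip stops at the shorter list, so a sample with dropped
    # trailing fields still maps the keys it does provide.
    return dict(zip(keys, values))


# First sample from the example record above
first = parse_sample("GT:AD:DP:GQ:PL", "0/1:18,15:33:99:393,0,480")
print(first["GT"], first["DP"])  # prints: 0/1 33
```

Each sample column is read the same way, so the second sample in the example gives `GT` = `1/1` and `DP` = `30`.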
This is the kind of VCF file I get from our current pipeline, which uses Freebayes. But I'm trying to learn to use GATK. I managed to write some WDL scripts to run GATK HaplotypeCaller on 11 mosquito genomes and then run the resulting 11 GVCF files through GATK GenotypeGVCFs. It worked, but... There are no sample-level annotations! The VCF header stops at INFO.
So I have apparently done the "SNP discovery", but I'd also like to call the genotypes of each sample. Is there a separate GATK tool to do that? What am I missing here? I looked through the documents for GenotypeGVCFs but couldn't see any command arguments that instruct it to output sample-level annotations.
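For anyone wanting to check their own output for the same symptom, here is a small Python sketch that inspects the `#CHROM` column header line and reports which sample columns, if any, follow FORMAT. The function name `sample_names` and the sample names `sampleA` and `sampleB` are made up for illustration. It assumes an uncompressed, tab-delimited header line.

```python
# The eight fixed columns that every VCF column header line starts with
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def sample_names(header_line):
    """Return the sample names on a #CHROM header line, or [] if sites-only."""
    cols = header_line.rstrip("\n").split("\t")
    if cols[:8] != FIXED_COLUMNS:
        raise ValueError("not a VCF column header line")
    # When genotype data is present, column 9 is FORMAT
    # and every column after it is a sample name.
    if len(cols) > 9 and cols[8] == "FORMAT":
        return cols[9:]
    return []


sites_only = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
with_samples = sites_only + "\tFORMAT\tsampleA\tsampleB"  # hypothetical names

print(sample_names(sites_only))    # prints: []
print(sample_names(with_samples))  # prints: ['sampleA', 'sampleB']
```

An empty list corresponds to the situation described here, where the header stops at INFO.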
Well, it was all scripted in WDL, basically the Broad Institute scripts but I stripped them down to the bare minimum. I can post the full scripts, or even the Cromwell workflow logs, but obviously they're quite long! The basic command line for HaplotypeCaller looked like this:
Each sample was sharded across 8 intervals (basically full chromosome arms) and the shards were merged with GATK MergeVCFs. Once I had produced 11 GVCF files that way, I ran a WDL script that used GATK GenomicsDBImport to pull them all into a database and then GenotypeGVCFs used that database as input. The basic GenomicsDBImport command looked like this:
Again, it was sharded, this time in 5 shards. Then the basic GenotypeGVCFs command looked like this:
For the same 5 shards. Then there were some filtering commands on each shard that looked like this:
And finally, all the shards were gathered with this command:
What are your command lines?
Hmm... CombineGVCFs is missing here.