I want to obtain a VCF file containing genotype calls and their scores for every rsID, whether or not a variant was called. I was planning to use the following steps:
- HaplotypeCaller --genotyping_mode DISCOVERY --output_mode EMIT_VARIANTS_ONLY --emitRefConfidence BP_RESOLUTION (full command shown below)
- awk '{ if ( $3 != "." ) { print $0; } }' variants.vcf > variants.filtered.vcf
- GenotypeGVCFs --includeNonVariantSites
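
A minimal Python sketch of the filtering step above, in case the intent of the awk one-liner is unclear. The function name keep_records_with_id and the example records (including the rsID) are made up for illustration. Unlike the awk version, it splits on tabs only and keeps header lines explicitly.

```python
from typing import Iterable, Iterator


def keep_records_with_id(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF header lines plus data records whose ID column is not '.'."""
    for line in lines:
        # Meta-information and column-name lines start with '#'; always keep them.
        if line.startswith("#"):
            yield line
            continue
        # VCF data lines are tab-separated: CHROM, POS, ID, REF, ALT, ...
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2 and fields[2] != ".":
            yield line


# Illustrative input only: these records are invented, not real calls.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\t<NON_REF>\t.\t.\t.\n",
    "chr1\t101\trs0000001\tG\tT,<NON_REF>\t50\t.\t.\n",
]
filtered = list(keep_records_with_id(example))
```

To apply it to real files, the same generator can be fed an open file handle for variants.vcf and its output written line by line to variants.filtered.vcf.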
Using the most recent dbSNP download from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/All.vcf, I ran the following command:
GATK -T HaplotypeCaller \
    --reference_sequence GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
    --input_file recalibrated.bam \
    --dbsnp current_dbsnp/All.vcf.gz \
    --genotyping_mode DISCOVERY \
    --output_mode EMIT_VARIANTS_ONLY \
    --emitRefConfidence BP_RESOLUTION \
    --out variants.vcf
However, the ID column only contains ".". How can I get HaplotypeCaller to populate the ID column with rsIDs? Also, is there a better way to get variant and non-variant genotype calls with HaplotypeCaller?