I wanted to test whether splitting a BAM file before variant calling gives the same final VCF as calling variants on the whole BAM.
I made two runs. In the 1st run, I used the split BAMs as inputs to HaplotypeCaller, which produced one VCF per split BAM (4 files in total), and then ran GenotypeGVCFs on all of them using the -V option (i.e. -V file1.vcf -V file2.vcf -V file3.vcf -V file4.vcf); the output of that step is one VCF. In the 2nd run, I used the whole original BAM as input to HaplotypeCaller, which produced one VCF <whole_vcf>, and then ran GenotypeGVCFs.
HaplotypeCaller, for each split file (chunk1.bam, chunk2.bam, chunk3.bam, chunk4.bam):
java -jar gatk3.jar -T HaplotypeCaller -R human_g1k_v37.fasta -D dbsnp_137.b37.excluding_sites_after_129.vcf \
    -o chunk1.vcf -pairHMM VECTOR_LOGLESS_CACHING --emitRefConfidence GVCF --variant_index_type LINEAR \
    --variant_index_parameter 128000 -A DepthPerAlleleBySample -stand_call_conf 30 -stand_emit_conf 10 -I chunk1.bam
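In script form, the per-chunk step looks roughly like the sketch below. It only reuses the flags shown above; the helper name hc_command is made up for illustration, and it prints each command instead of running it, so the output can be inspected first and then piped to sh.

```shell
# Hypothetical helper: print the HaplotypeCaller command for one chunk.
# All flags are copied from the command above; nothing new is added.
hc_command() {
    chunk=$1   # chunk name without extension, e.g. chunk1
    printf '%s\n' "java -jar gatk3.jar -T HaplotypeCaller -R human_g1k_v37.fasta -D dbsnp_137.b37.excluding_sites_after_129.vcf -o ${chunk}.vcf -pairHMM VECTOR_LOGLESS_CACHING --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -A DepthPerAlleleBySample -stand_call_conf 30 -stand_emit_conf 10 -I ${chunk}.bam"
}

# Print one command per split BAM.
for chunk in chunk1 chunk2 chunk3 chunk4; do
    hc_command "$chunk"
done
```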
GenotypeGVCFs, on the four VCFs from the previous step:
java -jar gatk3.jar -T GenotypeGVCFs -R human_g1k_v37.fasta -D dbsnp_137.b37.excluding_sites_after_129.vcf \
    -o out_from_splits.vcf -A Coverage -A GCContent -A HaplotypeScore -A MappingQualityRankSumTest -A InbreedingCoeff \
    -A FisherStrand -A QualByDepth -A ChromosomeCounts -V chunk1.vcf -V chunk2.vcf -V chunk3.vcf -V chunk4.vcf
The commands for the 2nd run are the same, but they are called on the whole original BAM instead of multiple chunks.
The resulting VCF from GenotypeGVCFs in the 2nd run <whole_vcf> was 33M, whereas the resulting VCF from the chunks in the 1st run was 7M.
To check what went wrong, I merged the 4 VCFs from HaplotypeCaller using vcf-merge from vcftools, then used vcf-compare to compare the merged VCF with the whole VCF (whole_vcf). The number of variants was identical and both files matched, except for 3 mismatches detected.
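Since file size alone is a rough measure, the same kind of comparison can also be done by record count and by site. The sketch below assumes uncompressed VCFs; the helper names (count_records, site_keys, count_site_diffs) and the file name whole.vcf are placeholders, not names from my runs.

```shell
# Hypothetical helpers for comparing two uncompressed VCF files.

# Count data records (non-header lines).
count_records() {
    grep -vc '^#' "$1"
}

# Print one sorted, unique CHROM:POS:REF:ALT key per record.
site_keys() {
    grep -v '^#' "$1" | awk -F'\t' '{ print $1 ":" $2 ":" $4 ":" $5 }' | LC_ALL=C sort -u
}

# Count sites that appear in exactly one of the two files.
count_site_diffs() {
    a=$(mktemp) && b=$(mktemp) || return 1
    site_keys "$1" > "$a"
    site_keys "$2" > "$b"
    # comm -3 drops lines common to both inputs, leaving only the differences.
    LC_ALL=C comm -3 "$a" "$b" | wc -l | tr -d ' '
    rm -f "$a" "$b"
}
```

Example usage: count_records out_from_splits.vcf, then count_site_diffs out_from_splits.vcf whole.vcf. This only compares which sites are present; it does not compare genotypes or annotations.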
Can anybody tell me what went wrong? Why are the outputs of GenotypeGVCFs from the two experiments different? Should I merge the VCFs from the 1st run before genotyping to get the same results?
Thanks in advance,