Hi All,
I used GATK CombineGVCFs (GATK version 4.1.4.1) to combine the GVCFs of around 50 samples.
The individual GVCFs come from a pipeline that ran HaplotypeCaller from GATK version 4.1.7.0 with the option --emit-ref-confidence GVCF.
Below is the command I used to combine the GVCFs:
gatk CombineGVCFs --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' -R Homo_sapiens_assembly38.fasta --variant gvcf.list -O combined.g.vcf
In the combined GVCF, the genotype is './.' everywhere, even at positions where an individual GVCF has a variant.
Issues:
1) The variant information already present in the individual GVCFs is missing from the combined GVCF.
2) Shouldn't positions where a variant could not be called have the genotype '0/0' in the combined GVCF, instead of './.'?
> 1) The variant information already present in the individual GVCFs is missing from the combined GVCF.
What does that mean?
> 2) Shouldn't positions where a variant could not be called have the genotype '0/0' in the combined GVCF, instead of './.'?
Show us an example of what you think is wrong.
For example:
Let's say my first sample's GVCF, sample1.g.vcf, contains the following information:
The first few lines have no variant and have the genotype '0/0'. The last line is a variant with genotype '0/1'.
Now look at the genotypes for the same positions in combined.g.vcf, which I made by combining 10 GVCFs. sample1 is the first sample column in combined.g.vcf.
Questions/issues:
1) chr1:13613 is a variant with genotype '0/1' in sample1.g.vcf, but in combined.g.vcf its genotype is './.'.
2) chr1:10001, chr1:10002 and chr1:10003 have the genotype '0/0' in sample1.g.vcf, but in combined.g.vcf their genotype is './.'.
Basically, every genotype in combined.g.vcf is './.', whether or not the position has a variant in the individual GVCF, and this happens for all samples.
I hope this is clearer now. Please let me know if I should give more details.
Thanks
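To compare the genotype at a given position in sample1.g.vcf and in combined.g.vcf, a small awk sketch like the one below prints the GT value of the first sample column (VCF column 10). The function name first_sample_gt is hypothetical and not part of GATK. It assumes an uncompressed, tab-separated (G)VCF and only matches records whose POS equals the requested position, so a position that falls inside a reference block (a record spanning several bases via END= in INFO) prints nothing.

```shell
# Hypothetical helper (not a GATK tool): print "CHROM:POS GT" for the first
# sample column (VCF column 10) of the record that starts at CHROM:POS.
# Assumes an uncompressed, tab-separated VCF/GVCF.
first_sample_gt() {
  local vcf="$1" chrom="$2" pos="$3"
  awk -F'\t' -v c="$chrom" -v p="$pos" '
    # Skip header lines; keep only the record starting at the requested position.
    !/^#/ && $1 == c && $2 == p {
      n = split($9, fmt, ":")   # FORMAT keys, e.g. GT:AD:DP:GQ:PL
      split($10, val, ":")      # first sample values, in the same order
      for (i = 1; i <= n; i++)
        if (fmt[i] == "GT") print $1 ":" $2, val[i]
    }' "$vcf"
}
```

For example, running `first_sample_gt sample1.g.vcf chr1 13613` and then `first_sample_gt combined.g.vcf chr1 13613` shows the two GT values side by side. Looking up GT by name in the FORMAT column, rather than assuming it is the first key, keeps the helper correct if the FORMAT order differs between records.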
But you don't really know the true genotype until you have genotyped the VCF.
chr1:13613 is a variant with poor quality and low depth. I wouldn't trust it even if it were called '0/1'.
OK, got it. My understanding was wrong.
I ran GenotypeGVCFs, and its output has the correct genotype for chr1:13613.
I was wondering why the genotype information differed between the individual GVCF and the combined GVCF.
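As the replies point out, the genotypes in a combined GVCF are not the final calls; they are assigned when the combined GVCF is run through GenotypeGVCFs. A minimal sketch of that step, reusing the reference and file names from this thread, follows. The wrapper name joint_genotype and the output name genotyped.vcf are hypothetical, and only the standard -R (reference), -V (input variants) and -O (output) arguments are passed.

```shell
# Sketch: joint-genotype a combined GVCF so that GT values are actually computed.
# joint_genotype is a hypothetical wrapper name; arguments are
# reference FASTA, combined GVCF, output VCF.
joint_genotype() {
  local ref="$1" combined="$2" out="$3"
  gatk GenotypeGVCFs -R "$ref" -V "$combined" -O "$out"
}
```

For example, `joint_genotype Homo_sapiens_assembly38.fasta combined.g.vcf genotyped.vcf` writes the jointly genotyped calls to genotyped.vcf, which is the file to inspect for per-sample genotypes such as the one at chr1:13613.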