What is GenotypeGVCFs?
5 months ago
wormball ▴ 10

Hello!

This article https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels- says I should run HaplotypeCaller in GVCF mode and then GenotypeGVCFs, while this article https://gatk.broadinstitute.org/hc/en-us/articles/360035531192-RNAseq-short-variant-discovery-SNPs-Indels- advises using HaplotypeCaller without GenotypeGVCFs. I tried the former (with one sample), and the result is similar to that of HaplotypeCaller in non-GVCF mode, but some entries differ.

What is the difference between these two approaches, and in which cases should I use one or the other? And what does GenotypeGVCFs actually do? The manual page says "joint genotyping", but I have no idea what that means.

The second article is about RNA-seq.

in which cases should I use one or another?

"However, that scaled very poorly with the number of samples, posing unacceptable limits on the size of the study cohorts that could be analyzed in that way. In addition, it was not possible to add samples incrementally to a study; all variant calling work had to be redone when new samples were introduced."

In a nutshell, use GVCF mode when HaplotypeCaller becomes too slow (i.e., too many samples). I use GVCF mode for WGS with more than 20 samples.
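To make the two-step workflow concrete, here is a minimal Python sketch that only builds (does not run) the GATK4 command lines: one HaplotypeCaller call per sample with `-ERC GVCF`, then a single GenotypeGVCFs call. The file names, the helper name `gvcf_workflow_commands`, and the choice of CombineGVCFs to merge per-sample GVCFs are assumptions for illustration; for larger cohorts GenomicsDBImport may be the better merging step.

```python
from typing import List


def gvcf_workflow_commands(reference: str, bams: List[str], out_vcf: str) -> List[List[str]]:
    """Build (but do not execute) GATK4 commands for GVCF calling + joint genotyping.

    Hypothetical helper: all paths are placeholders supplied by the caller.
    """
    commands: List[List[str]] = []
    gvcfs: List[str] = []

    # Step 1: call each sample separately in GVCF mode (-ERC GVCF).
    for bam in bams:
        gvcf = bam.rsplit(".", 1)[0] + ".g.vcf.gz"
        gvcfs.append(gvcf)
        commands.append(
            ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", gvcf, "-ERC", "GVCF"]
        )

    # Step 2: with several samples, merge the per-sample GVCFs first.
    if len(gvcfs) == 1:
        combined = gvcfs[0]
    else:
        combined = "cohort.g.vcf.gz"  # assumed output name
        merge = ["gatk", "CombineGVCFs", "-R", reference]
        for gvcf in gvcfs:
            merge += ["-V", gvcf]
        merge += ["-O", combined]
        commands.append(merge)

    # Step 3: joint genotyping turns the GVCF(s) into a regular VCF.
    commands.append(["gatk", "GenotypeGVCFs", "-R", reference, "-V", combined, "-O", out_vcf])
    return commands


if __name__ == "__main__":
    for cmd in gvcf_workflow_commands("ref.fasta", ["s1.bam", "s2.bam"], "cohort.vcf.gz"):
        print(" ".join(cmd))
```

Because step 1 is per sample, adding a new sample later only requires calling that sample and repeating the merge and genotyping steps, not re-calling every BAM.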

Thanks! But why are the results so different between these cases? The variants are mostly the same, but some occur in only one case, and the statistics also differ for some variants.
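One way to pin down exactly which entries differ is to key each record by site and allele and then diff the two call sets. The sketch below is a rough illustration, not a replacement for a dedicated VCF comparison tool: it assumes plain-text, single-sample VCF lines, splits on whitespace, and compares only the QUAL column. The function names are made up for this example.

```python
from typing import Dict, Iterable, List, Tuple

Site = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def index_records(lines: Iterable[str]) -> Dict[Site, Dict[str, str]]:
    """Map each variant record to its QUAL and INFO strings, skipping headers."""
    records: Dict[Site, Dict[str, str]] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        # Whitespace split: assumes no field contains spaces.
        fields = line.split()
        chrom, pos, _id, ref, alt, qual, _filter, info = fields[:8]
        records[(chrom, int(pos), ref, alt)] = {"QUAL": qual, "INFO": info}
    return records


def compare_callsets(
    a: Dict[Site, Dict[str, str]], b: Dict[Site, Dict[str, str]]
) -> Tuple[List[Site], List[Site], List[Site]]:
    """Return sites only in a, sites only in b, and shared sites whose QUAL differs."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k]["QUAL"] != b[k]["QUAL"])
    return only_a, only_b, changed


# Usage with two hypothetical files:
# with open("with_genotypegvcfs.vcf") as fa, open("without_genotypegvcfs.vcf") as fb:
#     only_a, only_b, changed = compare_callsets(index_records(fa), index_records(fb))
```

Listing the sites in `only_a`, `only_b`, and `changed` makes it easier to see whether the discrepancies cluster in low-quality or low-coverage calls.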

With GenotypeGVCFs:

JH584292.1  13164   .   C   T   123.60  .   AC=1;AF=0.500;AN=2;BaseQRankSum=3.52;DP=471;ExcessHet=3.0103;FS=3.008;MLEAC=1;MLEAF=0.500;MQ=55.68;MQRankSum=-9.396e+00;QD=0.98;SOR=1.329   GT:AD:DP:GQ:PGT:PID:PL:PS   0|1:115,11:126:99:0|1:13077_T_G:131,0,5500:13077
Without GenotypeGVCFs:

