GATK CombineGVCFs: all genotypes are './.' in the combined GVCF
21 months ago
ttom

Hi All,

I used GATK CombineGVCFs to combine GVCFs of around 50 samples.

GATK version 4.1.4.1 was used for CombineGVCFs.

The individual GVCFs came from a pipeline that ran HaplotypeCaller (GATK 4.1.7.0) with the option --emit-ref-confidence GVCF.

This is the command I used to combine the GVCFs:

gatk CombineGVCFs --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' -R Homo_sapiens_assembly38.fasta --variant gvcf.list -O combined.g.vcf

In the combined GVCF, every genotype is './.', even at positions where an individual GVCF has a variant.

Issues:

1) The variant information already present in the individual GVCFs is missing from the combined GVCF.

2) Shouldn't the positions where a variant could not be called have the genotype '0/0' in the combined GVCF, instead of './.'?


1) The variant information already present in individual GVCFs are missing in the combined GVCF

What does that mean?

2) Shouldn't the positions where a variant could not be called have the genotype as '0/0' in the combined GVCF, instead of './.'

Show us an example of what you think is wrong.


For example:

Let's say my first sample's GVCF, sample1.g.vcf, contains the following:

The first few lines have no variant and have the genotype '0/0'. The last line is a variant with genotype '0/1'.

chr1    1       .       N       <NON_REF>       .       .       END=10000       GT:DP:GQ:MIN_DP:PL      0/0:0:0:0:0,0,0
chr1    10001   .       T       <NON_REF>       .       .       END=10002       GT:DP:GQ:MIN_DP:PL      0/0:4:12:4:0,12,104
chr1    10003   .       A       <NON_REF>       .       .       END=10005       GT:DP:GQ:MIN_DP:PL      0/0:5:15:5:0,15,131
chr1    13613   .       T       A,<NON_REF>     49.64   .      BaseQRankSum=0.489;DP=8;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-1.930;RAW_MQandDP=4332,8;ReadPosRankSum=-0.992     GT:AD:DP:GQ:PL:SB       0/1:5,3,0:8:57:57,0,119,72,128,200:1,4,3,0

Now look at the genotypes for the same positions in combined.g.vcf, which was made by combining 10 GVCFs. Sample1 is the first sample column in combined.g.vcf.

chr1    1       .       N       <NON_REF>       .       .       END=10000       GT:DP:GQ:MIN_DP:PL      ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0
chr1    10001   .       T       <NON_REF>       .       .       .       GT:DP:GQ:MIN_DP:PL      ./.:4:12:4:0,12,104     ./.:0:0:0:0,0,0 ./.:3:3:3:0,3,45    ./.:0:0:0:0,0,0 ./.:0:0:0:0,0,0 ./.:7:15:7:0,15,225     ./.:0:0:0:0,0,0 ./.:4:6:2:0,6,49        ./.:6:6:5:0,6,90        ./.:0:0:0:0,0,0
chr1    10002   .       A       <NON_REF>       .       .       .       GT:DP:GQ:MIN_DP:PL      ./.:4:12:4:0,12,104     ./.:0:0:0:0,0,0 ./.:6:6:6:0,6,90       ./.:5:3:5:0,3,45        ./.:0:0:0:0,0,0 ./.:11:18:11:0,18,270   ./.:13:6:13:0,6,90      ./.:4:6:2:0,6,49        ./.:6:6:5:0,6,90        ./.:0:0:0:0,0,0
chr1    10003   .       A       <NON_REF>       .       .       .       GT:DP:GQ:MIN_DP:PL      ./.:5:15:5:0,15,131     ./.:16:6:13:0,6,90      ./.:6:6:6:0,6,90       ./.:10:0:10:0,0,211     ./.:0:0:0:0,0,0 ./.:11:21:11:0,21,315   ./.:18:0:18:0,0,385     ./.:4:6:2:0,6,49        ./.:9:0:8:0,0,145       ./.:10:9:9:0,9,135
chr1    13613   .       T       A,<NON_REF>     .       .       BaseQRankSum=0.489;DP=334;ExcessHet=3.01;MQRankSum=-1.930e+00;RAW_MQandDP=9440,17;ReadPosRankSum=0.00     GT:AD:DP:GQ:MIN_DP:PL:SB        ./.:5,3,0:8:57:.:57,0,119,72,128,200:1,4,3,0      ./.:.:78:99:57:0,106,1800,106,1800,1800 ./.:.:24:37:24:0,37,690,37,690,690      ./.:7,2,0:9:27:.:27,0,156,47,162,210:2,5,2,0      ./.:.:68:88:68:0,88,2118,88,2118,2118   ./.:.:44:84:44:0,84,1508,84,1508,1508     ./.:.:41:0:41:0,0,1074,0,1074,1074      ./.:.:30:84:29:0,84,1260,84,1260,1260   ./.:.:1:3:1:0,3,26,3,26,26 ./.:.:53:44:53:0,44,1607,44,1607,1607

Question/Issues:

1) chr1:13613 has a variant with genotype '0/1' in sample1.g.vcf, but in combined.g.vcf its genotype is './.'.

2) chr1:10001, chr1:10002 and chr1:10003 have the genotype '0/0' in sample1.g.vcf, but in combined.g.vcf their genotype is './.'.

Basically, every genotype in combined.g.vcf is './.', whether or not the position has a variant in the individual GVCF. This happens for all the samples.
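A quick way to check a claim like this across a whole file is to tally the GT values in every sample column. The sketch below wraps a one-pass awk scan in a bash function; the name count_gt is hypothetical, and it assumes a standard tab-separated VCF (FORMAT in column 9, samples from column 10 on) where GT may sit at any position in FORMAT.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally GT values over all sample columns of a (G)VCF.
# Usage: count_gt combined.g.vcf
count_gt() {
  awk -F'\t' '
    /^#/ { next }                      # skip meta-information and header lines
    {
      n = split($9, fmt, ":")          # column 9 is FORMAT, e.g. GT:DP:GQ:MIN_DP:PL
      gt = 0
      for (i = 1; i <= n; i++) if (fmt[i] == "GT") gt = i
      if (!gt) next                    # record carries no GT field
      for (s = 10; s <= NF; s++) {     # columns 10 onward are samples
        split($s, val, ":")
        count[val[gt]]++
      }
    }
    END { for (g in count) print g "\t" count[g] }   # one line per GT value
  ' "$1"
}
```

If the observation above holds for the whole combined GVCF, ./. will be the only value listed; the same check on a genotyped VCF would list values such as 0/0 and 0/1.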

I hope I am clear now. Please let me know if I should give more details.

Thanks


But you don't really know the true genotype until you have genotyped the GVCF.

GT:AD:DP:GQ:PL:SB  0/1:5,3,0:8:57:57,0,119,72,128,200:1,4,3,0

is a low-quality call with poor depth (DP=8, with only 3 reads supporting the alternate allele). I wouldn't trust it even if it was called '0/1'.
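The GQ in that record can be read straight off its PL values: GATK reports GQ as the difference between the second-lowest and the lowest PL (the lowest is normalized to 0), capped at 99. Below is a minimal sketch, assuming a diploid sample and the VCF-spec PL ordering (genotype a/b with a <= b at zero-based index b*(b+1)/2 + a); the function name pl_to_gt_gq is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lowest-PL genotype and its GQ for a diploid PL string.
# Usage: pl_to_gt_gq 57,0,119,72,128,200
pl_to_gt_gq() {
  awk -v pl="$1" 'BEGIN {
    n = split(pl, p, ",")
    best = -1; second = -1; idx = 0
    # walk genotypes in VCF order: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ...
    for (b = 0; idx < n; b++)
      for (a = 0; a <= b && idx < n; a++) {
        idx++
        v = p[idx] + 0
        if (best < 0 || v < p[best] + 0) { second = best; best = idx; gt = a "/" b }
        else if (second < 0 || v < p[second] + 0) second = idx
      }
    # GQ = second-lowest PL minus lowest PL, capped at 99
    gq = (second < 0) ? 0 : p[second] - p[best]
    if (gq > 99) gq = 99
    print gt, gq
  }'
}
```

For sample1 at chr1:13613 (PL 57,0,119,72,128,200) this prints 0/1 57, matching the GT and GQ in sample1.g.vcf. It is only a per-sample maximum-likelihood reading; joint genotyping re-evaluates the alleles across all samples, so its output need not match it exactly.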


OK, got it. My understanding was wrong.

I ran GenotypeGVCFs, and its output has the correct genotype information for position chr1:13613:

chr1    13613   .       T       A       69.05   .       AC=2;AF=0.100;AN=20;BaseQRankSum=0.489;DP=334;ExcessHet=3.2451;FS=20.434;InbreedingCoeff=-0.1638;MLEAC=2;MLEAF=0.100;MQ=23.56;MQRankSum=-1.930e+00;QD=4.06;ReadPosRankSum=0.00;SOR=3.588    GT:AD:DP:GQ:PL  0/1:5,3:8:57:57,0,119   0/0:57,0:57:99:0,106,1800        0/0:24,0:24:37:0,37,690 0/1:7,2:9:27:27,0,156   0/0:68,0:68:88:0,88,2118      0/0:44,0:44:84:0,84,1508        0/0:41,0:41:0:0,0,1074  0/0:29,0:29:84:0,84,1260      0/0:1,0:1:3:0,3,26      0/0:53,0:53:44:0,44,1607

I was wondering why the genotype information differed between the individual GVCF and the combined GVCF.
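As this thread shows, the two tools split the work: CombineGVCFs merges the per-sample records (keeping fields such as AD, DP, GQ and PL but writing GT as ./.), and genotypes are assigned only when GenotypeGVCFs is run on the merged file. Below is a minimal sketch of the two steps as a bash function. The name joint_genotype and its argument order are hypothetical; CombineGVCFs and GenotypeGVCFs are standard GATK4 tools, and the arguments mirror the command in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge per-sample GVCFs, then joint-genotype the result.
# Arguments: reference FASTA, list file of GVCF paths, output prefix.
joint_genotype() {
  local ref=$1 gvcf_list=$2 prefix=$3
  # Step 1: merge. Per-sample annotations are kept, but GT is written as no-call (./.).
  gatk CombineGVCFs -R "$ref" --variant "$gvcf_list" -O "${prefix}.g.vcf" || return
  # Step 2: joint genotyping, which is where GT values such as 0/0 and 0/1 are assigned.
  gatk GenotypeGVCFs -R "$ref" -V "${prefix}.g.vcf" -O "${prefix}.vcf"
}

# Example: joint_genotype Homo_sapiens_assembly38.fasta gvcf.list combined
```

Stopping after step 1 leaves a file in which every genotype is ./., which is what the question observed; the genotyped records come from step 2.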
