4.7 years ago
I am trying to extract information from a VCF file with the following command and ran into a problem:
gatk VariantsToTable -R $REF -V final_SNP.vcf -F CHROM -F POS -F REF -F ALT -F QUAL -GF AD -GF GQ -GF PL -GF GT -O snpPE_final.tsv
For SNPs whose AD values are all less than 100, the results are fine, but for SNPs with an AD value greater than 100, VariantsToTable concatenates the two AD values and drops the comma. Here is an entry in the VCF file:
1 15880 . G A 3785.46 PASS AC=2;AF=0.500;AN=4;BaseQRankSum=6.325;DP=296;ExcessHet=4.7712;FS=3.153;MLEAC=2;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=12.79;ReadPosRankSum=-1.165;SOR=0.888 GT:AD:DP:GQ:PL 0/1:58,35:93:99:895,0,2296 0/1:98,105:203:99:2900,0,3782
And here is the corresponding row in the tsv file:
CHROM POS REF ALT QUAL S1.AD S1.GQ S1.PL S1.GT S2.AD S2.GQ S2.PL S2.GT
1 15880 G A 3785.46 58,35 99 895,0,2296 G/A 98105 99 2900,0,3782 G/A
The value in the S2.AD column should be 98,105, not 98105. I could not figure out how to post on the GATK forum, so I posted here. Any help would be highly appreciated.
Which version of GATK are you using? Is it possible that this is a bug that has been corrected in a more recent version? If so, you actually should go back and try to post to the forums so the devs can take a look at it.
Thanks for your help! I use GATK4-4.1.2.0-1 on Ubuntu 18.04. I installed GATK4 using conda from the bioconda channel.
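In the meantime, one way to confirm that the comma is present in the input and lost only in the table output is to read the per-sample FORMAT values straight from the VCF record. This is only a minimal Python sketch, not a fix for VariantsToTable: it assumes a tab-delimited record with a FORMAT column, it takes the sample names S1 and S2 from the tsv header above, and `format_fields` is a hypothetical helper name, not part of GATK.

```python
def format_fields(vcf_line, sample_names):
    """Map each sample name to a dict of its FORMAT values, kept as strings."""
    cols = vcf_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")  # FORMAT column, e.g. GT:AD:DP:GQ:PL
    return {
        name: dict(zip(keys, sample_col.split(":")))
        for name, sample_col in zip(sample_names, cols[9:])
    }


# The record from the question, rebuilt with tab separators (assumption:
# the original file is tab-delimited, as VCF requires).
record = "\t".join([
    "1", "15880", ".", "G", "A", "3785.46", "PASS",
    "AC=2;AF=0.500;AN=4;BaseQRankSum=6.325;DP=296;ExcessHet=4.7712;"
    "FS=3.153;MLEAC=2;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=12.79;"
    "ReadPosRankSum=-1.165;SOR=0.888",
    "GT:AD:DP:GQ:PL",
    "0/1:58,35:93:99:895,0,2296",
    "0/1:98,105:203:99:2900,0,3782",
])

fields = format_fields(record, ["S1", "S2"])
print(fields["S1"]["AD"])  # 58,35
print(fields["S2"]["AD"])  # 98,105
```

Because the values are never converted to numbers, the AD string keeps its comma regardless of how large the depths are.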