Question: Combining snp and indel vcf files with GATK4
seta (Sweden) wrote, 2.2 years ago:

Hi all,

As a first test, I performed variant calling on a single BAM file (from human genome sequencing) using HaplotypeCaller in GATK (version 4). To combine the SNP and indel files after hard filtering, I found that CombineVariants from GATK (version 3) worked well; however, it is not available in version 4, and GATK suggests using MergeVcfs instead. But when I checked the total variant count after running MergeVcfs, it was not correct: the count was not the sum of the counts in the SNP and indel files. I then tried SortVcf with the simple command below:

gatk SortVcf -I snp.vcf -I indel.vcf -O combined.vcf

With this command, the total variant count in combined.vcf was the sum of the counts in the SNP and indel files, so the command seems right for combining them. However, I’m not sure about it. Could you please let me know whether this is a correct approach, and whether there is anything else I should consider for the analysis?
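The count comparison described above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper names `count_records` and `counts_add_up` are made up for illustration, and it assumes that "variant count" means the number of non-header lines in each VCF (plain text or gzipped).

```python
import gzip


def count_records(path):
    """Count data lines (non-empty, not starting with '#') in a VCF file."""
    # Assumption: files ending in .gz are gzip/bgzip compressed text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


def counts_add_up(snp_path, indel_path, combined_path):
    """Return True if the combined file holds exactly SNP + indel records."""
    expected = count_records(snp_path) + count_records(indel_path)
    return count_records(combined_path) == expected
```

For the files in this post, `counts_add_up("snp.vcf", "indel.vcf", "combined.vcf")` would report whether the combined file kept every record. A matching total only shows that nothing was dropped; it does not by itself show that the combined file is what downstream tools expect.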

Thanks a lot

modified 2.2 years ago by harold.smith.tarheel • written 2.2 years ago by seta
harold.smith.tarheel (United States) wrote, 2.2 years ago:

My guess (easily verified) is that you have some overlapping SNPs and indels that are being merged. I don't know which takes precedence in MergeVcfs; CombineVariants in v3 had an option to specify which one. SortVcf would not resolve overlaps but would merely put them in chromosome/position order, thereby preserving the same number of variants as in the two individual VCFs.
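One way to check that guess is to list the indel records whose reference span contains a SNP position. The sketch below is illustrative only: `read_records` and `overlapping_sites` are hypothetical helper names, the files are assumed to be uncompressed VCFs, and "overlap" is assumed to mean that a SNP's POS falls within POS..POS+len(REF)-1 of an indel on the same chromosome. Other definitions of overlap are possible.

```python
from collections import defaultdict


def read_records(path):
    """Yield (chrom, pos, ref) for each data line of an uncompressed VCF."""
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and header lines
            chrom, pos, _id, ref = line.split("\t")[:4]
            yield chrom, int(pos), ref


def overlapping_sites(snp_path, indel_path):
    """Return indel records whose REF span contains at least one SNP position."""
    snp_positions = defaultdict(set)
    for chrom, pos, _ref in read_records(snp_path):
        snp_positions[chrom].add(pos)

    hits = []
    for chrom, pos, ref in read_records(indel_path):
        # Positions covered by the indel's REF allele (assumed overlap rule).
        span = range(pos, pos + len(ref))
        shared = sorted(p for p in span if p in snp_positions[chrom])
        if shared:
            hits.append((chrom, pos, ref, shared))
    return hits
```

Calling `overlapping_sites("snp.vcf", "indel.vcf")` returns one tuple per overlapping indel, with the SNP positions it covers. An empty list would suggest that the count difference has some other cause.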

written 2.2 years ago by harold.smith.tarheel

Thank you for your reply. Knowing which SNPs and indels overlap, and specifying which one is retained, is important, isn't it? If so, do you recommend using MergeVcfs to combine the SNP and indel files for the rest of the analysis, or is SortVcf OK?

written 2.2 years ago by seta