GATK tool to merge INDEL with SNPs with the same set of samples
2.7 years ago
MAPK ★ 2.0k

Hi All,

I am trying to merge two VCF files, one with SNPs and the other with INDELs. I have been looking at the three methods below, but I am not clear on which one is the right option for me. The two VCFs are likely to have overlapping sites, so I am not sure whether Picard is the right tool. Could someone please help me figure out which tool to use?

Option 1.

java -jar picard.jar MergeVcfs I=SNPs.vcf.gz I=INDELS.vcf.gz O=WXS_INDELS_SNPs.vcf.gz

Option 2.

${JAVA} ${JAVAOPTS} -jar ${GATK} GatherVcfs -I SNPs.vcf.gz -I INDELS.vcf.gz -O WXS_INDELS_SNPs.vcf.gz

Option 3.

bcftools merge --merge all SNPs.vcf.gz INDELS.vcf.gz --force-samples -O z -o WXS_INDELS_SNPs.vcf.gz

PS. I just checked these three methods. I found the results from Option 1 and Option 3 are the same, and GatherVcfs is not suitable for this kind of merge.

BCFtools works just fine:

bcftools concat --allow-overlaps -O z -o WXS_INDELS_SNPs.vcf.gz SNPs.vcf.gz INDELS.vcf.gz

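Merging and concatenation are two different operations. Since you have the same set of samples in both VCFs and want to join them, I assume you mean concatenation rather than merging (correct me if I am wrong). Concatenation is vertical joining (more records, same samples), whereas merging is horizontal joining (more samples, shared sites). GatherVcfs concatenates, but it expects its inputs in genomic order with no overlapping positions, which is why it is not suitable here. Of the other two options, bcftools merge is a true merge across samples; Picard MergeVcfs, despite its name, combines the records of files that share the same sample list, so in practice it behaves like a sorted concatenation.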
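
In case it helps, here is a minimal sketch of the concatenation route as a small shell function. The function name `concat_snps_indels` is just something I made up for illustration, and the file names are the ones from the question. It assumes bcftools is on your PATH and that both inputs are bgzip-compressed; `--allow-overlaps` needs indexed inputs, so the sketch indexes them first. It also checks that the two files list the same samples, since that is the precondition for concatenating rather than merging.

```shell
set -eu

# Hypothetical helper (name is illustrative, not part of bcftools):
# concatenate a SNP VCF and an INDEL VCF that contain the same samples.
# Usage: concat_snps_indels SNPs.vcf.gz INDELS.vcf.gz WXS_INDELS_SNPs.vcf.gz
concat_snps_indels() {
    snps_vcf="$1"
    indels_vcf="$2"
    out_vcf="$3"

    # Concatenation only makes sense if both files list the same samples
    # in the same order; "bcftools query -l" prints one sample per line.
    snps_samples="$(bcftools query -l "$snps_vcf")"
    indels_samples="$(bcftools query -l "$indels_vcf")"
    if [ "$snps_samples" != "$indels_samples" ]; then
        echo "Sample lists differ; use a merge instead of concat" >&2
        return 1
    fi

    # --allow-overlaps requires indexed inputs (-t: tabix index, -f: overwrite).
    bcftools index -f -t "$snps_vcf"
    bcftools index -f -t "$indels_vcf"

    # Vertical join: records from both files, interleaved by position.
    bcftools concat --allow-overlaps -O z -o "$out_vcf" "$snps_vcf" "$indels_vcf"

    # Index the result so downstream tools can query it by region.
    bcftools index -f -t "$out_vcf"
}
```

You would then call it as `concat_snps_indels SNPs.vcf.gz INDELS.vcf.gz WXS_INDELS_SNPs.vcf.gz`. If the sample check fails, the files are not candidates for concatenation and a merge is the operation to look at instead.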


