I am creating a gene sequence for a sample in a VCF using a standard reference genome. The command I found on this site for generating the sequence works well:
samtools faidx ref.fasta chrom:start-stop | bcftools consensus -s sample my.vcf
But I have separate SNP and INDEL VCF files generated using GATK UnifiedGenotyper. I would like to merge these files so I can generate a consensus sequence from a reference. I want to include the INDELs, but I am having trouble finding information on how the common tools used to join VCF files handle them.
Tools like: GATK CombineVariants
Any ideas would be appreciated.
Thanks
And what do you want to know?
The PI I work for would like to be able to generate strain-specific gene sequences which include both SNPs and INDELs, for the generation of PCR primers etc.
What happens when I combine the SNP and INDEL VCFs and there are overlapping sites?
Please use ADD REPLY to answer earlier comments, so that this thread remains logically structured and easy to follow. I have now moved your comment, but as you can see that's not optimal.
1) Try and see. 2) If the SNP and the INDEL share the same REF allele, I would say GATK produces only one variant. Otherwise two variants will be created.
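One way to "try and see" is to first list the SNPs that fall inside an INDEL's REF span, so you know which records to inspect after merging. In a VCF, an INDEL record covers positions POS through POS + length(REF) - 1, so a SNP can overlap an INDEL without sharing its POS. The following is only a rough sketch: it assumes both files are uncompressed, tab-delimited VCFs, and the function name overlapping_snps is made up for this example. It does not predict what CombineVariants will do with those sites; it only finds them.

```shell
# Hypothetical helper: print SNP records whose position lies inside
# the REF span of any record in the INDEL file.
# Assumes plain-text (not bgzipped) VCFs.
# Usage: overlapping_snps indels.vcf snps.vcf
overlapping_snps() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines in both files
    FILENAME == ARGV[1] {              # first file: the INDEL VCF
      # mark every reference position covered by this record
      for (i = 0; i < length($4); i++) covered[$1 ":" ($2 + i)] = 1
      next
    }
    {                                  # second file: the SNP VCF
      key = $1 ":" $2
      if (key in covered) print $1 "\t" $2 "\t" $4 "\t" $5
    }
  ' "$1" "$2"
}
```

Each output line is CHROM, POS, REF, ALT of a SNP that overlaps an INDEL. Those are the sites worth checking by hand in the combined VCF and in the resulting consensus.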
Thank you for the help.
Furthermore, GATK CombineVariants has a parameter to prioritize the source of genotypes.
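For illustration, here is roughly how that prioritization might look with GATK 3.x (the series that includes UnifiedGenotyper). Treat this as a sketch under assumptions, not a tested recipe: the jar name GenomeAnalysisTK.jar, the GATK_JAR variable, the wrapper function name, the snp/indel tags and all file names are placeholders chosen for this example. Each input is tagged with --variant:<name>, and -priority lists those tags in order of precedence when -genotypeMergeOptions PRIORITIZE is used.

```shell
# Hypothetical wrapper around GATK 3.x CombineVariants.
# $1 = reference FASTA, $2 = SNP VCF, $3 = INDEL VCF, $4 = output VCF
# GATK_JAR is an assumed environment variable pointing at the GATK 3 jar.
combine_snp_indel() {
  java -jar "${GATK_JAR:-GenomeAnalysisTK.jar}" \
    -T CombineVariants \
    -R "$1" \
    --variant:snp "$2" \
    --variant:indel "$3" \
    -genotypeMergeOptions PRIORITIZE \
    -priority snp,indel \
    -o "$4"
}
```

It would be called as combine_snp_indel ref.fasta snps.vcf indels.vcf combined.vcf. Swapping the order in -priority to indel,snp would give the INDEL call set precedence instead, so it is worth running both and comparing the overlapping sites.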