Question: Merge SNP & INDEL vcf files
2.5 years ago, mplace (United States) wrote:

I am generating a gene sequence for one sample in a VCF file, using a standard reference genome. The command I found on this site for generating the sequence works well:

samtools faidx ref.fasta chrom:start-stop | bcftools consensus -s sample my.vcf

But I have separate SNP and INDEL VCF files generated using GATK UnifiedGenotyper. I would like to merge these files so I can generate a consensus sequence from a reference. I want to include the INDELs, but I am having trouble finding information on what happens with common tools used to join these VCF files.

Tools like: GATK CombineVariants

Any ideas would be appreciated.

Thanks

Tags: snp, indel, vcf

"but I am having trouble finding information on what happens with common tools used to join these VCF files. Tools like: GATK CombineVariants"

And what do you want to know?

written 2.5 years ago by Pierre Lindenbaum

The PI I work for would like to generate strain-specific gene sequences that include both SNPs and indels, for designing PCR primers and similar work.

What happens when I combine the SNP and indel VCF files and there are overlapping sites?

written 2.5 years ago by mplace

Please use ADD REPLY to respond to earlier comments, so that the thread remains logically structured and easy to follow. I have now moved your comment, but as you can see that's not optimal.

written 2.5 years ago by WouterDeCoster

"What happens when I combine the SNP and indel VCF files and there are overlapping sites?"

1) Try it and see. 2) If the SNP and the INDEL share the same REF allele, I would expect GATK to produce only one variant record. Otherwise, two records will be created.
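Before merging, it can help to see which sites the two files actually share and whether their REF alleles agree, since that is the condition mentioned above. The following is only a rough sketch using awk, not part of GATK or bcftools: the function name `shared_sites` and the file names are made up for illustration, and it assumes uncompressed, tab-delimited VCF files with at most one record per site in the first file.

```shell
#!/usr/bin/env bash
# Report sites (CHROM + POS) present in both VCFs and whether their REF alleles match.
# Hypothetical helper; usage: shared_sites snps.vcf indels.vcf  (placeholder file names)
# Output columns: CHROM, POS, REF in first file, REF in second file, same_REF/different_REF
shared_sites() {
    awk -F'\t' -v OFS='\t' '
        /^#/ { next }                                 # skip VCF header lines
        { key = $1 ":" $2 }                           # site key: CHROM:POS
        FILENAME == ARGV[1] { ref[key] = $4; next }   # first file: remember REF per site
        key in ref {                                  # second file: site also in first file
            print $1, $2, ref[key], $4, (ref[key] == $4 ? "same_REF" : "different_REF")
        }
    ' "$1" "$2"
}
```

Calling `shared_sites snps.vcf indels.vcf` prints one line per shared position. Sites flagged `different_REF` are the ones where, by the reasoning above, two separate records would be expected after merging.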

written 2.5 years ago by Pierre Lindenbaum

Thank you for the help.

written 2.5 years ago by mplace

Furthermore, GATK CombineVariants has a parameter to prioritize the source of genotypes.
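As a sketch of what that could look like, assuming GATK 3.x (where CombineVariants is run through `GenomeAnalysisTK.jar`): the function below only prints a command line so it can be inspected first. The function name `combine_cmd`, the `snps`/`indels` tags, the jar location, and all file names are placeholders, not something given in this thread.

```shell
#!/usr/bin/env bash
# Print a GATK 3.x CombineVariants command that prefers genotypes from the SNP file
# when both inputs carry the same sample. Nothing is executed here.
# Hypothetical helper; all paths and file names are placeholders.
combine_cmd() {
    local ref=$1 snps=$2 indels=$3 out=$4
    echo java -jar GenomeAnalysisTK.jar \
        -T CombineVariants \
        -R "$ref" \
        --variant:snps "$snps" \
        --variant:indels "$indels" \
        -genotypeMergeOptions PRIORITIZE \
        -priority snps,indels \
        -o "$out"
}

# Show the command for inspection.
combine_cmd ref.fasta snps.vcf indels.vcf merged.vcf
```

The order given to `-priority` decides which input wins, so swapping it to `indels,snps` would prefer the INDEL file instead. If the printed command looks right and the file names contain no spaces, it can be run by piping the output to `sh`.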

written 2.5 years ago by Pierre Lindenbaum