Combining VCF files so that data at the same loci are merged
5.3 years ago
spiral01 ▴ 110

I have VCF files that I wish to combine so that variants at matching positions are merged into a single record. For example, if one file has a variant at position 123 and another file also has a variant there, I want the genotype information from both files to appear together in that record.

The actual variants will be the same (T->G in one file will always be T->G in the other) as they have been created using the same reference data.
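To make the desired result concrete, here is a minimal Python sketch of the merge described above. It is an illustration only, not how any particular tool works: it assumes single-sample, uncompressed VCF text with a GT field, biallelic records, and distinct sample names. The function names and the tiny example records are made up for the demonstration.

```python
def parse_single_sample_vcf(text):
    """Return (sample_name, {(chrom, pos, ref, alt): genotype}) from VCF text.

    Assumes exactly one sample column and a GT key in the FORMAT field.
    """
    sample = None
    calls = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample = fields[9]  # the single sample column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        genotype = sample_values[format_keys.index("GT")]
        calls[(chrom, int(pos), ref, alt)] = genotype
    return sample, calls


def merge_calls(vcf_texts):
    """Merge single-sample VCFs so each locus/allele pair appears once.

    Samples with no record at a site get the missing genotype "./.".
    """
    samples = []
    merged = {}
    for text in vcf_texts:
        sample, calls = parse_single_sample_vcf(text)
        samples.append(sample)
        for key, genotype in calls.items():
            merged.setdefault(key, {})[sample] = genotype
    # Plain tuple sort; chromosome names sort as strings in this sketch.
    rows = [
        (key, [merged[key].get(s, "./.") for s in samples])
        for key in sorted(merged)
    ]
    return samples, rows


HEADER = "##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"

# Hypothetical example records, not real Neanderthal calls.
VINDIJA = HEADER + "Vindija\n1\t123\t.\tT\tG\t50\tPASS\t.\tGT\t0/1\n1\t456\t.\tA\tC\t50\tPASS\t.\tGT\t1/1\n"
ALTAI = HEADER + "Altai\n1\t123\t.\tT\tG\t60\tPASS\t.\tGT\t1/1\n1\t789\t.\tC\tT\t60\tPASS\t.\tGT\t0/1\n"

SAMPLES, ROWS = merge_calls([VINDIJA, ALTAI])
```

Here position 123 ends up as one row carrying both genotypes, while positions 456 and 789 each get a missing genotype for the sample that has no record there.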

Is this possible to do in one go with any tool?

5.3 years ago

Is each file a different sample? If so, it sounds like GATK's CombineVariants tool would fit your purpose.

Thanks for your reply. Yes, each file is a single individual. I am trying to combine the Vindija and Altai Neanderthal VCF data. Both were created using hg19 as the reference, and I just want to combine the two VCF files into one.

I think that should do the trick for you then. Let me know if you have any issues.

GATK asks for a reference genome in FASTA format. In this case I need the hg19 reference (obtained here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/). Is chromFa.tar.gz the correct reference file for the hg19 build?

You'll want to download the hg19.2bit file and then use UCSC's twoBitToFa utility to convert it to a FASTA file.

5.3 years ago
spiral01 ▴ 110

Whilst Jared's answer above worked perfectly, I also had success using bcftools merge with the --force-samples argument.
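For anyone who wants to script this, below is a rough sketch of how the bcftools call could be wrapped in Python. The file names are placeholders and the two helper functions are hypothetical. It assumes bcftools is on the PATH and that the inputs are already bgzip-compressed and indexed, which bcftools merge requires. The --force-samples flag lets the merge proceed even when the input files contain duplicate sample names.

```python
import shutil
import subprocess


def build_merge_command(inputs, output):
    """Build the argument list for `bcftools merge` (does not run anything)."""
    if len(inputs) < 2:
        raise ValueError("need at least two VCF files to merge")
    return [
        "bcftools", "merge",
        "--force-samples",   # proceed even if sample names are duplicated
        "-O", "z",           # write bgzip-compressed VCF
        "-o", output,
        *inputs,
    ]


def run_merge(inputs, output):
    """Run the merge; raises if bcftools is missing or exits with an error."""
    if shutil.which("bcftools") is None:
        raise RuntimeError("bcftools not found on PATH")
    subprocess.run(build_merge_command(inputs, output), check=True)


# Placeholder file names, assumed to be bgzipped and indexed beforehand.
COMMAND = build_merge_command(["vindija.vcf.gz", "altai.vcf.gz"], "merged.vcf.gz")
```

Calling run_merge(["vindija.vcf.gz", "altai.vcf.gz"], "merged.vcf.gz") would then execute the command; building the argument list separately makes it easy to inspect before running.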

