Question: Merging multiple vcf files into one
4.7 years ago by
hpapoli90 wrote:


I have 280 VCF files, each containing about 200 SNPs from a genotyping experiment. I need to merge them into a single combined VCF that contains all SNPs for all individuals; if an individual lacks a SNP, its genotype in the combined file should be coded as missing (./. or .).

I am using the following command from vcftools: vcf-merge A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz

It worked for two files, although it took about 1 hour. It has now been running for 1 day on all 280 files. Is this the only way to merge a large number of VCF files, or is there a more efficient way?

Thank you

4.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum134k wrote:

Use GATK CombineVariants:

find . -name "*.vcf.gz" > input.list

java -jar GenomeAnalysisTK.jar \
       -T CombineVariants \
       -R ref.fa \
       --variant input.list \
       -o out.vcf \
       -genotypeMergeOptions REQUIRE_UNIQUE
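
As a possible alternative that is not part of the answer above, bcftools merge can also combine many bgzipped, indexed VCFs in one pass, and by default it writes ./. for samples that lack a site. The following is only a sketch under assumptions: it assumes bcftools is installed, and the function names (make_vcf_list, merge_with_bcftools) and the output name merged.vcf.gz are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes bcftools is on PATH. Function names and the
# output file name merged.vcf.gz are hypothetical, not from the thread.

make_vcf_list() {
    # $1 = directory to search, $2 = list file to write
    # Skip a previous merged output so it is not merged into itself.
    find "$1" -name '*.vcf.gz' ! -name 'merged.vcf.gz' | sort > "$2"
    # Return non-zero if no input files were found.
    [ -s "$2" ]
}

merge_with_bcftools() {
    # $1 = list file (one VCF path per line), $2 = output file
    while IFS= read -r f; do
        # bcftools merge needs an index next to each input; build a
        # tabix (.tbi) index if neither .tbi nor .csi is present.
        [ -f "$f.tbi" ] || [ -f "$f.csi" ] || bcftools index -t "$f"
    done < "$1"
    # -l/--file-list reads the inputs from a file; -O z writes bgzipped VCF.
    bcftools merge --file-list "$1" -O z -o "$2"
}

# Example usage (run manually):
#   make_vcf_list . input.list && merge_with_bcftools input.list merged.vcf.gz
```

Passing a list file rather than 280 paths on the command line keeps the call short and makes it easy to inspect exactly which files are being merged.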

Thanks for the answer, Pierre. A quick question about the reference genome, though: suppose we have a collection of VCF files produced at different times and therefore against different reference genomes. Is there an easy solution within the CombineVariants command, or should we lift over all the incompatible files to a single reference genome and then combine them?
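
Before deciding whether a liftover is needed, it may help to see which reference each file claims in its header. A minimal sketch, assuming the files are gzip/bgzip-compressed and that they carry the optional ##reference= header line (many VCFs do not, in which case the ##contig lines would have to be compared instead); the function name list_vcf_references is hypothetical:

```shell
#!/bin/sh
# Sketch only: list_vcf_references is a hypothetical helper name.
# Prints "<reference><TAB><file>" for every VCF path in a list file.

list_vcf_references() {
    # $1 = list file with one .vcf.gz path per line
    while IFS= read -r f; do
        # Read only the meta-information lines (those starting with "##").
        # Stop at the first ##reference= line or at the end of the header.
        ref=$(gzip -dc "$f" | awk '
            /^##reference=/ { sub(/^##reference=/, ""); print; exit }
            !/^##/          { exit }')
        # Files without a ##reference= line are reported as UNKNOWN.
        printf '%s\t%s\n' "${ref:-UNKNOWN}" "$f"
    done < "$1"
}

# Example usage (run manually): count files per declared reference.
#   list_vcf_references input.list | cut -f1 | sort | uniq -c
```

If more than one reference shows up in the output, the files are not directly comparable by coordinate, which is the situation the question describes.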

written 3.4 years ago by Nikleotide110