Merging multiple vcf files into one
7.8 years ago
hpapoli ▴ 140

Hello,

I have 280 VCF files, each containing about 200 SNPs from a genotyping experiment. I need to merge them into a single combined VCF that contains all SNPs for all individuals, so that if an individual lacks a SNP, its genotype is coded as missing (./. or .) in the combined file.

I am using the following command from vcftools: vcf-merge A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz

It worked for two files, although it took about 1 hour. It has now been running for 1 day on all 280 files. Is this the only way to merge a large number of VCF files, or is there a more efficient approach?

Thank you

vcf vcftools • 7.3k views
7.8 years ago

GATK CombineVariants https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_CombineVariants.php

find . -name "*.vcf.gz" > input.list

java -jar GenomeAnalysisTK.jar \
       -T CombineVariants \
       -R ref.fa \
       --variant input.list \
       -o out.vcf \
       -genotypeMergeOptions REQUIRE_UNIQUE
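
As far as I understand it, REQUIRE_UNIQUE expects every sample name to be unique across the input files, so it may be worth checking for duplicates before launching a long merge. The following is only a rough sketch using standard tools: the function name dup_samples is made up for illustration, and it assumes gzip-compressed VCFs whose #CHROM header line lists the samples from column 10 onward.

```shell
#!/bin/sh
# Hypothetical helper (not part of GATK or vcftools): print every sample
# name that appears in more than one VCF header under the given directory.
# Assumes gzip/bgzip-compressed files named *.vcf.gz.
dup_samples() {
    dir="${1:-.}"
    find "$dir" -name "*.vcf.gz" | while IFS= read -r vcf; do
        # The #CHROM line is the last header line; sample names start at column 10.
        gzip -dc "$vcf" | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
    done | sort | uniq -d
}
```

Running dup_samples . in the directory that holds the VCFs prints nothing when all sample names are unique. Any name it does print would need to be renamed, or a different genotypeMergeOptions value chosen, before combining.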

Thanks for the answer, Pierre. A quick question about the reference genome, though: suppose we have a collection of VCF files produced at different times and therefore against different reference genomes. Is there an easy solution with the CombineVariants command, or should we lift over all the incompatible ones to a single reference genome and then combine them?

