Merging multiple vcfs with GATK's CombineVariants
6.0 years ago
t86dan

Hey guys, I'm trying to merge multiple VCF files with the Genome Analysis Toolkit (v3.5). When dealing with only a few samples, the command would usually be something like this:

java -jar GenomeAnalysisTK.jar -T CombineVariants -R ../reference.fa --variant sample1.vcf --variant sample2.vcf --variant sample3.vcf --variant sample4.vcf --variant sample5.vcf --variant sample6.vcf -o merge_file.vcf


The problem is that I now have many VCF files to merge (not just six, as in the previous example). I have been digging through the command options, but there is no option to select, for example, a whole directory containing the VCFs, or to write something like "--variant *.vcf" so that CombineVariants is applied to all of my VCF files.
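A side note on why "--variant *.vcf" cannot work by itself: the shell expands the wildcard before GATK ever starts, so only the first file ends up attached to --variant and the rest arrive as bare arguments. The small bash demo below uses made-up placeholder file names and needs no GATK; it just prints the argument list the shell builds.

```shell
#!/usr/bin/env bash
# Demonstration only (no GATK needed): print the arguments the shell
# actually passes for "--variant *.vcf", one argument per line.
set -euo pipefail

show_args() { printf '%s\n' "$@"; }

# Three empty placeholder files in a scratch directory.
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch sample1.vcf sample2.vcf sample3.vcf

# The glob expands first, so this is the same as typing:
#   --variant sample1.vcf sample2.vcf sample3.vcf
show_args --variant *.vcf
```

This prints `--variant`, `sample1.vcf`, `sample2.vcf` and `sample3.vcf` on separate lines: only `sample1.vcf` follows the flag.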

So, in conclusion, my question is: is typing the VCF files one by one the only way to run this command with many VCFs?

Combinevariants vcf merging • 12k views
6.0 years ago

It's not documented as far as I know, but you can pass a list file as the argument to --variant, e.g.:

ls *vcf > vcfs.list
java -jar GenomeAnalysisTK.jar -T CombineVariants -R $REF --variant vcfs.list -o combined.vcf -genotypeMergeOptions UNIQUIFY
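A slightly more defensive way to build that list, as a minimal bash sketch (the helper name `make_vcf_list` is hypothetical). It writes absolute paths, sorts them so the input order is reproducible, and skips the merged output, because running `ls *vcf` a second time in the same directory would also pick up `combined.vcf`. As far as I know, GATK 3 only treats the argument as a file of paths when its name ends in `.list`, so the sketch keeps that extension.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): write a .list file of VCF paths, one per
# line, for use as "--variant vcfs.list" with GATK 3.x CombineVariants.
set -euo pipefail

make_vcf_list() {
  local vcf_dir=$1
  local list_file=$2
  local exclude=${3:-combined.vcf}   # merged output to leave out on re-runs
  local abs_dir
  abs_dir=$(cd "$vcf_dir" && pwd)    # absolute paths work from any directory

  # Regular *.vcf files directly inside vcf_dir, minus the excluded name,
  # sorted so the merge order is reproducible.
  find "$abs_dir" -maxdepth 1 -type f -name '*.vcf' ! -name "$exclude" \
    | sort > "$list_file"

  # Fail early instead of handing GATK an empty list.
  if [ ! -s "$list_file" ]; then
    echo "no VCF files found in $abs_dir" >&2
    return 1
  fi
}
```

Usage would be `make_vcf_list /path/to/vcfs vcfs.list combined.vcf`, followed by the same `java -jar GenomeAnalysisTK.jar -T CombineVariants -R $REF --variant vcfs.list -o combined.vcf -genotypeMergeOptions UNIQUIFY` command as above.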

6.0 years ago
t86dan

Thank you very much! I actually figured it out and did it a little differently, although pretty much the same way. I wrote a script listing all the files (I already had them listed, so I just added the --variant part to each file). I ended up doing it manually, but I guess it was a little less painful than typing them one by one on the command line.

java -jar GenomeAnalysisTK.jar \
-T CombineVariants \
-R REFERENCE \
--variant sample1.vcf \
--variant sample2.vcf \
--variant sample3.vcf \
--variant sample4.vcf \
--variant sample5.vcf \
--variant etc.vcf \
-o combined.vcf
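The "--variant FILE" pairs above can also be generated automatically instead of being typed or pasted. A minimal bash sketch using an array, so file names containing spaces stay intact; the function name `build_variant_args` and the variables `GATK_JAR` and `REF` are hypothetical placeholders. Note that if `combined.vcf` already sits in the same directory, the `*.vcf` glob will pick it up as an input too.

```shell
#!/usr/bin/env bash
# Sketch: build the repeated "--variant FILE" arguments in a bash array.
# GATK_JAR and REF are hypothetical placeholders; set them yourself.
set -euo pipefail

build_variant_args() {
  # Fills the global array VARIANT_ARGS with: --variant f1 --variant f2 ...
  VARIANT_ARGS=()
  local f
  for f in "$@"; do
    VARIANT_ARGS+=(--variant "$f")
  done
}

# Example usage (adjust paths before uncommenting):
# shopt -s nullglob                 # empty array if no *.vcf matches
# build_variant_args *.vcf
# java -jar "$GATK_JAR" -T CombineVariants -R "$REF" \
#   "${VARIANT_ARGS[@]}" -o combined.vcf
```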


I appreciate your feedback. I did it again following your instructions and of course it worked. Now I know how to do the same thing with a simple command (ls > list.txt).

Please use ADD COMMENT to reply to earlier posts, so that this thread remains logically structured and easy to follow.

It's good that you found a solution, although I would argue that yours likely takes longer and is more error-prone :-) Good luck with the rest of your analysis.