Combine two VCF files with GATK4
3
0
12 months ago
MAPK ★ 1.9k

Hi All,

I have been trying to combine two VCF files: file1.vcf with samples A, B, C and file2.vcf with samples D, E, F. I tried GatherVcfs and MergeVcfs, but both fail. Which tool should I use for this? GATK3 had the CombineVariants tool, but it is not available in GATK4.

TIA

GATK • 1.5k views

3
12 months ago
tothepoint ▴ 620

You may need to use Picard to merge the files:

java -jar path_to_picard.jar MergeVcfs I=file1.vcf I=file2.vcf O=merged.vcf.gz

or try bcftools merge (the inputs must be bgzip-compressed and indexed):

bcftools merge --merge all file1.vcf.gz file2.vcf.gz -O z -o merged.vcf.gz
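
The bcftools route can be scripted. The following is a minimal sketch, not a definitive pipeline: it only builds the command lines needed to compress and index each input and then merge them. The helper names `build_merge_commands` and `run_commands` are hypothetical, and it assumes `bcftools` is on the PATH and that uncompressed inputs end in `.vcf`.

```python
import shlex
import subprocess
from pathlib import Path


def build_merge_commands(vcfs, output):
    """Return the bcftools commands (as argv lists) needed to merge `vcfs`.

    Hypothetical helper: plain-text inputs are first compressed with
    `bcftools view -O z`, every input is indexed, then all are merged.
    """
    commands = []
    compressed = []
    for vcf in vcfs:
        path = Path(vcf)
        if path.suffix != ".gz":
            # bcftools merge needs bgzip-compressed input
            gz_path = Path(str(path) + ".gz")
            commands.append(
                ["bcftools", "view", "-O", "z", "-o", str(gz_path), str(path)]
            )
            path = gz_path
        # bcftools merge also needs an index for every input
        commands.append(["bcftools", "index", "--tbi", "--force", str(path)])
        compressed.append(str(path))
    commands.append(
        ["bcftools", "merge", "--merge", "all", "-O", "z", "-o", str(output)]
        + compressed
    )
    return commands


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed.
    for command in build_merge_commands(["file1.vcf", "file2.vcf"], "merged.vcf.gz"):
        print(shlex.join(command))
```

Passing the returned list to `run_commands` would execute the steps. Printing them first makes it easy to check the file names before anything is overwritten.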

0

@devarora MergeVcfs is the same tool whether you run it via GATK or Picard. I get the same error.

0

Did you also try the bcftools merge option?

0

bcftools works. Thank you.

4
12 months ago
dare_devil ★ 1.5k

MergeVcfs from GATK should work (GATK v4.1.1):

gatk --java-options '-Xmx60g' \
MergeVcfs -I file1.vcf -I file2.vcf -I file3.vcf \
-O combined.vcf

What do you mean by 'they fail'? What error are you getting?

0

@dare_devil

The samples in file1.vcf.gz and file2.vcf.gz are different (like I said, this is what I want to combine), so this is the error I am getting:

[Sun Jan 17 22:03:33 CST 2021] MergeVcfs  --INPUT File1.vcf.gz --INPUT File2.vcf.gz --OUTPUT OUTfile.vcf.gz  --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX true --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Jan 17, 2021 10:03:35 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Sun Jan 17 22:03:35 CST 2021] Executing as XXX@XXX.edu on Linux 3.10.0-1127.13.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_171-b11; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.2.0
[Sun Jan 17 22:03:35 CST 2021] picard.vcf.MergeVcfs done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=4362600448
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
java.lang.IllegalArgumentException: Input file /Seq_Data/File1.vcf.gz has sample entries that don't match the other files.
        at picard.vcf.MergeVcfs.doWork(MergeVcfs.java:203)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
        at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
        at org.broadinstitute.hellbender.Main.main(Main.java:291)
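
The exception above is about the sample columns, so it can help to compare them before picking a tool. The following is a rough sketch with hypothetical helper names (`read_samples`, `same_samples`): it reads the `#CHROM` header line of each VCF and reports whether the sample lists are identical. It assumes standard tab-separated VCF headers, where sample names start at the tenth column, and it treats "identical" as same names in the same order.

```python
import gzip


def read_samples(vcf_path):
    """Return the sample names listed on the #CHROM header line of a VCF."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                fields = line.rstrip("\n").split("\t")
                # Columns 1-9 are CHROM..FORMAT; samples follow.
                return fields[9:]
            if not line.startswith("#"):
                break  # reached the records without seeing a header line
    raise ValueError(f"no #CHROM header line found in {vcf_path}")


def same_samples(vcf_paths):
    """True if every file lists the same samples in the same order."""
    sample_lists = [read_samples(path) for path in vcf_paths]
    return all(samples == sample_lists[0] for samples in sample_lists)
```

If `same_samples` returns False, as it would for files with samples A, B, C and D, E, F, a tool that stacks records such as MergeVcfs is not the right fit, and a sample-joining tool such as bcftools merge is.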

2

You can try the Python package bioinfokit for this:

from bioinfokit.analys import marker
# Concatenate VCF files. Multiple VCF files can be given, separated by commas.
marker.concatvcf("file_1.vcf,file_2.vcf,file_3.vcf,file_4.vcf")
# The concatenated VCF is written to the same directory (concat_vcf.vcf)

0

GATK's MergeVcfs will not work when the files have different sample entries.
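
To make the distinction concrete, here is a toy sketch of what joining sample columns means, as opposed to stacking records. It is an illustration only, not how bcftools or GATK are implemented: the function name `merge_by_site` and the dict-based data layout are hypothetical, and real tools also handle multiallelic sites, INFO fields, and headers. Sites missing from one input are filled with `./.` for that input's samples.

```python
def merge_by_site(records_a, records_b, n_samples_a, n_samples_b):
    """Join genotype columns from two inputs, keyed by site.

    Each input maps (chrom, pos, ref, alt) -> list of genotype strings,
    one per sample. A site absent from one input gets missing genotypes.
    """
    merged = {}
    for site in sorted(set(records_a) | set(records_b)):
        left = records_a.get(site, ["./."] * n_samples_a)
        right = records_b.get(site, ["./."] * n_samples_b)
        merged[site] = left + right  # samples A,B,C followed by D,E,F
    return merged


# file1 has samples A, B, C; file2 has samples D, E, F
file1 = {("chr1", 100, "A", "G"): ["0/1", "0/0", "1/1"]}
file2 = {
    ("chr1", 100, "A", "G"): ["0/0", "0/1", "0/0"],
    ("chr1", 200, "C", "T"): ["1/1", "0/1", "0/0"],
}
merged = merge_by_site(file1, file2, 3, 3)
```

Each merged site ends up with six genotype columns, which is the result the original question is after.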

3
12 months ago
svp ▴ 460

Try using SnpSift:

java -jar SnpSift.jar split -j *.vcf > combined.vcf
