Best way to merge multiple VCF files
3.1 years ago

Hi,

I am trying to merge a bunch of VCF files of known SNPs into one VCF. The files are split by chromosome, and I would like to merge them in a way that keeps the chromosome order intact, if that makes sense. I want to use the merged file with GATK, so I don't want just a bunch of mixed-up merged VCF records.

I know there are a number of tools, such as vcf-merge, bcftools merge, and GATK's CombineVariants.

Does anyone know which software would work best?

Also, does anyone have suggestions for, or know of, any site that provides known SNPs or indels? I got this first batch of VCF files from dbSNP.

Assembly genome next-gen alignment • 27k views

1) bcftools merge or GATK's CombineVariants walker should be fine.
2) "Best" in bioinformatics comes down to user preference, ease of use, and support in the scientific literature. Both bcftools and GATK are well respected in the bioinformatics community.
3) Several sources provide known SNPs. Some of them are curated (e.g. COSMIC, HGMD, ClinVar) and some are less so (e.g. dbSNP, HapMap, 1000G). If you let us know what kind of SNP sources you are looking for, people here may be able to help you out.
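Whichever tool is used, the concern in the question (keeping the chromosomes in order) can be checked on the result afterwards. The following is a minimal sketch using only standard Unix tools; the function name `vcf_contig_order` and the file name in the usage line are hypothetical.

```shell
# Print each contig name once per run, in the order it appears in the VCF body.
# Works on plain .vcf and on gzip/bgzip-compressed .vcf.gz files
# (bgzip output is readable by gzip).
vcf_contig_order() {
    vcf="$1"
    case "$vcf" in
        *.gz) gzip -dc "$vcf" ;;
        *)    cat "$vcf" ;;
    esac | grep -v '^#' | cut -f1 | uniq
}

# Usage (hypothetical file name):
#   vcf_contig_order Merged.vcf.gz
```

If a contig name shows up more than once in the output, records from different chromosomes are interleaved and the file is not sorted by chromosome.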


The organisms I am working with are felines, so I am willing to take whatever I can get. I am really just looking for SNPs and indels. I think Ensembl has some data, but it isn't super clear (http://useast.ensembl.org/info/data/ftp/index.html/). Ensembl lists it under "Variation".

So really I am just looking for any suggestions for sources.


You can find current cat VCF files in this directory.


Do you know what those files contain? Both SNPs and indels?


Take a look at the README file in that directory for details. There are also gVCFs available for cat in this directory.


Thanks! I appreciate the help!

2.3 years ago
jaybee ▴ 100

You can try:

/PATH/to/bcftools merge Home/data/*vcf.gz -Oz -o Merged.vcf.gz

[This assumes that your files A.vcf.gz, B.vcf.gz and so on are in the data folder under the Home directory.]

If your bcftools is installed in the /usr/bin directory, then simply use:

/usr/bin/bcftools merge Home/data/*vcf.gz -Oz -o Merged.vcf.gz

This should work fine!
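One caveat worth noting: `bcftools merge` is designed to combine files with different samples and expects bgzipped, indexed inputs. When the files hold the same samples (or no samples, as with known-sites files) and are simply split by chromosome, `bcftools concat` is usually the closer fit. Below is a rough sketch under those assumptions; the function names and paths are hypothetical, and `sort -V` (version sort) is assumed to be available, as in GNU coreutils.

```shell
# Print the bgzipped VCFs in a directory in natural chromosome order
# (chr1, chr2, ..., chr10) using version sort.
list_vcfs_in_order() {
    dir="$1"
    find "$dir" -maxdepth 1 -name '*.vcf.gz' | sort -V
}

# Index every input, then concatenate them into one bgzipped, indexed VCF.
# Assumes all inputs share the same sample columns and each file is
# already sorted by position.
concat_vcfs() {
    dir="$1"
    out="$2"
    list="$(mktemp)"
    list_vcfs_in_order "$dir" > "$list"
    while IFS= read -r vcf; do
        bcftools index --tbi --force "$vcf"
    done < "$list"
    # concat writes records in the order the files are listed
    bcftools concat --file-list "$list" -Oz -o "$out"
    bcftools index --tbi "$out"
    rm -f "$list"
}

# Usage (hypothetical paths):
#   concat_vcfs Home/data Merged.vcf.gz
```

The explicit ordering step matters because a shell glob such as `*vcf.gz` sorts names alphabetically, which would place chr10 before chr2.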

2.3 years ago
benformatics ★ 2.1k

If you are running GATK, then you are likely already using Picard tools.

Picard includes a program called MergeVcfs, which should do exactly what you want.

Here are the usage examples from the GATK website:

Example 1: We combine several variant files in different formats, where at least one of them contains the contig list in its header.

java -jar picard.jar MergeVcfs \
          I=input_variants.01.vcf \
          I=input_variants.02.vcf.gz \
          O=output_variants.vcf.gz

Example 2: Similar to example 1 but we use an input list file to specify the input files:

java -jar picard.jar MergeVcfs \
          I=input_variant_files.list \
          O=output_variants.vcf.gz

Note: if you installed GATK/Picard with conda, you will likely have to adjust the exact commands used.
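As a rough sketch of how Example 2 and the conda note might fit together: the helper below writes the input list file, and the second function picks a launcher depending on what is on the PATH. The function names are hypothetical. The `gatk MergeVcfs -I ... -O ...` form assumes GATK4, which bundles the Picard tools, and the `picard` wrapper is an assumption about how a conda install exposes Picard; check your own installation.

```shell
# Write the given VCF paths, one per line, to a list file.
# MergeVcfs treats an input as a list when its name ends in ".list".
write_input_list() {
    list="$1"
    shift
    printf '%s\n' "$@" > "$list"
}

# Run MergeVcfs with whichever launcher is available.
merge_vcfs() {
    list="$1"
    out="$2"
    if command -v gatk >/dev/null 2>&1; then
        gatk MergeVcfs -I "$list" -O "$out"
    elif command -v picard >/dev/null 2>&1; then
        picard MergeVcfs I="$list" O="$out"
    else
        # Falls back to the jar in the current directory, as in the examples above
        java -jar picard.jar MergeVcfs I="$list" O="$out"
    fi
}

# Usage (hypothetical file names):
#   write_input_list input_variant_files.list chr1.vcf.gz chr2.vcf.gz
#   merge_vcfs input_variant_files.list output_variants.vcf.gz
```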
