Question: Best way to merge multiple VCF files
2.5 years ago, williamsbrian5064 wrote:

Hi,

I am trying to merge a bunch of VCF files into one VCF of known SNPs. The files are split by chromosome, and I want to merge them so that the chromosome information stays intact, if that makes sense. I want to use this file with GATK, so I don't just want a bunch of mixed-up merged VCF files.

I know there are a number of tools, such as vcf-merge, bcftools merge, and GATK's CombineVariants.

Does anyone know which software would work best?

Also, does anyone have suggestions for, or know of, any site that has known SNPs or indels? I got this first batch of VCF files from dbSNP.


1) bcftools merge or GATK's CombineVariants walker should be fine.
2) "Best" in bioinformatics comes down to user preference, ease of use, and support in the scientific literature. Both bcftools and GATK are well respected in the bioinformatics community.
3) Several resources provide known SNPs. Some are curated (e.g. COSMIC, HGMD, ClinVar) and some less so (e.g. dbSNP, HapMap, 1000 Genomes). If you let us know what kind of SNP sources you are looking for, people here may be able to help.

Written 2.5 years ago by cpad0112.

I am working with cats (felines), so I am willing to take whatever I can get. I am really looking for SNPs and indels. I think Ensembl has some data, but it isn't super clear (http://useast.ensembl.org/info/data/ftp/index.html/). Ensembl lists it under "Variation".

So really I am just looking for any suggestions for sources.

Written 2.5 years ago by williamsbrian5064.

You can find current cat VCF files in this directory.

Written 2.5 years ago by genomax.

Do you know what those files contain? Both SNPs and indels?

Written 2.5 years ago by williamsbrian5064.

Take a look at the README file in that directory for details. There are also gVCFs available for cat in this directory.

Written 2.5 years ago by genomax.

Thanks! I appreciate the help!

Written 2.5 years ago by williamsbrian5064.

20 months ago, jaybee (South Korea) wrote:

You can try:

/PATH/to/bcftools merge Home/data/*vcf.gz -Oz -o Merged.vcf.gz

(This assumes that your files A.vcf.gz, B.vcf.gz, and so on are in the Home directory, under the folder data.)

If bcftools is installed in /usr/bin, simply use:

/usr/bin/bcftools merge Home/data/*vcf.gz -Oz -o Merged.vcf.gz

This should work fine!
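A note on the per-chromosome case from the question: bcftools merge is designed for combining files that contain different samples, and it expects bgzipped, indexed inputs. For files that share the same samples (or have none) and simply cover different chromosomes, bcftools concat is the subcommand intended for the job. Below is a minimal sketch of that route. It assumes GNU sort (for -V) and that the natural sort order of the file names matches the contig order of your reference; the function names and the Home/data path are only illustrative.

```shell
#!/usr/bin/env bash
# Sketch: concatenate per-chromosome VCFs in natural chromosome order.
# Function names and paths are hypothetical; adapt them to your layout.

# Print the bgzipped VCFs found directly in a directory, one per line,
# in version order so that chr2 sorts before chr10 (needs GNU sort -V).
list_vcfs_in_order() {
    local dir="$1"
    find "$dir" -maxdepth 1 -name '*.vcf.gz' | sort -V
}

# Concatenate the listed files into one bgzipped VCF and index it.
# bcftools concat expects all inputs to have the same sample columns.
concat_vcfs() {
    local dir="$1" out="$2" list
    list="$(mktemp)"
    list_vcfs_in_order "$dir" > "$list"
    bcftools concat --file-list "$list" -Oz -o "$out"
    bcftools index --tbi "$out"
    rm -f "$list"
}

# Example call (uncomment once bcftools is installed and the paths exist):
# concat_vcfs Home/data Merged.vcf.gz
```

It is worth checking the order printed by list_vcfs_in_order against the contig order in your reference before running the concatenation, since GATK is strict about contig ordering.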

20 months ago, benformatics (ETH Zurich) wrote:

If you are running GATK, then you are likely already using Picard tools.

Picard includes a tool called MergeVcfs, which should do exactly what you want.

Here are the usage examples from the GATK website:

Example 1: We combine several variant files in different formats, where at least one of them contains the contig list in its header.

java -jar picard.jar MergeVcfs \
          I=input_variants.01.vcf \
          I=input_variants.02.vcf.gz \
          O=output_variants.vcf.gz

Example 2: Similar to example 1 but we use an input list file to specify the input files:

java -jar picard.jar MergeVcfs \
          I=input_variant_files.list \
          O=output_variants.vcf.gz

Note: if you installed GATK/Picard with conda, you may have to adjust the exact commands.
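As a rough illustration of the list-file variant, here is a sketch that writes the input list and then calls MergeVcfs through the gatk wrapper script that GATK4 installs (GATK4 bundles the Picard tools and uses -I/-O style arguments). The function names, the Home/data directory, and the file names are assumptions for the example, not something taken from the GATK documentation.

```shell
#!/usr/bin/env bash
# Sketch: build an input list file and merge it with MergeVcfs via GATK4.
# Function names and paths are hypothetical; adapt them to your layout.

# Write the paths of all .vcf and .vcf.gz files found directly in a
# directory to a list file, one path per line.
write_vcf_list() {
    local dir="$1" list="$2"
    find "$dir" -maxdepth 1 \( -name '*.vcf' -o -name '*.vcf.gz' \) | sort > "$list"
}

# Run MergeVcfs through the gatk wrapper, passing the list file as input.
merge_with_gatk() {
    local list="$1" out="$2"
    gatk MergeVcfs -I "$list" -O "$out"
}

# Example calls (uncomment once gatk is on your PATH and the paths exist):
# write_vcf_list Home/data input_variant_files.list
# merge_with_gatk input_variant_files.list output_variants.vcf.gz
```

With a standalone conda install of Picard, the equivalent call would typically go through the picard wrapper script rather than java -jar picard.jar, so check which executable your environment actually provides.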
