Question: Best way to merge multiple VCF files
19 months ago, williamsbrian5064 (210) wrote:

Hi,

I am trying to merge a bunch of VCF files into one VCF of known SNPs. The files are separated by chromosome. I am trying to figure out how to merge all the files in a way that keeps the chromosomes intact and in order, if that makes sense. I want to use this file with GATK, so I don't want just a bunch of mixed-up merged VCF records.

I know there are a number of tools, like vcf-merge, bcftools merge, and GATK's CombineVariants.

Does anyone know which software would work best?

Also, does anyone have any suggestions for, or know of, a site that has known SNPs or indels? I got this first batch of VCF files from dbSNP.

modified 9 months ago by benformatics (1.2k) • written 19 months ago by williamsbrian5064 (210)

1) bcftools merge or GATK's CombineVariants walker should be fine.

2) "Best" in bioinformatics comes down to user preference, ease of use, and support in the scientific literature. Both bcftools and GATK are well respected in the bioinformatics community.

3) There are several resources that furnish known SNPs. Some of them are curated (e.g. COSMIC, HGMD, ClinVar) and some of them not so much (e.g. dbSNP, HapMap, 1000G). If you can let us know what kind of SNP sources you are looking for, people here may be able to help you out.

written 19 months ago by cpad0112 (12k)

I am working with felines, so I am willing to take whatever I can get. I am really just looking for SNPs and indels. I think Ensembl has some data, but it isn't super clear (http://useast.ensembl.org/info/data/ftp/index.html/). Ensembl has it listed under "Variation".

So really I am just looking for any suggestions for sources.

written 19 months ago by williamsbrian5064 (210)

You can find current cat VCF files in this directory.

modified 19 months ago • written 19 months ago by genomax (75k)

Do you know what those files contain? Both SNPs and indels?

written 19 months ago by williamsbrian5064 (210)

Take a look at the README file in that directory for details. There are also gVCFs available for cat in this directory.
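
If the README leaves it unclear, one rough way to see for yourself whether a file holds SNPs, indels, or both is to compare the REF and ALT lengths of each record. The sketch below is only an approximation under stated assumptions: the function name `count_variant_types` is made up for illustration, the input is assumed to be a gzip/bgzip-compressed VCF, and the classification rule is a simplification (`bcftools stats` reports similar counts if you have it installed).

```shell
# Count SNP and indel ALT alleles in a gzip/bgzip-compressed VCF.
# Simplifying assumption: an allele is a SNP when REF and ALT are both one
# base long, an indel when their lengths differ. Symbolic alleles (<DEL>, *),
# missing alleles (.), breakends and same-length multi-base substitutions
# are counted as "other".
count_variant_types() {
    gzip -dc "$1" | awk -F'\t' '
        /^#/ { next }                        # skip header lines
        {
            n = split($5, alts, ",")         # ALT can list several alleles
            for (i = 1; i <= n; i++) {
                a = alts[i]
                if (a ~ /^[<*.]/ || index(a, "[") || index(a, "]")) other++
                else if (length($4) == 1 && length(a) == 1) snp++
                else if (length($4) != length(a)) indel++
                else other++
            }
        }
        END { printf "snps=%d indels=%d other=%d\n", snp, indel, other }'
}
```

Calling `count_variant_types some_file.vcf.gz` (file name hypothetical) prints a single summary line, which is usually enough to tell a SNP-only file from a mixed one.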

written 19 months ago by genomax (75k)

Thanks! I appreciate the help!

written 19 months ago by williamsbrian5064 (210)
9 months ago, jaybee (40), South Korea, wrote:

You can try:

/PATH/to/bcftools merge Home/data/*vcf.gz -Oz -o Merged.vcf.gz

(assuming that your files A.vcf.gz, B.vcf.gz and so on are in the data folder under the Home directory)

If your bcftools is installed in the /usr/bin directory, then simply use:

/usr/bin/bcftools merge Home/data/*vcf.gz -Oz -o Merged.vcf.gz

This should work fine!
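
One caveat for the per-chromosome case in the question: `bcftools merge` is meant for combining files that hold different samples, whereas `bcftools concat` is the subcommand documented for stitching chromosome VCFs with identical sample columns into one file. A minimal sketch of that route, assuming bgzip-compressed inputs with hypothetical names such as chr1.vcf.gz, chr2.vcf.gz in Home/data and a `sort` that supports `-V` (GNU coreutils does); the helper name `list_vcfs_in_order` is made up for illustration:

```shell
# Print the per-chromosome VCFs in a directory in natural order
# (chr1, chr2, ..., chr10) rather than lexical order (chr1, chr10, chr2).
# Assumes the file names differ only in the chromosome label.
list_vcfs_in_order() {
    dir=$1
    printf '%s\n' "$dir"/*.vcf.gz | sort -V
}

# Typical use (commented out because it needs bcftools on the PATH):
#   list_vcfs_in_order Home/data > vcfs.txt
#   bcftools concat -f vcfs.txt -Oz -o Concatenated.vcf.gz
#   bcftools index -t Concatenated.vcf.gz   # tabix index for downstream tools
```

`bcftools concat` writes records in the order the files are given, so the order of the list decides the chromosome order of the output. GATK generally expects contigs in the same order as the reference dictionary, so reorder the list by hand if your reference sorts its contigs differently.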

9 months ago, benformatics (1.2k), ETH Zurich, wrote:

If you are running GATK, then you are likely already using Picard tools.

Picard includes a program called MergeVcfs, which should do exactly what you want.

Here are the usage examples from the GATK website:

Example 1: We combine several variant files in different formats, where at least one of them contains the contig list in its header.

java -jar picard.jar MergeVcfs \
          I=input_variants.01.vcf \
          I=input_variants.02.vcf.gz \
          O=output_variants.vcf.gz

Example 2: Similar to example 1 but we use an input list file to specify the input files:

java -jar picard.jar MergeVcfs \
          I=input_variant_files.list \
          O=output_variants.vcf.gz

Note: if you installed GATK/Picard with conda, you will likely have to adjust the exact commands used.
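
Whichever tool does the merging, it can be worth checking that each chromosome ended up as one contiguous block in the output before handing the file to GATK. A small sketch of such a check, assuming a gzip/bgzip-compressed result such as output_variants.vcf.gz; the helper names `contig_runs` and `contigs_are_grouped` are hypothetical:

```shell
# Print the contig of each contiguous run of records, in file order.
# A contig that is printed twice has its records split into separate blocks.
contig_runs() {
    gzip -dc "$1" | grep -v '^#' | cut -f1 | uniq
}

# Succeed (exit status 0) only if every contig forms a single block.
contigs_are_grouped() {
    [ -z "$(contig_runs "$1" | sort | uniq -d)" ]
}
```

For example, `contig_runs output_variants.vcf.gz` lists the chromosomes in the order they appear, and `contigs_are_grouped output_variants.vcf.gz && echo grouped` confirms that none of them is split up. This only checks grouping, not that positions are sorted within a chromosome.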
