Question: Best way to merge multiple VCF files
Asked by williamsbrian5064, 11 months ago:

Hi,

I am trying to merge a bunch of VCF files into one VCF of known SNPs. The files are separated by chromosome, and I want to merge them in a way that keeps the chromosome information intact, if that makes sense. I want to use this file with GATK, so I don't want just a bunch of mixed-up merged VCF files.

I know there are a number of tools, such as vcf-merge, bcftools merge, and GATK's CombineVariants.

Does anyone know which would work best?

Also, does anyone know of a site that provides known SNPs or indels? I got this first batch of VCF files from dbSNP.

(modified 5 weeks ago by benformatics)

1) bcftools merge or GATK's CombineVariants walker should be fine.
2) The "best" tool in bioinformatics usually comes down to user preference, ease of use, and support in the scientific literature. Both bcftools and GATK are well respected in the bioinformatics community.
3) There are several resources that provide known SNPs. Some are curated (e.g. COSMIC, HGMD, ClinVar) and some are less so (e.g. dbSNP, HapMap, 1000G). If you let us know what kind of SNP source you are looking for, people here may be able to help.

(reply by cpad0112, 11 months ago)

The organism I am working with is the cat, so I am willing to take whatever I can get. I am really looking for SNPs and indels. I think Ensembl has some data, listed under "Variation", but it isn't very clear (http://useast.ensembl.org/info/data/ftp/index.html/).

So really, I am just looking for any suggestions for sources.

(reply by williamsbrian5064, 11 months ago)

You can find current cat VCF files in this directory.

(reply by genomax, 11 months ago)

Do you know what those files contain? Both SNPs and indels?

(reply by williamsbrian5064, 11 months ago)

Take a look at the README file in that directory for details. gVCFs for cat are also available in this directory.

(reply by genomax, 11 months ago)

Thanks! I appreciate the help!

(reply by williamsbrian5064, 11 months ago)
Answer by jaybee (South Korea), 5 weeks ago:

You can try:

/PATH/to/bcftools merge ~/data/*.vcf.gz -Oz -o Merged.vcf.gz

(This assumes your files A.vcf.gz, B.vcf.gz, and so on are in a folder named data in your home directory, and that each one is bgzipped and indexed, e.g. with tabix.)

If bcftools is installed in /usr/bin, use:

/usr/bin/bcftools merge ~/data/*.vcf.gz -Oz -o Merged.vcf.gz

This should work fine!

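One caveat worth checking: according to the bcftools documentation, bcftools merge is meant for files with non-overlapping sample sets, while bcftools concat is meant for files that share the same samples but cover different regions, such as per-chromosome VCFs. Below is a minimal sketch of the concat route. It assumes bgzipped inputs in a single directory, GNU sort (for -V), and bash 4+ (for mapfile). The function names list_vcfs and concat_vcfs and the example paths are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: join per-chromosome VCFs (same samples, different chromosomes)
# into one file with bcftools concat. Function names and paths are hypothetical.
set -euo pipefail
shopt -s nullglob  # an unmatched glob expands to nothing instead of itself

# Print the .vcf.gz files in DIR, one per line, in natural order (chr2 before chr10).
list_vcfs() {
  local dir="$1"
  local files=("$dir"/*.vcf.gz)
  if (( ${#files[@]} == 0 )); then
    echo "no .vcf.gz files in $dir" >&2
    return 1
  fi
  printf '%s\n' "${files[@]}" | sort -V
}

# Concatenate the files from list_vcfs, in that order, into OUT (bgzipped VCF),
# then write a tabix (.tbi) index next to it.
concat_vcfs() {
  local dir="$1" out="$2"
  local files
  mapfile -t files < <(list_vcfs "$dir")
  if (( ${#files[@]} == 0 )); then
    return 1
  fi
  bcftools concat -Oz -o "$out" "${files[@]}"
  bcftools index -t "$out"
}

# Example (hypothetical paths):
# concat_vcfs "$HOME/data" Merged.vcf.gz
```

Without extra options, bcftools concat writes records in the order the files are given, so the file order decides the chromosome order in the output. GATK generally expects contigs in the same order as the reference's sequence dictionary, so check that the natural sort order of your file names matches it, or pass the files in dictionary order yourself.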
Answer by benformatics (ETH Zurich), 5 weeks ago:

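If you are running GATK, then you are likely already using Picard.

Picard includes a tool called MergeVcfs, which should do exactly what you want: it combines several VCFs that share the same samples (such as per-chromosome files) into a single VCF.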


Here are the usage examples from the GATK website:

Example 1: We combine several variant files in different formats, where at least one of them contains the contig list in its header.

java -jar picard.jar MergeVcfs \
          I=input_variants.01.vcf \
          I=input_variants.02.vcf.gz \
          O=output_variants.vcf.gz

Example 2: Similar to example 1 but we use an input list file to specify the input files:

java -jar picard.jar MergeVcfs \
          I=input_variant_files.list \
          O=output_variants.vcf.gz

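Note: if you installed GATK/Picard with conda, you may need to adjust the exact commands. For example, the bioconda picard package provides a picard wrapper, so the call starts with picard MergeVcfs instead of java -jar picard.jar MergeVcfs.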

(modified 5 weeks ago)
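For Example 2, Picard reads a file with a .list suffix as a plain list of input paths, one per line. Below is a minimal sketch for generating that file from a directory of per-chromosome VCFs. It assumes GNU sort (for -V). The function name write_vcf_list and the example paths are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build the ".list" file used in Example 2 (one VCF path per line).
# Function name and paths are hypothetical.
set -euo pipefail
shopt -s nullglob  # an unmatched glob expands to nothing instead of itself

# write_vcf_list DIR LIST: write every .vcf and .vcf.gz file in DIR to LIST,
# one path per line, in natural order (chr2 before chr10).
write_vcf_list() {
  local dir="$1" list="$2"
  local files=("$dir"/*.vcf "$dir"/*.vcf.gz)
  if (( ${#files[@]} == 0 )); then
    echo "no VCF files in $dir" >&2
    return 1
  fi
  printf '%s\n' "${files[@]}" | sort -V > "$list"
}

# Example: pass an absolute DIR so the listed paths resolve wherever Picard runs.
# write_vcf_list "$HOME/data" input_variant_files.list
# java -jar picard.jar MergeVcfs I=input_variant_files.list O=output_variants.vcf.gz
```

Using a list file keeps the command short when there are dozens of per-chromosome inputs, and it makes the exact set of files passed to MergeVcfs easy to inspect before running it.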