Hi,
I am trying to merge a set of VCF files into a single VCF of known SNPs. The files are split by chromosome, and I want to combine them in a way that keeps the chromosome information intact, if that makes sense. I want to use the result with GATK, so I don't want just a jumbled set of merged VCF records.
I know there are several tools for this, such as vcf-merge, bcftools merge, and GATK's CombineVariants.
Does anyone know which software would work best?
Also, can anyone suggest a site that provides known SNPs or indels? I got this first batch of VCF files from dbSNP.
1) bcftools merge or GATK's CombineVariants walker should be fine.
2) What is "best" in bioinformatics comes down to user preference, ease of use, and support in the scientific literature. Both bcftools and GATK are well respected in the bioinformatics community.
3) There are several sources that furnish known SNPs. Some of them are refined (e.g. COSMIC, HGMD, ClinVar) and some, not so much (e.g. dbSNP, HapMap, 1000 Genomes). If you let us know what kind of SNP source you are looking for, people here may be able to help.
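For files that are simply split by chromosome (same header, same samples), what is needed is really a concatenation in reference contig order rather than a sample merge; bcftools also has a `concat` subcommand intended for exactly that case. To show what that step amounts to, here is a minimal sketch in Python. `concat_per_chromosome_vcfs` and `concat_vcf_files` are hypothetical helpers, not part of bcftools or GATK, and they assume every input shares the same header and that you supply `contig_order` yourself (for example, copied from the reference's `.fai` or `.dict`).

```python
import gzip
from typing import Dict, Iterable, List, Optional, Sequence


def concat_per_chromosome_vcfs(
    vcfs: Sequence[Iterable[str]], contig_order: Sequence[str]
) -> List[str]:
    """Combine VCFs that each hold one chromosome into a single VCF (as lines).

    Hypothetical helper. Assumes all inputs share the same header (same
    INFO/FILTER definitions and the same #CHROM/sample line) and that
    contig_order lists chromosomes in the reference order used with GATK.
    Only the first file's header is kept.
    """
    header: List[str] = []
    chrom_line: Optional[str] = None
    by_contig: Dict[str, List[str]] = {}
    for i, lines in enumerate(vcfs):
        this_header: List[str] = []
        for raw in lines:
            line = raw.rstrip("\n")
            if not line:
                continue
            if line.startswith("#"):
                this_header.append(line)
                continue
            chrom = line.split("\t", 1)[0]  # CHROM is the first column
            by_contig.setdefault(chrom, []).append(line)
        this_chrom_line = next((h for h in this_header if h.startswith("#CHROM")), None)
        if i == 0:
            header, chrom_line = this_header, this_chrom_line
        elif this_chrom_line != chrom_line:
            # Different sample columns: that needs a real merge, not a concatenation.
            raise ValueError(f"input {i} has a different #CHROM line")

    unknown = set(by_contig) - set(contig_order)
    if unknown:
        raise ValueError(f"contigs not in contig_order: {sorted(unknown)}")

    combined = list(header)
    for chrom in contig_order:
        # Emit chromosomes in reference order; within each, sort by POS (column 2).
        records = by_contig.get(chrom, [])
        combined.extend(sorted(records, key=lambda rec: int(rec.split("\t")[1])))
    return combined


def concat_vcf_files(paths: Sequence[str], contig_order: Sequence[str], out_path: str) -> None:
    """Read plain or gzip/bgzip-compressed VCFs and write one uncompressed VCF."""
    texts = []
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            texts.append(handle.read().splitlines())
    with open(out_path, "w") as out:
        for line in concat_per_chromosome_vcfs(texts, contig_order):
            out.write(line + "\n")
```

This loads everything into memory, which is fine for illustrating the idea but not for a whole dbSNP release; the real tools stream the data and also validate headers more thoroughly. The reason to order by `contig_order` instead of by file name is that GATK generally expects records to follow the same contig order as the reference's sequence dictionary.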
The organism I am working with is the cat (felines), so I am willing to take whatever I can get. I am really looking for SNPs and indels. I think Ensembl has some data, listed under "Variation" on its FTP page (http://useast.ensembl.org/info/data/ftp/index.html/), but it isn't very clear.
So really I am just looking for any suggestions for sources.
You can find current cat VCF files in this directory.
Do you know what those files contain? Both SNPs and indels?
Take a look at the README file in that directory for details. There are also gVCFs available for cat in this directory.
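Besides reading the README, you can also check directly whether a VCF holds SNPs, indels, or both by tallying its ALT alleles by type (`bcftools stats` reports similar counts). Below is a minimal sketch with hypothetical helper names, not part of any tool mentioned in this thread. It classifies each REF/ALT pair purely by allele length (both length 1 = SNV, equal length >1 = MNP, different lengths = indel, symbolic/missing/breakend alleles = other). That is a rough heuristic, not a full variant normalization.

```python
import gzip
from collections import Counter
from typing import Iterable


def classify_allele(ref: str, alt: str) -> str:
    """Rough class of one REF/ALT pair, based only on allele lengths."""
    if alt in (".", "*") or alt.startswith("<") or "[" in alt or "]" in alt:
        return "other"  # missing, spanning-deletion, symbolic or breakend allele
    if len(ref) == len(alt):
        return "snv" if len(ref) == 1 else "mnp"
    return "indel"


def tally_variant_types(lines: Iterable[str]) -> Counter:
    """Count ALT alleles by class over the data lines of a VCF."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]  # REF and ALT columns
        for alt in alts.split(","):  # multi-allelic sites list ALTs comma-separated
            counts[classify_allele(ref, alt)] += 1
    return counts


def tally_vcf_file(path: str) -> Counter:
    """Stream a plain or gzip/bgzip-compressed VCF and tally its alleles."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return tally_variant_types(handle)
```

For example, calling `tally_vcf_file` on one of the downloaded files returns a `Counter`; nonzero `snv` and `indel` entries mean the file contains both kinds of variants.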
Thanks! I appreciate the help!