I have 5 VCF files; the reads were aligned with BWA and the variants were called with GATK. I want to find the SNPs that are common to all 5 VCF files. I can get the number of common SNPs using vcf-compare, but I want to extract only the SNPs or variants that are present in all 5 files.
Can anyone help me find the common variants in my VCF files?
bcftools isec will do multiple-file intersection. By default the output is a tab-delimited list of sites rather than a VCF, and it tells you which variants overlap and what their positions are; with the `-p` option it also writes the matching records out as VCF files in a directory you name. The `-n` option sets how many of the listed files a variant has to appear in to be reported (for example `-n=5` for exactly 5 files, `-n+4` for at least 4), so it is easy to run it more than once and get variants that appear in all 5, any 4, etc. The input files need to be bgzip-compressed and indexed.
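To make the idea concrete, here is a rough plain-Python sketch of what such an intersection does. It is not bcftools and not a replacement for it: the function names `variant_keys` and `shared_variants` are made up for this illustration, it assumes a variant is identified by its CHROM, POS, REF and ALT columns, and it treats a multi-allelic ALT such as `A,T` as a single string without any normalization. It counts a variant if it appears in at least `min_files` of the files.

```python
import gzip
from collections import Counter


def variant_keys(path):
    """Return the set of (CHROM, POS, REF, ALT) keys found in one VCF file.

    Assumption for this sketch: two records are the same variant when these
    four columns match exactly.
    """
    opener = gzip.open if path.endswith(".gz") else open
    keys = set()
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip meta-information lines, the column header and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos, _id, ref, alt = line.split("\t")[:5]
            keys.add((chrom, int(pos), ref, alt))
    return keys


def shared_variants(paths, min_files=None):
    """Return sorted variant keys present in at least `min_files` of `paths`.

    With min_files left as None, a variant must be present in every file.
    """
    if min_files is None:
        min_files = len(paths)
    counts = Counter()
    for path in paths:
        # Each file contributes at most 1 to a variant's count, because
        # variant_keys() returns a set.
        counts.update(variant_keys(path))
    return sorted(key for key, n in counts.items() if n >= min_files)
```

With five files (the names here are placeholders), `shared_variants(["s1.vcf", "s2.vcf", "s3.vcf", "s4.vcf", "s5.vcf"])` gives the variants present in all 5, and passing `min_files=4` gives those present in at least 4. For real data a dedicated tool is still the safer choice, since it handles indexing, allele matching and VCF output for you.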
Another option would be bcbio-variation from Brad Chapman's group. It has various subtools that you can use. You can generate summaries of concordance between files, as well as construct ensemble call sets where you specify the number of callers (VCF files) a variant has to appear in. The output of bcbio-variation ensemble calling is a VCF file, so it can be fed directly into downstream tools.