I am dealing with a merged VCF file containing samples sequenced with both a targeted panel and exome sequencing. I have tried both vcftools and bedtools to obtain a subset of SNPs based on the regions described in a BED file. Both worked as far as I can tell, but the vcftools subset contained only about half as many SNPs as the bedtools subset. The number of SNPs in the bedtools subset was much closer to what I was expecting. Here are the commands I ran, vcftools first and then bedtools:
vcftools --gzvcf merged.vcf.gz --bed panel.bed --out subset --recode --keep-INFO-all
bedtools intersect -a merged.vcf.gz -b panel.bed | bgzip > subset.vcf.gz
Can someone explain how the intersection strategies used by bedtools and vcftools differ, and whether there is a better alternative to these two options for subsetting a VCF by the regions in a BED file? Thank you!
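One way to narrow down where the two subsets diverge is to compare them site by site. The following is only a minimal sketch, not part of either tool: the function names are hypothetical, it assumes plain-text (decompressed) VCFs, and it identifies each record by CHROM, POS, REF, and ALT.

```shell
# Hypothetical helpers for comparing two VCF subsets; assumes uncompressed VCF text.

# Print one CHROM:POS:REF:ALT key per record read from stdin (header lines skipped).
vcf_sites() {
    awk -F'\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }'
}

# Count how many distinct keys occur more than once in the VCF on stdin.
count_duplicate_sites() {
    vcf_sites | sort | uniq -d | wc -l | tr -d ' '
}

# Print keys found in the first VCF file ($1) but not in the second ($2).
sites_only_in_first() {
    awk -F'\t' '
        /^#/ { next }                                   # skip header lines
        { key = $1 ":" $2 ":" $4 ":" $5 }
        FILENAME == ARGV[1] { in_second[key] = 1; next } # first file read is $2
        !(key in in_second) && !printed[key]++ { print key }
    ' "$2" "$1"
}
```

With the commands above, vcftools should write its result to subset.recode.vcf, while the bedtools result would need decompressing first (for example `bgzip -dc subset.vcf.gz > bedtools_subset.vcf`, where the output name is just a placeholder). Running `count_duplicate_sites < bedtools_subset.vcf` and `sites_only_in_first bedtools_subset.vcf subset.recode.vcf` would then show whether the extra records are repeated lines or genuinely different sites. If they turn out to be repeats, one possible cause is overlapping intervals in the BED file, since bedtools intersect by default reports a record once per overlapping interval unless `-u` is given.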