I am not a newbie, but I am having a brain block. I want to exclude dbSNP SNPs from a VCF file. Don't ask me why. Anyhow, if A.vcf.gz is my VCF file and All_20160408.vcf.gz is the dbSNP VCF file, what do I do? [Stupid suggestion to show I am trying: bcftools filter -e (All_20160408.vcf.gz) A.vcf.gz > New.vcffile.vcf.gz ?] I am pretty sure this won't work, so I haven't even tried it. Please give an old man some help.
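BCFtools compares both position and alleles, whereas VCFtools compares only positions (IIRC). You want the complement of the intersection, i.e. the records of A.vcf.gz that are not in dbSNP. Both inputs must be bgzip-compressed and indexed (e.g. with tabix -p vcf), and -w1 makes isec write the surviving records of the first file as VCF:
bcftools isec -C -w1 -Oz -o filtered.vcf.gz A.vcf.gz All_20160408.vcf.gz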
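To make "compares position and allele" concrete, here is a minimal sketch of the same complement logic in plain awk, assuming bash (for process substitution) and GNU gzip. The function name exclude_known_variants is hypothetical, not part of any tool. It compares CHROM, POS, REF and each ALT allele as literal strings, with no indel normalisation, and keeps a query record if at least one of its ALT alleles is absent from the known set. It is cruder than bcftools and holds every dbSNP key in memory.

```shell
# Hypothetical helper: print the query VCF without the records whose
# CHROM/POS/REF and every ALT allele already occur in the known-variant VCF.
# Plain string matching only: indels are not normalised, and all known keys
# are held in memory (large for the full dbSNP file). Requires bash.
exclude_known_variants() {
    local query="$1" known="$2"
    awk -F'\t' '
        # First input: the known variants; store one key per ALT allele
        FILENAME == ARGV[1] {
            if ($0 !~ /^#/) {
                n = split($5, alts, ",")
                for (i = 1; i <= n; i++) seen[$1 FS $2 FS $4 FS alts[i]] = 1
            }
            next
        }
        # Second input: the query; header lines pass through unchanged
        /^#/ { print; next }
        # Keep the record if at least one of its ALT alleles is novel
        {
            n = split($5, alts, ",")
            novel = 0
            for (i = 1; i <= n; i++) {
                key = $1 FS $2 FS $4 FS alts[i]
                if (!(key in seen)) novel = 1
            }
            if (novel) print
        }
    ' <(gzip -dcf "$known") <(gzip -dcf "$query")
}
```

Usage would be exclude_known_variants A.vcf.gz All_20160408.vcf.gz > filtered.vcf; gzip -dcf passes uncompressed files through unchanged, so plain .vcf inputs also work.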
I would try to tackle this with grep...
First extract all rs IDs from All_20160408.vcf.gz into a file, e.g. dbSNP_IDS.txt. Then, provided the ID column of A.vcf.gz is already filled with rs IDs, something like:
zcat A.vcf.gz | grep -v -w -F -f dbSNP_IDS.txt > dbsnpsremoved.vcf
But there is probably a more specific way to do this.
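The rs-ID route could be sketched as below, assuming bash and GNU gzip; both function names are hypothetical. Matching exactly on the ID column with awk, rather than with grep -w anywhere on the line, avoids hits on rs IDs that happen to appear elsewhere (e.g. in INFO) and handles ';'-separated ID lists. It still only works if A.vcf.gz already carries rs IDs.

```shell
# Hypothetical helpers for the rs-ID route (bash assumed).
# 1) Print the rs IDs found in the ID column (3rd field) of a (b)gzipped VCF.
extract_rsids() {
    gzip -dcf "$1" | awk -F'\t' '!/^#/ && $3 != "." { print $3 }'
}

# 2) Print the query VCF minus records whose ID column contains a listed ID.
#    IDs are compared exactly; ';'-separated ID lists are split first.
drop_by_rsid() {
    local ids="$1" query="$2"
    awk -F'\t' '
        FILENAME == ARGV[1] { listed[$1] = 1; next }   # load the ID list
        /^#/ { print; next }                           # keep header lines
        {
            n = split($3, id, ";")
            hit = 0
            for (i = 1; i <= n; i++) if (id[i] in listed) hit = 1
            if (!hit) print
        }
    ' "$ids" <(gzip -dcf "$query")
}
```

Usage would then be extract_rsids All_20160408.vcf.gz > dbSNP_IDS.txt followed by drop_by_rsid dbSNP_IDS.txt A.vcf.gz > dbsnpsremoved.vcf. An awk hash lookup per record also tends to scale better than grep -f with millions of patterns.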
I did this using Wouter DeCoster's suggestion, something like:
vcftools --vcf my.vcf --recode --recode-INFO-all --exclude-positions All_20160408.bed
Making the bed file was not pretty, and I worry that, because I am excluding by position rather than by allele (e.g. A/T at rsXXXXX), I may exclude positions that carry genuinely novel variants but happen to coincide with dbSNP positions. I.e., there must be a better way!
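For building the exclusion file: vcftools --exclude-positions reads one tab-separated chromosome and position per line (lines starting with "#" are ignored), not BED intervals, so one pass over dbSNP with awk is enough. A sketch, assuming GNU gzip and a position-sorted input; the function name dbsnp_positions is hypothetical. Like the vcftools command itself this is position-only, so the concern about novel alleles at dbSNP positions still applies; only an allele-aware comparison such as bcftools isec addresses that.

```shell
# Hypothetical helper: turn a (b)gzipped VCF into the tab-separated
# CHROM/POS list that vcftools --exclude-positions reads.
# The leading "#" line is a comment, which vcftools skips.
dbsnp_positions() {
    printf '#CHROM\tPOS\n'
    # uniq drops adjacent duplicates (several dbSNP records at one position);
    # this assumes the input is position-sorted, as an indexed VCF is.
    gzip -dcf "$1" | awk -F'\t' -v OFS='\t' '!/^#/ { print $1, $2 }' | uniq
}
```

Usage would be dbsnp_positions All_20160408.vcf.gz > dbSNP_positions.txt, then passing dbSNP_positions.txt to --exclude-positions.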