Question: bcftools filter or bcftools isec to EXCLUDE dbSNP snps
bgold04 wrote:

I am not a newbie, but I am having a brain block. I want to exclude dbSNP SNPs from a VCF file. Don't ask me why. If A.vcf.gz is my VCF file and All_20160408.vcf.gz is the dbSNP VCF file, what do I do? [A stupid suggestion to show I am trying: bcftools filter -e (All_20160408.vcf.gz) A.vcf.gz > New.vcffile.vcf.gz ?] I am pretty sure this won't work, so I haven't even tried it. Please give an old man some help.


One way could be to use the vcftools --exclude-positions option and recode the VCF. The positions can be obtained from the second VCF using cut or awk.

Comment written 13 months ago by microfuge
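
The position list mentioned in the comment above could be built along these lines. This is only a sketch: the function name vcf_positions and the output file name are made up for illustration, and it assumes the dbSNP file is a gzip- or bgzip-compressed VCF whose chromosome names match those in A.vcf.gz (for example both "1" or both "chr1").

```shell
# Hypothetical helper: print CHROM and POS (tab-separated) for every
# record of a gzip- or bgzip-compressed VCF, skipping header lines.
vcf_positions() {
    gzip -dc "$1" | awk -F'\t' '!/^#/ { print $1 "\t" $2 }'
}

# Usage with the file name from the question (output name is arbitrary):
# vcf_positions All_20160408.vcf.gz > dbsnp_positions.txt
```

A two-column chromosome/position file like this is the input that vcftools --exclude-positions reads.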
harold.smith.tarheel wrote:

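BCFtools compares both position and allele, whereas VCFtools compares only position information (IIRC). You want the complement of the intersection, i.e. the records in A that are absent from dbSNP. Note that bcftools isec expects bgzip-compressed, indexed input files, and that -w 1 is needed to write the records of the first file as VCF rather than a list of sites:

bcftools isec -C -w 1 -O z -o filtered.vcf.gz A.vcf.gz dbSNP.vcf.gz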
WouterDeCoster wrote:

I would try to tackle this with grep...

First you need to extract all rs IDs from the All_20160408.vcf.gz file into a file, e.g. dbSNP_IDS.txt, then run something like:

zcat A.vcf.gz | grep -v -w -f dbSNP_IDS.txt > dbsnpsremoved.vcf

But there should be a more specific way to do this, probably.
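
A somewhat more specific variant of the same idea is to compare rs IDs against the ID column only, rather than grepping whole lines. The sketch below is an illustration under assumptions, not a tested pipeline: the function names are hypothetical, the ID column is assumed to hold a single ID per record, and only records of A.vcf.gz that already carry a dbSNP rs ID in column 3 can be removed, so unannotated records (ID ".") are always kept.

```shell
# Hypothetical helper: list the ID column (column 3) of a gzipped VCF,
# one ID per line, skipping header lines and records without an ID.
vcf_ids() {
    gzip -dc "$1" | awk -F'\t' '!/^#/ && $3 != "." { print $3 }'
}

# Hypothetical helper: print the header lines plus those records of the
# gzipped VCF $2 whose ID column is not listed in the ID file $1.
drop_known_ids() {
    gzip -dc "$2" | awk -F'\t' -v ids="$1" '
        BEGIN { while ((getline line < ids) > 0) known[line] = 1 }
        /^#/ || !($3 in known)
    '
}

# Usage with the file names from this thread:
# vcf_ids All_20160408.vcf.gz > dbSNP_IDS.txt
# drop_known_ids dbSNP_IDS.txt A.vcf.gz > dbsnpsremoved.vcf
```

Holding every dbSNP ID in an awk array can need a lot of memory for the full dbSNP file, so this is probably best tried on a subset first.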

bgold04 wrote:

I did this using Wouter DeCoster's suggestion, something like:

vcftools --vcf my.vcf --recode --keep-INFO-all --exclude-positions All_20160408.bed

It was not pretty making the BED file, and I worry that, because I am excluding by position rather than by allele [e.g. A/T at rsXXXXX], I may exclude positions that carry genuinely unique variants but coincidentally sit at positions listed in dbSNP. In other words, there must be a better way!
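
The worry about excluding by position alone can be addressed by matching on chromosome, position, REF and ALT together, which is the kind of comparison harold.smith.tarheel attributes to BCFtools earlier in this thread. As a rough sketch with standard tools only (the function name is hypothetical, and it assumes both files use the same chromosome naming and that multi-allelic records have already been split so that ALT holds a single allele):

```shell
# Hypothetical helper: print the header lines plus those records of the
# gzipped VCF $1 whose CHROM, POS, REF and ALT do not all match a record
# in the gzipped VCF $2 (e.g. the dbSNP file).
drop_known_alleles() {
    keys=$(mktemp) || return 1
    # Collect CHROM:POS:REF:ALT keys from the second file.
    gzip -dc "$2" | awk -F'\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' > "$keys"
    # Keep headers and every record of the first file whose key is absent.
    gzip -dc "$1" | awk -F'\t' -v keys="$keys" '
        BEGIN { while ((getline k < keys) > 0) known[k] = 1 }
        /^#/ || !(($1 ":" $2 ":" $4 ":" $5) in known)
    '
    rm -f "$keys"
}

# Usage with the file names from the question (output name is arbitrary):
# drop_known_alleles A.vcf.gz All_20160408.vcf.gz > A.not_in_dbsnp.vcf
```

With this kind of key, a variant at a dbSNP position but with a different ALT allele is kept rather than thrown away. On the full dbSNP file the key list will be large, so a purpose-built tool such as bcftools isec is likely the more practical route.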
