bcftools filter or bcftools isec to EXCLUDE dbSNP snps
3
0
7.5 years ago
bgold04 • 0

I am not a newbie, but I am having a mental block. I want to exclude dbSNP SNPs from a VCF file. Don't ask me why. If A.vcf.gz is my VCF file and All_20160408.vcf.gz is the dbSNP VCF file, what do I do? [A stupid suggestion to show I am trying: bcftools filter -e (All_20160408.vcf.gz) A.vcf.gz > New.vcffile.vcf.gz ?] I am pretty sure this won't work, so I haven't even tried it. Please give an old man some help.

bcftools vcftools isec filter dbsnp • 5.3k views
0

One way could be to use the vcftools --exclude-positions option and recode the VCF. The positions can be obtained from the second VCF using cut or awk.
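
A minimal sketch of the position-extraction step, assuming the two-column, tab-separated chromosome/position list that vcftools documents for --exclude-positions. The toy file dbsnp_demo.vcf.gz and the output name dbsnp_positions.txt are hypothetical stand-ins; in practice the input would be All_20160408.vcf.gz.

```shell
# Build a tiny stand-in for the dbSNP VCF (hypothetical data, for illustration only).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\trs1\tA\tT\t.\t.\t.\n2\t250\trs2\tG\tC\t.\t.\t.\n' | gzip -c > dbsnp_demo.vcf.gz

# Skip header lines (starting with '#') and keep CHROM and POS, tab-separated.
gzip -dc dbsnp_demo.vcf.gz | awk -F'\t' '!/^#/ { print $1 "\t" $2 }' > dbsnp_positions.txt

cat dbsnp_positions.txt
```

The resulting list could then be given to vcftools --exclude-positions together with --recode. Note that this matches on position only, so any variant at a dbSNP position is dropped regardless of its allele.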

2
7.5 years ago

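BCFtools compares both position and allele by default, whereas VCFtools compares only position information (IIRC). You want the complement of the intersection, i.e. the records present in A but absent from dbSNP. Note that bcftools isec needs bgzip-compressed, indexed inputs, and that -w1 asks it to write the records from the first file rather than a plain list of sites:

bcftools isec -C -w1 A.vcf.gz All_20160408.vcf.gz > filtered.vcf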
0
7.5 years ago

I would try to tackle this with grep...

First you need to isolate all rs IDs from the All_20160408.vcf.gz file to e.g. a file dbSNP_IDS.txt, then something like

zcat A.vcf.gz | grep -v -w -f dbSNP_IDS.txt > dbsnpsremoved.vcf

But there should be a more specific way to do this, probably.
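
The two steps above could be sketched as follows, using awk on the ID column instead of grep so that only the ID field is compared. This assumes the records in A.vcf.gz already carry rs IDs in column 3 (otherwise nothing will match) and that each record holds a single ID. The toy files dbsnp_demo.vcf.gz and A_demo.vcf.gz are hypothetical stand-ins for the real inputs.

```shell
# Toy stand-ins for the two files (hypothetical data, for illustration only).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\trs1\tA\tT\t.\t.\t.\n' | gzip -c > dbsnp_demo.vcf.gz
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\trs1\tA\tT\t.\t.\t.\n1\t300\t.\tG\tA\t.\t.\t.\n' | gzip -c > A_demo.vcf.gz

# Step 1: collect the rs IDs (column 3) from the dbSNP file.
gzip -dc dbsnp_demo.vcf.gz | awk -F'\t' '!/^#/ && $3 ~ /^rs/ { print $3 }' > dbSNP_IDS.txt

# Step 2: read the ID list first, then the sample VCF from stdin ("-").
# Header lines are kept; data records are kept only if their ID is not listed.
gzip -dc A_demo.vcf.gz \
  | awk -F'\t' 'NR == FNR { ids[$1]; next } /^#/ || !($3 in ids)' dbSNP_IDS.txt - \
  > dbsnpsremoved.vcf
```

Comparing only column 3 avoids accidental matches elsewhere on the line, which a whole-line grep -w could produce.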

0

Hi WouterDeCoster, just wondering: why do we want to filter only the dbSNP variants with rs IDs? Correct me if I am wrong, but I thought dbSNP accepts submissions of all kinds of SNPs, including disease-associated ones and silent mutations. So will I be filtering out some important disease-related SNPs? Thanks!

0

Definitely possible that you lose important variants. I can imagine that someone wants to look at variants that have never been observed or are very rare.

0

Thanks for the reply. But why do we have to isolate all the rs IDs from dbSNP before using them to filter our sample VCF?

0
7.5 years ago
bgold04 • 0

I did this using Wouter DeCoster's suggestion, something like:

vcftools --vcf my.vcf --recode --keep-INFO-all --exclude-positions All_20160408.bed

Making the BED file was not pretty, and I worry that, because I am excluding by position rather than by allele [e.g. A/T at rsXXXXX], I may exclude positions that carry genuinely unique variants but coincidentally sit at positions in dbSNP. In other words, there must be a better way!
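
One way to address the position-versus-allele worry is to match on chromosome, position, REF and ALT together. The following is only a sketch under stated assumptions: it uses plain awk rather than bcftools or vcftools, it assumes the sample records are biallelic and represented the same way as in dbSNP (same chromosome naming, same normalization), and the toy file names are hypothetical. Holding every dbSNP allele in memory may be impractical for the full All_20160408.vcf.gz.

```shell
# Toy stand-ins (hypothetical data): dbSNP knows 1:100 A>T; the sample has
# 1:100 A>T (known) and 1:100 A>G (same position, different allele).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\trs1\tA\tT\t.\t.\t.\n' | gzip -c > dbsnp_alleles_demo.vcf.gz
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tA\tT\t.\t.\t.\n1\t100\t.\tA\tG\t.\t.\t.\n' | gzip -c > A_alleles_demo.vcf.gz

gzip -dc A_alleles_demo.vcf.gz | awk -F'\t' -v db="gzip -dc dbsnp_alleles_demo.vcf.gz" '
  BEGIN {
    # Load every CHROM/POS/REF/ALT combination from dbSNP,
    # splitting comma-separated ALT alleles into separate keys.
    while ((db | getline line) > 0) {
      if (line ~ /^#/) continue
      split(line, f, "\t")
      n = split(f[5], alt, ",")
      for (i = 1; i <= n; i++) known[f[1], f[2], f[4], alt[i]] = 1
    }
    close(db)
  }
  # Keep header lines, and records whose exact allele is not in dbSNP.
  /^#/ || !(($1, $2, $4, $5) in known)
' > A_novel_demo.vcf
```

In this toy case the A>G record at 1:100 survives, while a position-only filter would have removed it along with the known A>T.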
