SNP numbers don't add up to the number of records in a VCF
20 months ago
Timotheus ▴ 40

Hello,

I called variants using bcftools to obtain a two-sample VCF, from which I excluded everything except biallelic SNPs using bcftools view -m2 -M2 -v snps. Then, using gatk VariantFiltration and SelectVariants, I obtained subset VCFs containing different SNP classes. Surprisingly, the SNP counts across the classes do not add up to the number of SNPs in my master biallelic VCF; the total is lower by a few dozen. Assuming I considered all possible SNP classes (e.g. het in both samples, hom-ref in one sample and het in the other, etc.), why could those numbers differ?

bcftools gatk • 501 views

Extract the SNPs from file 1:

bcftools query -f '%CHROM:%POS:%REF:%ALT\n' f1.vcf.gz | sort | uniq > a

Extract the SNPs from the other files:

bcftools query -f '%CHROM:%POS:%REF:%ALT\n' f2.vcf.gz > b
bcftools query -f '%CHROM:%POS:%REF:%ALT\n' f3.vcf.gz >> b
bcftools query -f '%CHROM:%POS:%REF:%ALT\n' f4.vcf.gz >> b
sort b > c

Show us the difference(s):

comm -3 a c
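
To make the comm -3 output easier to read, here is a minimal sketch on toy data. The file names toy_a and toy_c and the three records are invented for illustration and are not taken from the real VCFs. Lines starting at the left margin are present only in the first file (SNPs missing from every subset). Tab-indented lines are present only in the second file, which, because c was not passed through uniq, includes SNPs that landed in more than one subset.

```shell
# Toy stand-in for "a": the de-duplicated master list (hypothetical records).
printf '%s\n' 'chr1:100:A:G' 'chr1:200:C:T' 'chr1:300:G:A' | sort | uniq > toy_a

# Toy stand-in for "c": the pooled subset lists. chr1:200 is missing from
# every subset, and chr1:300 appears twice (i.e. in two subsets).
printf '%s\n' 'chr1:100:A:G' 'chr1:300:G:A' 'chr1:300:G:A' | sort > toy_c

# Column 1 (no indent): only in toy_a. Column 2 (one tab): only in toy_c.
comm -3 toy_a toy_c

# The two columns can also be counted separately.
echo "missing from subsets: $(comm -23 toy_a toy_c | wc -l)"
echo "extra copies in subsets: $(comm -13 toy_a toy_c | wc -l)"
```

If the first count matches the "few dozen" gap, inspecting the genotypes at those positions in the master VCF should show which class was overlooked.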

Thanks! Hmm, this did not work for me for some reason; I will try to investigate why.
