How Can I Count Snps In My Final Vcf Files
5
3
Entering edit mode
5.7 years ago

Hi all,

I have a VCF file that containing 50 samples, i want to count the number of SNPs. My organism is non-model, So it does not have the chromosome.

Now, How can i count the number of SNPs for all 50 samples with this VCF?

Best Regard

Mostafa

SNP • 17k views
ADD COMMENT
2
Entering edit mode

run bcftools stats on vcf. It would summarize the VCF with most of the details you need.

ADD REPLY
0
Entering edit mode

try this

bcftools query -f '%POS\n' file.vcf.gz | wc -l

ADD REPLY
2
Entering edit mode
5.7 years ago

Hello,

the total number you get by counting the lines in the vcf excluding the header lines.

$ grep -v "^#" input.vcf|wc -l

fin swimmer

ADD COMMENT
0
Entering edit mode

Hi swimmer,

Many thanks for your reply, I've done it and gave me a number(20546654). But I do not know how correct this number is?

Code i use:

grep -v "^#" Final_VCF_50Sample.vcf|wc -l
ADD REPLY
0
Entering edit mode

Sorry swimmer,

Please describe the details in the photo for me.

enter image description here

ADD REPLY
0
Entering edit mode

Please describe the details in the photo for me.

how is it related to your original question ? Are you sure you're using the correct terms ?

Each line of the VCF is a VARIANT. A Variant can be a SNP or an INDEL or etc...

The intersection of the Variant and the Samples' names is a GENOTYPE.

ADD REPLY
0
Entering edit mode

Yes, maybe I did not ask the exact question.

I want to know what the meaning of any of the terms in the picture is?

for example:

CHROM

POS

ID

REF

ALT

QUAL

INFO

GT:AD:DP:GQ:PL

AND ......

ADD REPLY
2
Entering edit mode

I want to know what the meaning of any of the terms in the picture is?

https://samtools.github.io/hts-specs/VCFv4.3.pdf

ADD REPLY
1
Entering edit mode
5.7 years ago

Hello mostafarafiepour,

you've started with one question. In the meantime there are three :)

1. How to read a vcf file

This is a very basic question. So you need starting some literature:

If you doesn't understand any of the explanations, don't worry to ask.

2. How to count the variants in a vcf (your original question)

We already worked on this.

3. Is the resulted number of (2) correct?

Well, that's quite hard to say without knowing anything about your genome. How large is it? Is there a high diversity between individuals? As we just have the total number of different variants in all of your samples, it might be better to get a per sample count. The output of bcftool stats (as suggested by cpad0112 ) might be useful or have a look at this thread, especially the answers by Pierre and me.

fin swimmer

ADD COMMENT
0
Entering edit mode

Hi finswimmer,

Can you please tell me how to count per sample variants from a multi sample vcf file? (how many 1/1, 0/1 SNPs are their per sample)

Thank you

ADD REPLY
0
Entering edit mode
3.2 years ago

Here is a quick way to count biallelic SNPs in vcf.gz files (use "cat" instead of "zcat" for uncompressed vcf files):

zcat input.vcf.gz | awk '{if ($4~/^[ACGT]$/ && $5~/^[ACGT]$/){c++}} END {print c}'
ADD COMMENT
0
Entering edit mode

I'm sorry but this is wrong.

> echo "chr1 123 . A C,<INDEL>,G" | awk '{if ($4~/[ATCG]/ && $5~/[ATCG]/) {c++}} END {print c}'
1

better use:

bcftools view --no-header -G -m 2 -M 2 --types snps input.vcf.gz | wc -l

ADD REPLY
0
Entering edit mode

Yes, that was indeed wrong, apologies. I fixed it now.

ADD REPLY
0
Entering edit mode
3.2 years ago
4galaxy77 2.8k

If all your variants in the vcf are SNPS, then a very quick way is to first index and then index again with the -n flag.

bcftools index data.vcf 
bcftools index -n data.vcf
ADD COMMENT
0
Entering edit mode
3.2 years ago

try this

bcftools query -f '%POS\n' file.vcf.gz | wc -l

ADD COMMENT
0
Entering edit mode

why do you output %POS ?

 bcftools query -f 'x' input.vcf | wc -c
ADD REPLY

Login before adding your answer.

Traffic: 1855 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6