Question

How Can I Count SNPs in My Final VCF Files?

3

6.2 years ago

mostafarafiepour ▴ 180

Hi all,

I have a VCF file containing 50 samples, and I want to count the number of SNPs. My organism is a non-model organism, so it does not have a chromosome-level reference.

How can I count the number of SNPs for all 50 samples in this VCF?

Best regards,

Mostafa

SNP • 19k views

ADD COMMENT • link updated 5 months ago by Pierre Lindenbaum 164k • written 6.2 years ago by mostafarafiepour ▴ 180

2

Run bcftools stats on the VCF. It summarizes the file with most of the details you need.

ADD REPLY • link 6.2 years ago by cpad0112 21k

0

Try this:

bcftools query -f '%POS\n' file.vcf.gz | wc -l

ADD REPLY • link 3.7 years ago by ghada.alqubati101 • 0

finswimmer · Answer 1 · 2018-08-12

2

6.2 years ago

finswimmer 16k

Hello,

You get the total number by counting the lines in the VCF, excluding the header lines.

$ grep -v "^#" input.vcf|wc -l

fin swimmer

ADD COMMENT • link 6.2 years ago by finswimmer 16k

0

Hi swimmer,

Many thanks for your reply. I ran it and it gave me a number (20546654), but I do not know whether this number is correct.

The code I used:

grep -v "^#" Final_VCF_50Sample.vcf|wc -l

ADD REPLY • link updated 6.2 years ago by finswimmer 16k • written 6.2 years ago by mostafarafiepour ▴ 180

0

Sorry swimmer,

Please describe the details in the photo for me.

[image not preserved]

ADD REPLY • link 6.2 years ago by mostafarafiepour ▴ 180

0

"Please describe the details in the photo for me."

How is this related to your original question? Are you sure you're using the correct terms?

Each line of the VCF is a VARIANT. A variant can be a SNP, an INDEL, etc.

The intersection of the Variant and the Samples' names is a GENOTYPE.
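To make the distinction between variants and SNPs concrete, here is a minimal sketch on a made-up four-record VCF. The file contents, the variable toy_vcf and the helper count_by_type are all hypothetical. It uses a simplified rule (a record counts as a SNP only if REF and every ALT allele are single bases), which is an assumption for illustration and not necessarily how bcftools classifies sites.

```shell
#!/bin/sh
# Toy VCF (made-up data, tab-separated): 2 SNP records, 1 insertion, 1 mixed site.
toy_vcf=$(mktemp)
printf '##fileformat=VCFv4.3\n' > "$toy_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$toy_vcf"
printf 'scaffold1\t10\t.\tA\tG\t50\tPASS\t.\n' >> "$toy_vcf"
printf 'scaffold1\t25\t.\tC\tCTT\t50\tPASS\t.\n' >> "$toy_vcf"
printf 'scaffold2\t7\t.\tT\tA,C\t50\tPASS\t.\n' >> "$toy_vcf"
printf 'scaffold2\t90\t.\tG\tA,GAA\t50\tPASS\t.\n' >> "$toy_vcf"

# count_by_type is a hypothetical helper: prints "total snps other".
count_by_type() {
    awk -F'\t' '
        /^#/ { next }                       # skip all header lines
        {
            total++
            n = split($5, alt, ",")         # ALT may list several alleles
            is_snp = ($4 ~ /^[ACGT]$/)      # REF must be a single base
            for (i = 1; i <= n; i++)
                if (alt[i] !~ /^[ACGT]$/) is_snp = 0
            if (is_snp) snps++
        }
        END { print total + 0, snps + 0, total - snps }
    ' "$1"
}

count_by_type "$toy_vcf"   # prints: 4 2 2
```

On this toy file, a plain line count would report 4, although only 2 records are SNPs under the simplified rule.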

ADD REPLY • link 6.2 years ago by Pierre Lindenbaum 164k

0

Yes, maybe I did not ask the right question.

I want to know what each of the terms in the picture means.

for example:

CHROM

POS

ID

REF

ALT

QUAL

INFO

GT:AD:DP:GQ:PL

AND ......

ADD REPLY • link 6.2 years ago by mostafarafiepour ▴ 180

2

"I want to know what each of the terms in the picture means."

https://samtools.github.io/hts-specs/VCFv4.3.pdf

ADD REPLY • link 6.2 years ago by Pierre Lindenbaum 164k

score 1 · Answer 2 · 2018-08-12

Hello mostafarafiepour,

You started with one question. In the meantime there are three :)

1. How to read a vcf file

This is a very basic question, so you need to start with some reading:

Wikipedia
the official specs (already posted by Pierre Lindenbaum)
Have a look at the header of your file. Lots of explanations are there.

If you don't understand any of the explanations, don't hesitate to ask.
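As a rough illustration of looking into the header, the sketch below prints the meta-information lines of a made-up VCF and counts its sample columns. The toy data, the sample names S1 to S3 and the helper count_samples are hypothetical. It relies on the VCF layout in which the "#CHROM" line has nine fixed columns (CHROM through FORMAT) followed by one column per sample.

```shell
#!/bin/sh
# Toy VCF with three made-up sample names (S1..S3); columns are tab-separated.
toy_vcf=$(mktemp)
printf '##fileformat=VCFv4.3\n' > "$toy_vcf"
printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n' >> "$toy_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n' >> "$toy_vcf"
printf 'scaffold1\t10\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\t0/0\n' >> "$toy_vcf"

# Meta-information lines start with "##" and describe the INFO/FORMAT tags.
grep '^##' "$toy_vcf"

# count_samples is a hypothetical helper: every column after the nine
# fixed ones on the "#CHROM" line is one sample.
count_samples() {
    awk -F'\t' '/^#CHROM/ { print NF - 9; exit }' "$1"
}

count_samples "$toy_vcf"   # prints: 3
```

On a real file with 50 samples, the same helper should report 50 if the file follows this layout.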

2. How to count the variants in a vcf (your original question)

We already worked on this.

3. Is the resulting number from (2) correct?

Well, that's quite hard to say without knowing anything about your genome. How large is it? Is there high diversity between individuals? As we only have the total number of distinct variants across all of your samples, it might be better to get a per-sample count. The output of bcftools stats (as suggested by cpad0112) might be useful, or have a look at this thread, especially the answers by Pierre and me.
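A per-sample count can be sketched with plain awk, here on made-up data. The toy file and the helper per_sample_counts are hypothetical. The sketch counts, for each sample, the records whose genotype carries at least one ALT allele, and it assumes GT is the first FORMAT subfield, as the VCF specification requires when GT is present. It counts all variant types, not only SNPs.

```shell
#!/bin/sh
# Toy VCF (made-up data): three samples, three records.
toy_vcf=$(mktemp)
printf '##fileformat=VCFv4.3\n' > "$toy_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n' >> "$toy_vcf"
printf 'scaffold1\t10\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\t0/0\n' >> "$toy_vcf"
printf 'scaffold1\t42\t.\tC\tT\t50\tPASS\t.\tGT\t0/0\t0/1\t./.\n' >> "$toy_vcf"
printf 'scaffold2\t7\t.\tT\tA\t50\tPASS\t.\tGT\t0|0\t1|0\t0|0\n' >> "$toy_vcf"

# per_sample_counts is a hypothetical helper: prints "sample count" per line.
per_sample_counts() {
    awk -F'\t' '
        /^##/ { next }                                  # meta-information lines
        /^#CHROM/ {                                     # remember sample names
            for (i = 10; i <= NF; i++) name[i] = $i
            ncol = NF
            next
        }
        {
            for (i = 10; i <= ncol; i++) {
                split($i, parts, ":")                   # GT is the first subfield
                if (parts[1] ~ /[1-9]/) count[i]++      # any non-reference allele
            }
        }
        END { for (i = 10; i <= ncol; i++) print name[i], count[i] + 0 }
    ' "$1"
}

per_sample_counts "$toy_vcf"
```

Missing genotypes such as ./. and homozygous-reference calls are not counted for that sample.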

fin swimmer

score 0 · Answer 3 · 2021-01-22

0

3.7 years ago

dario.galanti92 ▴ 10

Here is a quick way to count biallelic SNPs in vcf.gz files (use "cat" instead of "zcat" for uncompressed vcf files):

zcat input.vcf.gz | awk '{if ($4~/^[ACGT]$/ && $5~/^[ACGT]$/){c++}} END {print c}'

ADD COMMENT • link 3.7 years ago by dario.galanti92 ▴ 10

0

I'm sorry but this is wrong.

> echo "chr1 123 . A C,<INDEL>,G" | awk '{if ($4~/[ATCG]/ && $5~/[ATCG]/) {c++}} END {print c}'
1

Better to use:

bcftools view --no-header -G -m 2 -M 2 --types snps input.vcf.gz | wc -l

ADD REPLY • link 3.7 years ago by Pierre Lindenbaum 164k

0

Yes, that was indeed wrong, apologies. I fixed it now.

ADD REPLY • link 3.7 years ago by dario.galanti92 ▴ 10

score 0 · Answer 4 · 2021-01-22

0

3.7 years ago

4galaxy77 2.8k

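If all the variants in your VCF are SNPs, a very quick way is to index the file and then run index again with the -n flag, which prints the number of records. Note that bcftools index requires a bgzip-compressed file.

bcftools index data.vcf.gz
bcftools index -n data.vcf.gz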

ADD COMMENT • link 3.7 years ago by 4galaxy77 2.8k

score 0 · Answer 5 · 2021-01-25

0

3.7 years ago

ghada.alqubati101 • 0

Try this:

bcftools query -f '%POS\n' file.vcf.gz | wc -l

ADD COMMENT • link 3.7 years ago by ghada.alqubati101 • 0

0

Why do you output %POS?

bcftools query -N -f 'x' input.vcf | wc -c

ADD REPLY • link 5 months ago by Pierre Lindenbaum 164k