Question: How Can I Count SNPs In My Final VCF Files
7 months ago, mostafarafiepour wrote:

Hi all,

I have a VCF file containing 50 samples, and I want to count the number of SNPs. My organism is a non-model organism, so its reference does not have chromosomes.

How can I count the number of SNPs for all 50 samples with this VCF?

Best regards,

Mostafa


Run bcftools stats on the VCF. It summarizes the VCF with most of the details you need.
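A minimal sketch of this suggestion, assuming bcftools is installed and on your PATH. The summary lines of the bcftools stats output start with SN, and one of them reports the number of SNPs. The helper name count_snps_with_bcftools and the file name input.vcf are placeholders made up for illustration.

```shell
# Hypothetical helper: print the SNP count reported by `bcftools stats`.
# Assumes bcftools is on PATH; "$1" is the path to the VCF file.
count_snps_with_bcftools() {
    bcftools stats "$1" |
        awk -F'\t' '$1 == "SN" && $3 == "number of SNPs:" { print $4 }'
}

# Example call (input.vcf is a placeholder file name):
# count_snps_with_bcftools input.vcf
```

If you also want per-sample numbers, bcftools stats accepts `-s -` to include all samples in its per-sample section.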

written 7 months ago by cpad0112 (modified 7 months ago)
7 months ago, finswimmer (Germany) wrote:

Hello mostafarafiepour,

You started with one question. In the meantime there are three :)

1. How to read a vcf file

This is a very basic question, so you should start by reading some literature.

If you don't understand any of the explanations, don't hesitate to ask.

2. How to count the variants in a vcf (your original question)

We already worked on this.

3. Is the number resulting from (2) correct?

Well, that's quite hard to say without knowing anything about your genome. How large is it? Is there high diversity between individuals? As we only have the total number of distinct variants across all of your samples, it might be better to get a per-sample count. The output of bcftools stats (as suggested by cpad0112) might be useful, or have a look at this thread, especially the answers by Pierre and me.
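One possible way to get such a per-sample count with plain awk is sketched below. It is only an illustration, not the method from the linked thread. It assumes an uncompressed VCF in which the FORMAT column starts with GT, and it counts, for each sample, the variant lines where that sample carries at least one non-reference allele. The function name count_variants_per_sample is made up for this example.

```shell
# Hypothetical helper: per sample, count the variant lines where the sample
# has at least one non-reference allele in its genotype.
# Assumes an uncompressed VCF whose FORMAT column starts with GT.
count_variants_per_sample() {
    awk -F'\t' '
        /^##/ { next }                        # skip meta-information lines
        /^#CHROM/ {                           # column header: keep sample names
            for (i = 10; i <= NF; i++) name[i] = $i
            n = NF
            next
        }
        {
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")             # f[1] is the GT value, e.g. 0/1
                gt = f[1]
                gsub("[0./|]", "", gt)        # drop ref alleles, missing calls, separators
                if (gt != "") count[i]++      # anything left is a non-reference allele
            }
        }
        END {
            for (i = 10; i <= n; i++) printf "%s\t%d\n", name[i], count[i]
        }
    ' "$1"
}

# Example call (input.vcf is a placeholder file name):
# count_variants_per_sample input.vcf
```

The output has one line per sample: the sample name, a tab, and the count. Homozygous-reference (0/0) and missing (./.) genotypes are not counted.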

fin swimmer

7 months ago, finswimmer (Germany) wrote:

Hello,

You get the total number by counting the lines in the VCF, excluding the header lines.

$ grep -v "^#" input.vcf|wc -l

fin swimmer


Hi swimmer,

Many thanks for your reply. I ran it and it gave me a number (20546654), but I do not know whether this number is correct.

The code I used:

grep -v "^#" Final_VCF_50Sample.vcf|wc -l
written 7 months ago by mostafarafiepour (modified 7 months ago by finswimmer)

Sorry swimmer,

Please describe the details in the photo for me.

[image]

written 7 months ago by mostafarafiepour

Please describe the details in the photo for me.

How is it related to your original question? Are you sure you're using the correct terms?

Each non-header line of the VCF is a VARIANT. A variant can be a SNP, an INDEL, etc.

The intersection of a variant (row) and a sample name (column) is a GENOTYPE.
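To illustrate the difference between counting variants and counting SNPs, here is a rough awk sketch. It assumes an uncompressed VCF and uses a simplified definition: a line counts as a SNP only if REF and every ALT allele are a single base (A, C, G or T); everything else (INDELs, multi-base changes, symbolic alleles) is counted as "other". The function name count_snp_lines is made up for this example.

```shell
# Hypothetical helper: split the variant lines into SNP and non-SNP records.
# Simplifying assumption: a SNP has a single-base REF and only single-base
# ALT alleles; every other record is reported as "other".
count_snp_lines() {
    awk -F'\t' '
        /^#/ { next }                          # skip all header lines
        {
            is_snp = ($4 ~ /^[ACGTacgt]$/)     # REF must be exactly one base
            n = split($5, alt, ",")            # ALT may list several alleles
            for (i = 1; i <= n; i++)
                if (alt[i] !~ /^[ACGTacgt]$/) is_snp = 0
            if (is_snp) snp++; else other++
        }
        END { printf "SNP\t%d\nother\t%d\n", snp, other }
    ' "$1"
}

# Example call (input.vcf is a placeholder file name):
# count_snp_lines input.vcf
```

The two numbers add up to the total you get from counting all non-header lines.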

written 7 months ago by Pierre Lindenbaum

Yes, maybe I did not phrase the question precisely.

I want to know what each of the terms in the picture means, for example:

CHROM
POS
ID
REF
ALT
QUAL
INFO
GT:AD:DP:GQ:PL
and so on.

written 7 months ago by mostafarafiepour (modified 7 months ago)

I want to know what the meaning of any of the terms in the picture is?

https://samtools.github.io/hts-specs/VCFv4.3.pdf
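As a small hands-on complement to the specification, the sketch below prints the first variant line of a VCF with labels. It assumes an uncompressed VCF with at least one sample column. The FORMAT column (for example GT:AD:DP:GQ:PL) lists the keys of the per-sample values: GT is the genotype, AD the read depth per allele, DP the total read depth, GQ the genotype quality and PL the Phred-scaled genotype likelihoods. The function name show_first_genotype is made up for this example.

```shell
# Hypothetical helper: for the first variant line, print the main fixed
# columns and then each FORMAT key next to the first sample's value.
# Assumes an uncompressed VCF with at least one sample column.
show_first_genotype() {
    awk -F'\t' '
        /^#/ { next }                          # skip all header lines
        {
            printf "CHROM=%s POS=%s REF=%s ALT=%s\n", $1, $2, $4, $5
            nk = split($9, key, ":")           # FORMAT column, e.g. GT:AD:DP:GQ:PL
            split($10, val, ":")               # values of the first sample
            for (i = 1; i <= nk; i++) printf "%s=%s\n", key[i], val[i]
            exit                               # only the first variant line
        }
    ' "$1"
}

# Example call (input.vcf is a placeholder file name):
# show_first_genotype input.vcf
```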

written 7 months ago by Pierre Lindenbaum