Question: How can I count SNPs in my final VCF files?
mostafarafiepour wrote, 16 months ago:

Hi all,

I have a VCF file containing 50 samples and I want to count the number of SNPs. My organism is non-model, so its reference is not assembled into chromosomes.

How can I count the number of SNPs across all 50 samples in this VCF?

Best regards,

Mostafa


Run bcftools stats on the VCF. It summarizes the file with most of the details you need, including the number of SNPs and indels.
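A minimal sketch of reading the SNP total out of a saved bcftools stats report, assuming the report was written with bcftools stats input.vcf > input.stats. The summary lines of that report are tab-separated as SN, id, key, value. The function name snp_count_from_stats is hypothetical, and the tiny report below is made up only so the example runs without bcftools.

```shell
# Hypothetical helper: print the value of the "number of SNPs:" summary line
# from a saved bcftools stats report (fields: SN, id, key, value).
snp_count_from_stats() {
    awk -F'\t' '$1 == "SN" && $3 == "number of SNPs:" { print $4 }' "$1"
}

# Made-up report in the same layout, so the example runs without bcftools.
stats=$(mktemp)
printf 'SN\t0\tnumber of samples:\t50\n'  >  "$stats"
printf 'SN\t0\tnumber of records:\t12\n'  >> "$stats"
printf 'SN\t0\tnumber of SNPs:\t9\n'      >> "$stats"
printf 'SN\t0\tnumber of indels:\t3\n'    >> "$stats"

snp_count_from_stats "$stats"   # prints 9
```

On a real report, grep "^SN" input.stats shows the other totals (records, indels, multiallelic sites) in the same layout.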

Reply by cpad0112, 16 months ago.

finswimmer (Germany) wrote, 16 months ago:

Hello mostafarafiepour,

you started with one question. In the meantime there are three :)

1. How to read a VCF file

This is a very basic question, so you should start with some literature:

If you don't understand any of the explanations, don't hesitate to ask.

2. How to count the variants in a VCF (your original question)

We already worked on this.

3. Is the resulting number from (2) correct?

Well, that's hard to say without knowing anything about your genome. How large is it? How much diversity is there between individuals? Since we only have the total number of distinct variants across all of your samples, it might be better to get a per-sample count. The output of bcftools stats (as suggested by cpad0112) might be useful, or have a look at this thread, especially the answers by Pierre and me.
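A per-sample count can be sketched with awk, assuming an uncompressed VCF in which GT is the first FORMAT key (the VCF spec requires this whenever GT is present). Here a record counts as a SNP when REF is one base and every ALT allele is one base, and a sample "has" the SNP when its genotype contains at least one non-reference allele. Both rules are simplifying assumptions, and per_sample_snps plus the toy file are hypothetical. bcftools stats -s - input.vcf reports comparable per-sample numbers in its PSC lines.

```shell
# Hypothetical helper: for each sample, count SNP records where its genotype
# carries at least one non-reference allele. Assumes an uncompressed VCF with
# GT as the first FORMAT key; "SNP" = single-base REF and single-base ALT(s).
per_sample_snps() {
    awk -F'\t' '
        /^##/ { next }                          # skip meta-information lines
        /^#CHROM/ {                             # header: remember sample names
            for (i = 10; i <= NF; i++) name[i] = $i
            last = NF
            next
        }
        length($4) == 1 && $5 ~ /^[ACGTacgt](,[ACGTacgt])*$/ {
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")               # f[1] is the GT value
                n = split(f[1], al, "[/|]")     # alleles of this genotype
                for (j = 1; j <= n; j++)
                    if (al[j] != "0" && al[j] != ".") { cnt[i]++; break }
            }
        }
        END { for (i = 10; i <= last; i++) print name[i] "\t" cnt[i] + 0 }
    ' "$1"
}

# Made-up three-record VCF with two samples (the last record is an indel).
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> "$vcf"
printf 'ctg1\t10\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\n' >> "$vcf"
printf 'ctg1\t20\t.\tC\tT\t50\tPASS\t.\tGT\t1/1\t./.\n' >> "$vcf"
printf 'ctg2\t5\t.\tG\tGA\t50\tPASS\t.\tGT\t0/1\t0/1\n' >> "$vcf"

per_sample_snps "$vcf"   # S1 2, S2 0 (tab-separated)
```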

fin swimmer

finswimmer (Germany) wrote, 16 months ago:

Hello,

You get the total number by counting the lines in the VCF, excluding the header lines:

$ grep -v "^#" input.vcf | wc -l
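Note that this counts every data record, so SNPs, indels and other variant types are all included. To see how that total splits up, here is a minimal awk sketch; the classification rules are simplified assumptions (bcftools applies its own, more complete rules), and count_by_type and the toy file are hypothetical. If bcftools is installed, bcftools view -H -v snps input.vcf | wc -l counts SNP records directly.

```shell
# Hypothetical helper: split the record count from grep | wc -l by type.
# Simplified rules (an assumption, not bcftools' exact classification):
#   SNP   - REF is one base and every ALT allele is one base
#   indel - plain-base REF/ALT where some ALT length differs from REF
#   other - everything else (MNPs, <DEL>-style symbolic alleles, "*", ".")
count_by_type() {
    awk -F'\t' '
        /^#/ { next }                                   # skip header lines
        {
            type = "other"
            if (length($4) == 1 && $5 ~ /^[ACGTacgt](,[ACGTacgt])*$/) {
                type = "SNP"
            } else if ($4 ~ /^[ACGTNacgtn]+$/ && $5 ~ /^[ACGTNacgtn]+(,[ACGTNacgtn]+)*$/) {
                n = split($5, alt, ",")
                for (i = 1; i <= n; i++)
                    if (length(alt[i]) != length($4)) type = "indel"
            }
            count[type]++
        }
        END {
            printf "SNP\t%d\nindel\t%d\nother\t%d\n", count["SNP"], count["indel"], count["other"]
        }
    ' "$1"
}

# Made-up sites-only VCF: two SNPs (one multiallelic), one indel, one symbolic.
vcf=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$vcf"
printf 'ctg1\t10\t.\tA\tG\t50\tPASS\t.\n'    >> "$vcf"
printf 'ctg1\t20\t.\tC\tT,G\t50\tPASS\t.\n'  >> "$vcf"
printf 'ctg1\t30\t.\tG\tGA\t50\tPASS\t.\n'   >> "$vcf"
printf 'ctg2\t5\t.\tT\t<DEL>\t50\tPASS\t.\n' >> "$vcf"

count_by_type "$vcf"   # SNP 2, indel 1, other 1
```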

fin swimmer


Hi swimmer,

Many thanks for your reply. I ran it and it gave me a number (20546654), but I don't know whether this number is correct.

The command I used:

grep -v "^#" Final_VCF_50Sample.vcf | wc -l
Reply by mostafarafiepour, 16 months ago.
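One thing worth checking before judging that number: grep -v "^#" counts every record, including records whose FILTER column (column 7) is not PASS, which many pipelines keep in the file and only flag. A minimal sketch that tallies the FILTER values, assuming a plain .vcf or a gzip/bgzip-compressed .vcf.gz; count_by_filter and the toy file are hypothetical. With bcftools, bcftools view -H -f PASS Final_VCF_50Sample.vcf | wc -l counts only PASS records.

```shell
# Hypothetical helper: tally the FILTER column (column 7) of all data records.
# Reads plain .vcf, or .vcf.gz (bgzip output is gzip-compatible).
count_by_filter() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk -F'\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }'
}

# Made-up sites-only VCF with mixed FILTER values.
vcf=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$vcf"
printf 'ctg1\t10\t.\tA\tG\t50\tPASS\t.\n'   >> "$vcf"
printf 'ctg1\t20\t.\tC\tT\t8\tLowQual\t.\n' >> "$vcf"
printf 'ctg1\t30\t.\tG\tC\t60\tPASS\t.\n'   >> "$vcf"
printf 'ctg2\t5\t.\tT\tA\t40\t.\t.\n'       >> "$vcf"

count_by_filter "$vcf"   # PASS 2, LowQual 1, . 1 (line order not guaranteed)
```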

Sorry swimmer,

Please describe the details in the photo for me.

[image: screenshot of VCF records]

Reply by mostafarafiepour, 16 months ago.

> Please describe the details in the photo for me.

How is it related to your original question? Are you sure you're using the correct terms?

Each line of the VCF (below the header) is a VARIANT. A variant can be a SNP, an INDEL, etc.

The intersection of a variant (a row) and a sample name (a column) is a GENOTYPE.
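That variant-by-sample grid can be printed directly. A minimal awk sketch, assuming an uncompressed VCF with GT as the first FORMAT key; genotype_matrix and the toy file are hypothetical. With bcftools, bcftools query -f '%CHROM\t%POS[\t%GT]\n' input.vcf prints the same kind of table.

```shell
# Hypothetical helper: one row per variant (CHROM:POS), one column per sample,
# each cell holding that sample's GT value (the first FORMAT subfield).
genotype_matrix() {
    awk -F'\t' '
        /^##/ { next }                                  # skip meta lines
        /^#CHROM/ {                                     # header row
            printf "variant"
            for (i = 10; i <= NF; i++) printf "\t%s", $i
            printf "\n"
            next
        }
        {
            printf "%s:%s", $1, $2
            for (i = 10; i <= NF; i++) { split($i, f, ":"); printf "\t%s", f[1] }
            printf "\n"
        }
    ' "$1"
}

# Made-up two-variant, two-sample VCF.
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> "$vcf"
printf 'ctg1\t10\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t0/0:9\n' >> "$vcf"
printf 'ctg1\t20\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:7\t./.:0\n'  >> "$vcf"

genotype_matrix "$vcf"
```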

Reply by Pierre Lindenbaum, 16 months ago.

Yes, maybe I did not ask the right question.

I want to know what each of the terms in the picture means, for example:

CHROM, POS, ID, REF, ALT, QUAL, INFO, GT:AD:DP:GQ:PL, and so on.

Reply by mostafarafiepour, 16 months ago.
> I want to know what each of the terms in the picture means.

https://samtools.github.io/hts-specs/VCFv4.3.pdf
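In short, per the specification: CHROM is the sequence name (for a non-model assembly, a contig or scaffold name), POS the 1-based position, ID a variant identifier, REF and ALT the reference and alternative alleles, QUAL a Phred-scaled quality, FILTER the filter status, INFO site-level annotations, and FORMAT the keys of the per-sample fields. Among common FORMAT keys, GT is the genotype, AD the allelic depths, DP the read depth, GQ the genotype quality and PL the Phred-scaled genotype likelihoods. A small sketch that prints one record with every value labeled, assuming an uncompressed VCF; explain_record and the made-up record are hypothetical.

```shell
# Hypothetical helper: print data record number $2 of VCF $1 as "name<TAB>value"
# lines, labeling fixed columns by the #CHROM header and per-sample values by
# the FORMAT keys (e.g. S1.GT, S1.AD, ...).
explain_record() {
    awk -F'\t' -v want="$2" '
        /^##/ { next }                                  # skip meta lines
        /^#CHROM/ {                                     # keep column names
            for (i = 1; i <= NF; i++) col[i] = $i
            sub(/^#/, "", col[1])
            next
        }
        ++rec == want {
            for (i = 1; i <= 9 && i <= NF; i++) printf "%s\t%s\n", col[i], $i
            nk = split($9, key, ":")                    # FORMAT keys
            for (i = 10; i <= NF; i++) {
                split($i, val, ":")                     # matching sample values
                for (k = 1; k <= nk; k++) printf "%s.%s\t%s\n", col[i], key[k], val[k]
            }
            exit
        }
    ' "$1"
}

# Made-up single-record VCF with two samples.
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> "$vcf"
printf 'ctg1\t10\t.\tA\tG\t50\tPASS\tDP=25\tGT:AD:DP:GQ:PL\t0/1:5,7:12:99:120,0,95\t0/0:13,0:13:39:0,39,450\n' >> "$vcf"

explain_record "$vcf" 1
```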

Reply by Pierre Lindenbaum, 16 months ago.