Question: Variant call format (VCF) file, how to get statistics per sample?
3
5.5 years ago by
Tails60
New Zealand
Tails60 wrote:

It is quite common to find tools that report statistics per variant/marker for a given VCF file, but does anyone know of a tool that can report statistics per sample?

More specifically, I am looking for a tool that can tell me, for each SAMPLE in a VCF file, the proportion of variants/markers that are heterozygous (0/1 or 1/0).

myposts vcf • 15k views
modified 19 months ago by Cookie-san110 • written 5.5 years ago by Tails60
1

Most people write their own script to parse data like that in a VCF file.
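A minimal sketch of such a script, in Python with only the standard library. The function name `het_proportion_per_sample` is made up for illustration, and the sketch makes two assumptions that may not match what you want: only diploid genotypes with no missing allele are counted, and the denominator is the number of those called genotypes (including homozygous reference), not the total number of records.

```python
import gzip
from collections import Counter


def het_proportion_per_sample(vcf_path):
    """Return {sample: (n_het, n_called, proportion)} for a VCF file.

    Assumption: only diploid genotypes without missing alleles count as
    "called"; haploid and missing genotypes are skipped entirely.
    """
    opener = gzip.open if vcf_path.endswith(".gz") else open
    samples = []
    het = Counter()
    called = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                samples = fields[9:]  # sample names follow the FORMAT column
                continue
            if len(fields) < 10:
                continue  # blank line or record without genotype columns
            fmt = fields[8].split(":")
            if "GT" not in fmt:
                continue
            gt_index = fmt.index("GT")
            for sample, entry in zip(samples, fields[9:]):
                parts = entry.split(":")
                if gt_index >= len(parts):
                    continue
                # Treat phased (|) and unphased (/) genotypes the same way.
                alleles = parts[gt_index].replace("|", "/").split("/")
                if len(alleles) < 2 or "." in alleles:
                    continue
                called[sample] += 1
                if len(set(alleles)) > 1:
                    het[sample] += 1
    return {
        s: (het[s], called[s], het[s] / called[s] if called[s] else None)
        for s in samples
    }


if __name__ == "__main__":
    import sys

    for name, (n_het, n_called, prop) in het_proportion_per_sample(sys.argv[1]).items():
        print(name, n_het, n_called, prop, sep="\t")
```

Run it as `python script.py variants.vcf` (the script file name is up to you). If you would rather divide by non-reference calls only, skip genotypes whose alleles are all `0` before incrementing `called`.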

written 5.5 years ago by QVINTVS_FABIVS_MAXIMVS2.4k

Use the NextGene software; it has a 35-day trial.

written 5.5 years ago by vassialk200

My lab is using the NextGene free trial. We have managed to align the sequencing run and can see the variants on screen; hovering over a variant shows its specific data. However, we cannot get the report that lists all the detected variants with their variant frequency, coverage, etc. Can anybody help? How do we generate this report? Thanks

written 2.8 years ago by ruth_s0
12
5.5 years ago by
Len Trigg1.5k
New Zealand
Len Trigg1.5k wrote:

RTG Tools includes a vcfstats command that outputs basic statistics for every sample (or you can request specific samples). For example, on a small simulated VCF:

$ rtg vcfstats family.vcf.gz

Location                     : family.vcf.gz
Failed Filters               : 0
Passed Filters               : 144

Sample Name: sm_mom
SNPs                         : 91
MNPs                         : 1
Insertions                   : 5
Deletions                    : 2
Indels                       : 0
Same as reference            : 1
Missing Genotype             : 44
SNP Transitions/Transversions: 1.74 (73/42)
Total Het/Hom ratio          : 2.96 (74/25)
SNP Het/Hom ratio            : 2.79 (67/24)
MNP Het/Hom ratio            : - (1/0)
Insertion Het/Hom ratio      : 4.00 (4/1)
Deletion Het/Hom ratio       : - (2/0)
Indel Het/Hom ratio          : - (0/0)
Insertion/Deletion ratio     : 2.50 (5/2)
Indel/SNP+MNP ratio          : 0.08 (7/92)

Sample Name: sm_dad
SNPs                         : 73
MNPs                         : 2
Insertions                   : 2
Deletions                    : 3
Indels                       : 0
Same as reference            : 1
Missing Genotype             : 63
SNP Transitions/Transversions: 1.87 (58/31)
Total Haploid                : 19
Haploid SNPs                 : 17
Haploid MNPs                 : 0
Haploid Insertions           : 1
Haploid Deletions            : 1
Haploid Indels               : 0
Total Het/Hom ratio          : 2.59 (44/17)
SNP Het/Hom ratio            : 2.50 (40/16)
MNP Het/Hom ratio            : 1.00 (1/1)
Insertion Het/Hom ratio      : - (1/0)
Deletion Het/Hom ratio       : - (2/0)
Indel Het/Hom ratio          : - (0/0)
Insertion/Deletion ratio     : 0.67 (2/3)
Indel/SNP+MNP ratio          : 0.07 (5/75)

[...]

 

written 5.5 years ago by Len Trigg1.5k
3
5.5 years ago by
William4.7k
Europe
William4.7k wrote:

You can use SnpSift:

cat variants.vcf | java -jar SnpSift.jar filter " ( countHet() == 2 )" | grep -v '^#' | wc -l

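Instead of just countHet(), you can create any complex filter expression using the VCF fields and the functions built into SnpSift. Note that countHet() counts the heterozygous samples at each variant, so this example counts the variants at which exactly two samples are heterozygous.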

See also the SnpSift webpage:

http://snpeff.sourceforge.net/SnpSift.html#filter

To get the total number of variants in your VCF file, just do grep -v '^#' variants.vcf | wc -l

 

 

modified 5.5 years ago • written 5.5 years ago by William4.7k
2
5.5 years ago by
kautilya410
United States
kautilya410 wrote:

You could use the vcf-stats utility in the vcftools suite to compile these statistics for you.

Usage:

vcf-stats input.vcf
modified 5.5 years ago • written 5.5 years ago by kautilya410
1

The statistics provided by vcf-stats are confusing: the sum of SNPs and indels is greater than the total number of variants.

written 4.8 years ago by jianggl200040

If that was the case with vcf-stats, the totals appear to be correct in bcftools stats v1.9.

written 19 months ago by Kevin Blighe69k
1
19 months ago by
Cookie-san110
Cookie-san110 wrote:

Hi! You can try bcftools stats, or vcflib's vcfstats. bcftools stats can also do comparisons and gives much more detailed output than vcflib's vcfstats.

Edit: ALSO. If you try and use both vcfstats and bcftools stats, your stats won't be quite the same. I'm trying to figure out how the code differs between the tools. When I do, I'll let you know! (If you wanted to know in the first place ;) )
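
For the original per-sample question, bcftools stats can be asked for per-sample counts with `-s -` (all samples), which adds `PSC` rows to its text output. Below is a hedged Python sketch that turns those rows into a heterozygous fraction per sample. It assumes the `PSC` header line is tab-separated and names the columns `sample`, `nRefHom`, `nNonRefHom` and `nHets`; check the header in your own output, since the layout can differ between versions, and check the bcftools documentation for which variant types those counts cover. The function name `het_fraction_from_psc` is made up for illustration.

```python
import re


def het_fraction_from_psc(stats_text):
    """Map sample name -> nHets / (nRefHom + nNonRefHom + nHets).

    Assumption: `stats_text` is the text output of `bcftools stats -s -`
    and contains a "# PSC<TAB>[2]id<TAB>[3]sample..." header line.
    """
    columns = None
    fractions = {}
    for line in stats_text.splitlines():
        if line.startswith("# PSC\t"):
            # Strip the leading "# " and the "[n]" prefix from each column name.
            columns = [re.sub(r"^\[\d+\]", "", name).strip()
                       for name in line[2:].split("\t")]
        elif line.startswith("PSC\t") and columns:
            row = dict(zip(columns, line.split("\t")))
            n_het = int(row["nHets"])
            total = int(row["nRefHom"]) + int(row["nNonRefHom"]) + n_het
            # None marks a sample with no counted genotypes.
            fractions[row["sample"]] = n_het / total if total else None
    return fractions
```

Usage would look like `bcftools stats -s - variants.vcf.gz > stats.txt`, followed by `het_fraction_from_psc(open("stats.txt").read())`. Reading the column names from the header, not fixed positions, is a deliberate choice so the sketch fails loudly with a KeyError if the layout is not what it assumes.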

modified 19 months ago • written 19 months ago by Cookie-san110