Variant call format (VCF) file, how to get statistics per sample?
7.3 years ago
Tails ▴ 80

Tools that report statistics per variant/marker for a VCF file are easy to find, but does anyone know of a tool that reports statistics per sample?

More specifically, I am looking for a tool that can tell me, for each sample in a VCF file, the proportion of variants/markers that are heterozygous (0/1 or 1/0).

vcf • 20k views

Most people write their own script to parse data like that from a VCF file.
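As a rough illustration of what such a script might look like, here is a minimal sketch that counts heterozygous genotypes per sample using only the Python standard library. The function name `het_proportion_per_sample` is made up for this example. It assumes a diploid, tab-separated VCF with a GT field, treats any genotype with two different alleles (0/1, 1|0, 1/2, ...) as heterozygous, and uses all called diploid genotypes (including 0/0) as the denominator; adjust those choices if you want a different definition.

```python
from collections import defaultdict


def het_proportion_per_sample(lines):
    """Return {sample: (n_het, n_called, proportion_het)} from VCF lines.

    Assumptions of this sketch: diploid calls, GT present in FORMAT,
    missing (./.) and haploid genotypes are skipped.
    """
    samples = []
    het = defaultdict(int)
    called = defaultdict(int)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information headers
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for name, entry in zip(samples, fields[9:]):
            parts = entry.split(":")
            gt = parts[gt_idx] if gt_idx < len(parts) else "."
            alleles = gt.replace("|", "/").split("/")
            if len(alleles) < 2 or "." in alleles:
                continue  # haploid or missing genotype
            called[name] += 1
            if len(set(alleles)) > 1:
                het[name] += 1
    return {
        s: (het[s], called[s], het[s] / called[s] if called[s] else float("nan"))
        for s in samples
    }


# Tiny made-up VCF, for demonstration only.
demo = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1",
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t1|0:12\t./.:0",
    "1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t0/0\t0/1",
])
result = het_proportion_per_sample(demo.splitlines())
for sample, (n_het, n_called, prop) in result.items():
    print(f"{sample}\thet={n_het}\tcalled={n_called}\tprop={prop:.3f}")
```

For a real file you could pass an open file handle instead of the demo lines, e.g. `open(path)` for plain text or `gzip.open(path, "rt")` for a compressed VCF.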


My lab is using the NextGene free trial. We have managed to align the sequencing run and can see the variants on screen; hovering over a variant shows its specific data. However, we cannot get the report that lists all the detected variants with their variant frequency, coverage, etc. Can anybody help? How do we generate the report? Thanks

7.3 years ago
Len Trigg ★ 1.6k

RTG Tools includes a vcfstats command that outputs basic statistics for every sample (or you can request specific samples). For example, on a small simulated VCF:

$ rtg vcfstats family.vcf.gz

Location                     : family.vcf.gz
Failed Filters               : 0
Passed Filters               : 144

Sample Name: sm_mom
SNPs                         : 91
MNPs                         : 1
Insertions                   : 5
Deletions                    : 2
Indels                       : 0
Same as reference            : 1
Missing Genotype             : 44
SNP Transitions/Transversions: 1.74 (73/42)
Total Het/Hom ratio          : 2.96 (74/25)
SNP Het/Hom ratio            : 2.79 (67/24)
MNP Het/Hom ratio            : - (1/0)
Insertion Het/Hom ratio      : 4.00 (4/1)
Deletion Het/Hom ratio       : - (2/0)
Indel Het/Hom ratio          : - (0/0)
Insertion/Deletion ratio     : 2.50 (5/2)
Indel/SNP+MNP ratio          : 0.08 (7/92)

Sample Name: sm_dad
SNPs                         : 73
MNPs                         : 2
Insertions                   : 2
Deletions                    : 3
Indels                       : 0
Same as reference            : 1
Missing Genotype             : 63
SNP Transitions/Transversions: 1.87 (58/31)
Total Haploid                : 19
Haploid SNPs                 : 17
Haploid MNPs                 : 0
Haploid Insertions           : 1
Haploid Deletions            : 1
Haploid Indels               : 0
Total Het/Hom ratio          : 2.59 (44/17)
SNP Het/Hom ratio            : 2.50 (40/16)
MNP Het/Hom ratio            : 1.00 (1/1)
Insertion Het/Hom ratio      : - (1/0)
Deletion Het/Hom ratio       : - (2/0)
Indel Het/Hom ratio          : - (0/0)
Insertion/Deletion ratio     : 0.67 (2/3)
Indel/SNP+MNP ratio          : 0.07 (5/75)

[...]
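The report gives a Het/Hom ratio rather than the heterozygous proportion asked about in the question, but the raw counts in parentheses are enough to derive it. The sketch below is not part of RTG Tools; `het_fraction_from_vcfstats` is a hypothetical helper that parses the text layout shown above and computes het / (het + hom). Note that this denominator covers only the non-reference calls counted in the "Total Het/Hom ratio" line, so same-as-reference and missing genotypes are excluded.

```python
import re

# Matches e.g. "Total Het/Hom ratio          : 2.96 (74/25)"
HET_HOM = re.compile(r"^Total Het/Hom ratio\s*:\s*\S+\s*\((\d+)/(\d+)\)")


def het_fraction_from_vcfstats(text):
    """Return {sample: het / (het + hom)} from vcfstats-style text."""
    result = {}
    sample = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Sample Name:"):
            sample = line.split(":", 1)[1].strip()
            continue
        match = HET_HOM.match(line)
        if match and sample is not None:
            het, hom = int(match.group(1)), int(match.group(2))
            total = het + hom
            result[sample] = het / total if total else float("nan")
    return result


# Lines copied from the report above.
report = """\
Sample Name: sm_mom
Total Het/Hom ratio          : 2.96 (74/25)

Sample Name: sm_dad
Total Het/Hom ratio          : 2.59 (44/17)
"""
fractions = het_fraction_from_vcfstats(report)
for name, frac in fractions.items():
    print(f"{name}\t{frac:.3f}")
```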
7.3 years ago
William ★ 5.1k

You can use SnpSift:

cat variants.vcf | java -jar SnpSift.jar filter " ( countHet() == 2 )" | grep -v '^#' | wc -l

This counts the variants at which exactly two samples are heterozygous. Instead of just countHet() you can build arbitrarily complex filter expressions using the VCF fields and the functions built into SnpSift.

See also the SnpSift webpage: http://snpeff.sourceforge.net/SnpSift.html#filter

To get the total number of variants in your VCF file, just run grep -v '^#' variants.vcf | wc -l

7.3 years ago
kautilya ▴ 430

You could use the vcf-stats utility from the VCFtools suite to compile these statistics for you.

Usage:

vcf-stats input.vcf

The statistics provided by vcf-stats are confusing: the sum of SNPs and indels is greater than the total number of variants.
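One possible explanation, offered as an assumption rather than something confirmed for vcf-stats: a multiallelic record can carry both a SNP allele and an indel allele, so per-type tallies can add up to more than the number of records. The sketch below uses a deliberately simplified, hypothetical classifier (`classify_alt`) on made-up records to show the effect.

```python
def classify_alt(ref, alt):
    """Very rough allele typing, for illustration only."""
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


# (REF, ALT) pairs; the second record is multiallelic: one SNP, one insertion.
records = [("A", "G"), ("C", "T,CT"), ("GA", "G")]

type_counts = {"snp": 0, "indel": 0, "other": 0}
for ref, alts in records:
    # Count each record once per allele type it contains.
    kinds = {classify_alt(ref, alt) for alt in alts.split(",")}
    for kind in kinds:
        type_counts[kind] += 1

n_records = len(records)
print(f"records={n_records} snp={type_counts['snp']} indel={type_counts['indel']}")
```

Here three records yield two SNP records plus two indel records, because the multiallelic site is counted under both types.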


If that was the case with vcf-stats, the counts appear to be correct in bcftools stats v1.9.

3.4 years ago
VBer ▴ 170

Hi! You can try bcftools stats, or vcflib's vcfstats. bcftools stats can also do comparisons and gives much more detailed output than vcflib's vcfstats.

Edit: ALSO. If you try and use both vcfstats and bcftools stats, your stats won't be quite the same. I'm trying to figure out how the code differs between the tools. When I do, I'll let you know! (If you wanted to know in the first place ;) )
