The current version of bcftools (v1.3) stores metadata in the index file when you run bcftools index, so you can get either the total variant count or the number of variants per contig just by reading the index. From the manual:
-n, --nrecords: Print the number of records based on the CSI or TBI index files.
-s, --stats: Print per contig stats based on the CSI or TBI index files. Output format is three tab-delimited columns listing the contig name, contig length (. if unknown) and number of records for the contig. Contigs with zero records are not printed.
bcftools index some.vcf.gz # create the index (CSI by default; add --tbi for a TBI)
bcftools index --nrecords some.vcf.gz # get total variant count
bcftools index --stats some.vcf.gz # get variant count per chromosome
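As a cross-check, the per-contig counts from --stats can be summed to recover the total that --nrecords reports. The sketch below assumes the three-column, tab-delimited output described above; `sum_records` is a hypothetical helper name used only for illustration, not part of bcftools.

```shell
# Hypothetical helper: sum the third column (records per contig)
# of tab-delimited `bcftools index --stats` output read from stdin.
sum_records() {
    # "+ 0" makes awk print 0 rather than an empty line when there is no input
    awk -F'\t' '{ total += $3 } END { print total + 0 }'
}

# Usage, assuming some.vcf.gz has already been indexed:
#   bcftools index --stats some.vcf.gz | sum_records
```

Because contigs with zero records are simply omitted from the --stats output, they do not affect the sum.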
We are wondering whether there is a fast way, using tabix or any other hack, to get the total number of variants in a VCF without actually reading the whole VCF file and counting the lines. I assume that the total number of rows is somehow stored in the tbi file.
No, this is not stored in the traditional tabix index. The htslib implementation of tabix should have this information in dummy bins, I think.
Someone else needs to confirm, though.