I am trying to calculate per-gene nucleotide diversity (pi) from whole-genome sequence data. I have whole-genome resequencing data for several hundred individuals with ~1.2 million SNPs, and the species is well annotated, with about 36k genes. Is there a method that calculates per-gene nucleotide diversity for whole-genome sequencing data, ideally from a VCF file and on the command line?
So far, I have tried calculating pi with vcftools, using both `--window-pi` and `--site-pi`. The window approach is not useful because my genes are not distributed regularly along the genome, so fixed windows do not line up with genes and the gaps between them. For `--site-pi`, I could not find an explanation of how the value is calculated and, more problematically, it reports more positions than exist. Another option was DnaSP6, but there I would need to produce a separate VCF file for each gene, and since piping or merging is not possible, every file would need to be uploaded manually.
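To make clear what I mean by per-gene pi, here is a minimal sketch of the calculation I am after, in plain Python. It rests on my own assumptions and is not what vcftools or DnaSP do internally: per-site pi is the average number of pairwise differences computed from allele counts, every position in a gene that is not a SNP is treated as invariant and fully genotyped, and gene coordinates are 1-based and inclusive. The function names and the toy data are made up for illustration, and the allele counts would have to be extracted from the VCF first.

```python
from collections import defaultdict


def site_pi(allele_counts):
    """Average pairwise difference at one site, from per-allele counts.

    Assumption: pi_site = (n^2 - sum(c_i^2)) / (n * (n - 1)), where n is
    the number of called chromosomes and c_i the count of allele i.
    """
    n = sum(allele_counts)
    if n < 2:
        return 0.0  # fewer than two chromosomes called: no pairs to compare
    return (n * n - sum(c * c for c in allele_counts)) / (n * (n - 1))


def per_gene_pi(sites, genes):
    """Sum per-site pi inside each gene and divide by gene length.

    sites: iterable of (chrom, pos, allele_counts), one entry per SNP
    genes: iterable of (gene_id, chrom, start, end), 1-based inclusive
    Assumption: positions without a SNP are invariant and contribute 0.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, counts in sites:
        by_chrom[chrom].append((pos, site_pi(counts)))

    result = {}
    for gene_id, chrom, start, end in genes:
        total = sum(pi for pos, pi in by_chrom[chrom] if start <= pos <= end)
        result[gene_id] = total / (end - start + 1)
    return result


# Toy data (hypothetical): two SNPs in geneA, none in geneB.
toy_sites = [("chr1", 150, [2, 2]), ("chr1", 160, [3, 1])]
toy_genes = [("geneA", "chr1", 100, 199), ("geneB", "chr1", 300, 399)]
print(per_gene_pi(toy_sites, toy_genes))
```

The linear scan over SNPs per gene is only for readability; with 1.2 million SNPs and 36k genes a sorted or interval-based lookup would be needed, and missing genotypes would make the simple division by gene length inaccurate.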