9.2 years ago
William ★ 5.1k

Does anyone know the definitions of the fields that vcftools vcf-stats returns:

  • hom_AA = ?
  • het_RA = ?
  • snp_count = total count of SNPs
  • count = SNP+missing+Ref?
  • hom_RR_count = ?
  • ref = reference calls?
  • missing = missing calls
  • private = private calls not found in other samples
  • het_AA_count = ?
  • ref_count = ? reference calls (same as ref?)
  • unphased = unphased snps

The AA, RA and RR confuse me. I thought they might stand for reference and alternative, but then why are there hom_RR_count, het_AA_count, ref and ref_count fields?

Or does anyone know of another tool that also provides the total SNP count per strain and the private SNP count per strain?

9.2 years ago
dangenet ▴ 90

hom_AA = homozygous for a single alternate allele (e.g. both alleles have the same mutation).

het_AA = both alleles are non-reference but they are not the same allele (e.g. one has the S98A mutation and the other has the L206P mutation). Think of it as het_A1A2 if that helps.

hom_RR = homozygous reference

het_RA = one reference allele, one alternate allele
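
To make the four categories concrete, here is a minimal Python sketch that sorts diploid VCF GT strings into them. It follows the definitions above only; it is not taken from the vcf-stats source, and the function name `classify_genotype` and the example calls are made up for illustration. It assumes the usual VCF convention that allele 0 is the reference and 1, 2, ... are alternates.

```python
import re
from collections import Counter

def classify_genotype(gt: str) -> str:
    """Classify a diploid VCF GT string such as '0/1' or '1|2'.

    Hypothetical helper: the category names mirror the vcf-stats
    field names, but the logic is only my reading of the definitions.
    """
    # GT alleles are separated by '/' (unphased) or '|' (phased).
    alleles = re.split(r"[/|]", gt)
    # Simplification: anything non-diploid or containing '.' is lumped
    # together as "missing" in this sketch.
    if len(alleles) != 2 or "." in alleles:
        return "missing"
    a, b = alleles
    if a == b:
        # Same allele twice: reference (0) or a single alternate.
        return "hom_RR" if a == "0" else "hom_AA"
    if "0" in (a, b):
        # One reference allele, one alternate allele.
        return "het_RA"
    # Two different alternate alleles, i.e. het_A1A2.
    return "het_AA"

# Made-up genotype calls for one sample across six sites.
calls = ["0/0", "0/1", "1|1", "1/2", "./.", "1|0"]
counts = Counter(classify_genotype(gt) for gt in calls)
print(dict(counts))
```

Here `1/2` lands in het_AA because both alleles are non-reference but differ from each other, while `1|1` is hom_AA.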

"counts" appear to be raw counts at a given locus. The documentation isn't totally clear.

EDIT: My vcf-stats is broken at the moment, so I'm having trouble getting more details.


