I am trying to estimate the level of heterozygosity in my de novo assembled genome, and I think the way to do this is to measure the SNP frequency. So far I have called SNPs against my final assembly and written the result to a VCF file with the command below:
bcftools mpileup -Ou -f ../scaffolds.fa sorted.bam | bcftools call -Ou -mv | bcftools norm -Ou -f ../scaffolds.fa > file.vcf
This seems to work and I obtain a VCF file. However, when I view it with "less", the header lines look normal, but everything after the column names is unreadable, for example:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sorted.bam ^@t^@^@^@^K^@^@^@^B^@^@^@^@^@^@^@^A^@^@^@<8F>j^XB^N^@^B^@^A^@^@^B^G^WT<87>TGTCGATT^@^Q^A^@^Q^B^Q^B^Q^C^U9<8E>c>^Q^D^Q ^Q^E^U-<F6><DD>6^Q
I am not sure whether this is normal, and I am also unsure how to use this file to get an overall SNP frequency or infer the level of heterozygosity. Any help is greatly appreciated.
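
To make the goal concrete, here is a rough sketch of the calculation being asked about: count heterozygous SNP genotypes and divide by the number of bases considered. It is only an illustration under several assumptions: the input is a plain-text VCF (for example one written with `-Ov` rather than `-Ou`), there is a single sample in column 10, only biallelic single-base substitutions are counted, and the assembly length is used as the denominator even though the number of bases with adequate coverage would be a better one. The function name `het_rate` and its arguments are hypothetical, not part of bcftools.

```shell
# Hypothetical helper: count heterozygous SNP calls in a plain-text VCF
# and divide by a length (in bp) supplied by the caller.
# Usage: het_rate <text.vcf> <assembly_length_bp>
het_rate() {
    awk -v len="$2" '
        BEGIN { FS = "\t" }
        /^#/ { next }                                  # skip header lines
        length($4) != 1 || length($5) != 1 { next }    # biallelic single-base substitutions only
        {
            split($10, sample, ":")                    # GT is the first FORMAT field when present
            n = split(sample[1], allele, "[/|]")       # unphased (/) or phased (|) genotype
            if (n == 2 && allele[1] != "." && allele[2] != "." && allele[1] != allele[2])
                het++
        }
        END { printf "het_snps=%d rate=%.6f\n", het, het / len }
    ' "$1"
}
```

It would be called as `het_rate file.vcf <assembly_length_bp>`, where the length could be taken as the sum of the second column of the `.fai` index that `samtools faidx` writes for the assembly. Filtering on QUAL or depth beforehand would probably be needed for the number to be meaningful.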