Determine level of heterozygosity
4.4 years ago
max_19

Hi all,

I am trying to determine the level of heterozygosity in my de novo assembled genome. To do this, I think I should measure the SNP frequency. So far, I have called SNPs against my final assembly and written them to a VCF file using the command below:

bcftools mpileup -Ou -f ../scaffolds.fa sorted.bam | bcftools call -Ou -mv | bcftools norm -Ou -f ../scaffolds.fa > file.vcf

This seems to work, and I get an output file. However, when I view it with "less", the first few lines (the header) look normal, but everything after the column-name line is unreadable, for example:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sorted.bam ^@t^@^@^@^K^@^@^@^B^@^@^@^@^@^@^@^A^@^@^@<8F>j^XB^N^@^B^@^A^@^@^B^G^WT<87>TGTCGATT^@^Q^A^@^Q^B^Q^B^Q^C^U9<8E>c>^Q^D^Q ^Q^E^U-<F6><DD>6^Q

I am not sure whether this is normal. I am also unsure how to use this output to get the overall SNP frequency or infer the level of heterozygosity. Any help is greatly appreciated.

Thank you.

4.4 years ago
Brice Sarver

Your options (-Ou) make bcftools write uncompressed BCF (binary VCF), which is why "less" shows unreadable output. -Ou is fine, and faster, for the intermediate steps of a pipe, but in your last call, to bcftools norm, pass -Ov for an uncompressed VCF or -Oz for a bgzip-compressed one. You can also specify an output file name with -o instead of redirecting stdout.
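A minimal sketch of that fix, reusing the file names from the question. Only the final step's output type changes, and -o replaces the redirect. Wrapping the pipeline in a shell function (call_snps_to_vcf is a hypothetical name, not part of bcftools) is just for convenience.

```shell
#!/usr/bin/env bash
# Same pipeline as in the question; only the last step differs.
# -Ou on the intermediate steps keeps the piped data as uncompressed BCF (fast);
# -Ov on the final step writes plain-text VCF that less can display.
call_snps_to_vcf() {
  local ref=$1 bam=$2 out=$3
  bcftools mpileup -Ou -f "$ref" "$bam" \
    | bcftools call -Ou -mv \
    | bcftools norm -Ov -f "$ref" -o "$out"
}

# Example with the paths from the question:
# call_snps_to_vcf ../scaffolds.fa sorted.bam file.vcf
```

If you would rather keep the file you already produced, "bcftools view file.vcf | less" should display it as text: bcftools recognizes BCF from the file contents rather than the extension, and uncompressed VCF is the default output of bcftools view.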

For more information, see the bcftools manual.
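Turning the VCF into a heterozygosity estimate is not covered above. One crude approach, offered here only as a hypothetical sketch and not taken from the bcftools manual, is to count heterozygous genotype calls and divide by the assembly length. The functions below (count_het_sites, assembly_length and estimate_heterozygosity are names made up for this example) assume a plain-text, single-sample VCF such as one written with -Ov. They apply no quality or depth filtering, and they use the total FASTA length, including N and poorly covered regions, as the denominator, so treat the result as a ballpark figure.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a rough heterozygosity estimate:
# (heterozygous sites) / (total assembly length).

# count_het_sites VCF: counts records whose first sample's GT has
# two different, non-missing alleles (e.g. 0/1 or 1|2).
count_het_sites() {
  awk -F'\t' '
    /^#/ { next }
    {
      nf = split($9, fmt, ":")
      split($10, val, ":")
      gt = ""
      for (i = 1; i <= nf; i++) if (fmt[i] == "GT") gt = val[i]
      n = split(gt, a, "[/|]")
      if (n == 2 && a[1] != "." && a[2] != "." && a[1] != a[2]) het++
    }
    END { print het + 0 }
  ' "$1"
}

# assembly_length FASTA: counts every character on non-header lines
# (N included), i.e. the total assembled length.
assembly_length() {
  awk '!/^>/ { len += length($0) } END { print len + 0 }' "$1"
}

# estimate_heterozygosity VCF FASTA: prints heterozygous sites per base.
estimate_heterozygosity() {
  local het len
  het=$(count_het_sites "$1")
  len=$(assembly_length "$2")
  awk -v h="$het" -v l="$len" 'BEGIN { if (l > 0) printf "%.6g\n", h / l; else print "NA" }'
}
```

With the files from the question this would be run as "estimate_heterozygosity file.vcf ../scaffolds.fa". A stricter estimate would filter the calls (for example on QUAL or depth) and restrict the denominator to well-covered positions.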


