Determine level of heterozygosity
4.6 years ago
max_19 ▴ 170

Hi all,

I am trying to determine the level of heterozygosity in my de novo assembled genome, and I think the way to do this is to measure the SNP frequency. So far I have mapped reads to my final assembly, called SNPs, and written the result to a VCF file using the command below:

bcftools mpileup -Ou -f ../scaffolds.fa sorted.bam | bcftools call -Ou -mv | bcftools norm -Ou -f ../scaffolds.fa > file.vcf

This seems to work and I obtain a VCF file. However, when I view it with "less", the first few lines (the header) look normal, but everything after the column names is unreadable, for example:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sorted.bam ^@t^@^@^@^K^@^@^@^B^@^@^@^@^@^@^@^A^@^@^@<8F>j^XB^N^@^B^@^A^@^@^B^G^WT<87>TGTCGATT^@^Q^A^@^Q^B^Q^B^Q^C^U9<8E>c>^Q^D^Q ^Q^E^U-<F6><DD>6^Q

I am not sure whether this is normal, and I am also unsure how to use this file to get the overall SNP frequency and infer the level of heterozygosity. Any help is greatly appreciated.

Thank you.

4.6 years ago
Brice Sarver ★ 3.8k

Your -Ou options make every step output uncompressed BCF (binary VCF), which is why the records are unreadable in less. That is fine for piping between bcftools commands, but in your last call, to bcftools norm, pass -Ov for an uncompressed VCF or -Oz for a compressed one. You can also specify an output file name with -o instead of redirecting stdout.
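As a rough sketch of that change, the pipeline from the question could be wrapped as below. Only the last step differs: -Oz and -o replace -Ou and the shell redirect. The function name call_variants and the output name file.vcf.gz are illustrative and not from the original post.

```shell
# Sketch only: same pipeline as in the question, with a readable final output.
# call_variants is a hypothetical helper name; arguments are reference, BAM, output.
call_variants() {
    ref=$1
    bam=$2
    out=$3
    # Intermediate steps keep -Ou (uncompressed BCF), which is efficient for pipes.
    # The final step writes bgzip-compressed VCF (-Oz) to a named file (-o).
    bcftools mpileup -Ou -f "$ref" "$bam" \
        | bcftools call -Ou -mv \
        | bcftools norm -Oz -f "$ref" -o "$out"
}

# Example call, using the paths from the question:
# call_variants ../scaffolds.fa sorted.bam file.vcf.gz
```

A compressed VCF written this way can be paged with zless, or with bcftools view file.vcf.gz | less.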

For more info, see the bcftools manual.
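For the second part of the question, one possible starting point (not a definitive method) is to count heterozygous genotype calls in the text VCF. The sketch below assumes a single-sample, uncompressed VCF with the sample in column 10 and GT as the first FORMAT field; count_het_sites is a hypothetical helper name.

```shell
# Sketch only: count records whose genotype has two different, non-missing alleles.
# Assumes a single-sample, uncompressed VCF (sample in column 10, GT listed first).
count_het_sites() {
    awk -F'\t' '
        /^#/ { next }                        # skip header lines
        {
            split($10, f, ":")               # f[1] is the GT value, e.g. 0/1
            n = split(f[1], a, "[/|]")       # split alleles on / or |
            if (n == 2 && a[1] != "." && a[2] != "." && a[1] != a[2]) het++
        }
        END { print het + 0 }
    ' "$1"
}

# Example call on an uncompressed VCF:
# count_het_sites file.vcf
```

Dividing this count by the total assembly length (for example, the sum of column 2 in the .fai index that samtools faidx writes) gives a crude heterozygous-sites-per-base estimate. It ignores regions with no usable coverage and any false calls in repeats, so treat it as a first approximation.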
