I mapped Forward and Reverse reads from one plant genome against a reference fasta sequence. As expected, there are varaints (SNP and indels). I am interested in retrieving the consensus sequence for downstream analysis. The reads come from a diploid plant genome and the gene of interest is likely in single copy. However, my plant genome is clearly not 100% homozygous. So, in the consensus sequence I should see homozygous variants and heterozygous variants coded with IUAPC letters. I ran this codes and everything worked well:
samtools mpileup -d8000 -A -uf ref.fasta sample.sorted.bam > sample.mpileup bcftools call -mv -Oz -o sample.calls.vcf.gz sample.mpileup tabix sample.calls.vcf.gz cat ref.fasta | bcftools consensus sample.calls.vcf.gz > sample.consensus.fasta
The only problem is that I have no heterozygous site in the consensus sequence despite I clearly see some in the raw mapping alignment. How to get heterozygous site codes with IUAPC in the consensus sequence ?
Ps: I tried the -i option in bcftools consensus but then all variant sites were codes heterozygous. Also the coverage is quite OK (20x)