After alignment of reads and conversion to BAM I can visualize the existence of a 9-base deletion.
This deletion region is also called correctly by mpileup and bcftools
bcftools mpileup -Ou -f $ref xxx.bam -o newbcfMPILE_xxx
bcftools call newbcfMPILE_xxx --ploidy 1 -mv -Ov -o newbcfMPILE_xxx_haploid.vcf
bcftools call newbcfMPILE_xxx --ploidy 1 -c -Ov | vcfutils vcf2fq > cns_xxx.fq
In consensus sequence this portion is:
ctagtttgtctAgtttGaagcta <--consensus from vcf2fq
ctagtttg---------aagcta <--Expect this because reads with deletions is predominant
...........A....G...... <--mutations in other reads without deletion, which fill in the gaps in the consensus
ctagtttgtctGgtttTaagcta <--REF
In the vcf file I do see these indel mutations with more mutant reads than non-indels.
#CHROM POS REF ALT QUAL INFO
SARSCOV2 11287 GTCTGGTTTT G 228.344 DP=224; DP4=27,1,167,29;MQ=54
SARSCOV2 11288 TCTGGTTTTA T 228.325 DP=205; DP4=15,4,159,27;MQ=54
167+29 = 196 reads out of 224 total show the deletion. Other deletion overlaps except one base at either end, with a similar dominant proportion.
Is there a way I can generate consensus with the deleted portion removed (or filled in with ---------) instead of nucleotides from the minority reads?
I have an alternative python script to count bases at each position and call consensus, but it does not have the statistics as samtools/bcftools/vcfutils.