Hi all,
I have a .vcf file I generated using bcftools mpileup and then filtered to retain positions of interest. I now want to generate a consensus sequence using bcftools consensus. The issue I've had so far is that positions in the .vcf file that are missing relative to the reference genome are automatically just assigned the reference base. I would like these bases to be instead assigned an N.
Consider an example where I have a 10000 bp genome against which I've called variants. The .vcf file I've generated has base calls from positions 11 to 10000. In the consensus genome I would like bases 1–10 to be assigned an N. However, these bases are currently being automatically assigned the reference base instead. The command I have tried is:
bcftools consensus -i "FORMAT/AD > 10 & QUAL > 20 & (INFO/AD[1]/INFO/DP) > 0.25" -p test_consensus_ -o test.consensus.fa -M N all.vcf.bgz
What options should I instead be using with bcftools consensus to get the desired consensus sequence? Since I'm implementing the variant calling within a pipeline and the coordinates of included positions will change between samples, I'd prefer to not have to create a masking file for each sample and then mask the consensus sequence post-hoc with the -m option. I imagine it might be possible to do what I want using the  -i option, but I don't know what expression to use.
Thanks!