Hi all,
I have a .vcf
file I generated using bcftools mpileup
and then filtered to retain positions of interest. I now want to generate a consensus sequence using bcftools consensus
. The issue I've had so far is that positions in the .vcf
file that are missing relative to the reference genome are automatically just assigned the reference base. I would like these bases to be instead assigned an N.
Consider an example where I have a 10000 bp genome against which I've called variants. The .vcf
file I've generated has base calls from positions 11 to 10000. In the consensus genome I would like bases 1–10 to be assigned an N. However, these bases are currently being automatically assigned the reference base instead. The command I have tried is:
bcftools consensus -i "FORMAT/AD > 10 & QUAL > 20 & (INFO/AD[1]/INFO/DP) > 0.25" -p test_consensus_ -o test.consensus.fa -M N all.vcf.bgz
What options should I instead be using with bcftools consensus
to get the desired consensus sequence? Since I'm implementing the variant calling within a pipeline and the coordinates of included positions will change between samples, I'd prefer to not have to create a masking file for each sample and then mask the consensus sequence post-hoc with the -m
option. I imagine it might be possible to do what I want using the -i
option, but I don't know what expression to use.
Thanks!