Question

Clarification for bcftools consensus

0

Entering edit mode

5.0 years ago

cfos4698 ★ 1.2k

Hi all,

I have a .vcf file I generated using bcftools mpileup and then filtered to retain positions of interest. I now want to generate a consensus sequence using bcftools consensus. The issue I've had so far is that positions in the .vcf file that are missing relative to the reference genome are automatically just assigned the reference base. I would like these bases to be instead assigned an N.

Consider an example where I have a 10000 bp genome against which I've called variants. The .vcf file I've generated has base calls from positions 11 to 10000. In the consensus genome I would like bases 1–10 to be assigned an N. However, these bases are currently being automatically assigned the reference base instead. The command I have tried is:

bcftools consensus -i "FORMAT/AD > 10 & QUAL > 20 & (INFO/AD[1]/INFO/DP) > 0.25" -p test_consensus_ -o test.consensus.fa -M N all.vcf.bgz

What options should I instead be using with bcftools consensus to get the desired consensus sequence? Since I'm implementing the variant calling within a pipeline and the coordinates of included positions will change between samples, I'd prefer to not have to create a masking file for each sample and then mask the consensus sequence post-hoc with the -m option. I imagine it might be possible to do what I want using the -i option, but I don't know what expression to use.

Thanks!

bcftools vcf • 3.2k views

ADD COMMENT • link 2.3 years ago by cfos4698 ★ 1.2k

score 1 · Accepted Answer · 2023-07-25

I thought I should revisit and answer my own question. In the absence of a better way, I ended up taking the route of creating a mask for each sample, subtracting variants from the mask (so that deletions with 'missing' coverage aren't excluded), then generating the consensus. Here's an example of the code I took from my Snakemake pipeline.

bcftools query -f'%CHROM\t%POS0\t%END\n' {input.vcf_file} > {output.variants_bed}
bedtools genomecov -bga -ibam {input.bam} | awk '$4 < {params.min_depth}' | \
bedtools subtract -a - -b {output.variants_bed} > {output.mask}
bcftools consensus -p {params.prefix} -f {params.reference} --mark-del '-' -m {output.mask} -H I -i 'INFO/DP >= {params.min_depth} & GT!="mis"' {input.vcf_file} 2> {log} | \
sed "/^>/s/{params.prefix}.*/{params.prefix}/" > {output.consensus}