Clarification for bcftools consensus
1
0
Entering edit mode
3.9 years ago
cfos4698 ★ 1.1k

Hi all,

I have a .vcf file I generated using bcftools mpileup and then filtered to retain positions of interest. I now want to generate a consensus sequence using bcftools consensus. The issue I've had so far is that positions in the .vcf file that are missing relative to the reference genome are automatically just assigned the reference base. I would like these bases to be instead assigned an N.

Consider an example where I have a 10000 bp genome against which I've called variants. The .vcf file I've generated has base calls from positions 11 to 10000. In the consensus genome I would like bases 1–10 to be assigned an N. However, these bases are currently being automatically assigned the reference base instead. The command I have tried is:

bcftools consensus -i "FORMAT/AD > 10 & QUAL > 20 & (INFO/AD[1]/INFO/DP) > 0.25" -p test_consensus_ -o test.consensus.fa -M N all.vcf.bgz

What options should I instead be using with bcftools consensus to get the desired consensus sequence? Since I'm implementing the variant calling within a pipeline and the coordinates of included positions will change between samples, I'd prefer to not have to create a masking file for each sample and then mask the consensus sequence post-hoc with the -m option. I imagine it might be possible to do what I want using the -i option, but I don't know what expression to use.

Thanks!

bcftools vcf • 1.6k views
ADD COMMENT
1
Entering edit mode
14 months ago
cfos4698 ★ 1.1k

I thought I should revisit and answer my own question. In the absence of a better way, I ended up taking the route of creating a mask for each sample, subtracting variants from the mask (so that deletions with 'missing' coverage aren't excluded), then generating the consensus. Here's an example of the code I took from my Snakemake pipeline.

bcftools query -f'%CHROM\t%POS0\t%END\n' {input.vcf_file} > {output.variants_bed}
bedtools genomecov -bga -ibam {input.bam} | awk '$4 < {params.min_depth}' | \
bedtools subtract -a - -b {output.variants_bed} > {output.mask}
bcftools consensus -p {params.prefix} -f {params.reference} --mark-del '-' -m {output.mask} -H I -i 'INFO/DP >= {params.min_depth} & GT!="mis"' {input.vcf_file} 2> {log} | \
sed "/^>/s/{params.prefix}.*/{params.prefix}/" > {output.consensus}
ADD COMMENT

Login before adding your answer.

Traffic: 889 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6