Question: Tuning Consensus Calling with bcftools/samtools
gravatar for nickp60
3.1 years ago by
nickp6030 wrote:

I mapping fasta reads (200bp) to a short (~5kb) reference that has N's in it. Coverage is very low (1-4x), as this isn't NGS data, but rather a consensus calling project.

The procedure, in short, is as follows:

# index reference
smalt index reference.fasta reference.fasta 
# map reads to reference, sort, save as bam
smalt map reference.fasta read.fasta | samtools sort - > sorted.bam
# make pileup, call consensus, and index
bcftools mpileup -Ou -f reference.fasta sorted.bam | bcftools call -P 9.9e-1 -mv -Oz -o calls.vcf.gz ; tabix  calls.vcf.gz
# get consensus 
 cat reference.fasta | bcftools consensus calls.vcf.gz > consensus.fa

The issue I am having is that none of the N's are replaced, even when there is coverage. SNPs are called when applicable, but none of the N's are resolved. I am assuming that it does that because there is only a 1x coverage.

Here is a snippet of the result of samtools mpileup:

1_seq 5403 N 1 c ~

1_seq 5404 N 1 a ~

1_seq 5405 N 1 g ~

1_seq 5406 N 1 g ~

1_seq 5407 N 1 c ~

1_seq 5408 N 1 a ~

1_seq 5409 N 1 t ~

1_seq 5410 C 1 , ~

1_seq 5411 A 1 , ~

1_seq 5412 A 1 , ~

1_seq 5413 A 1 , ~

Question: Give this (non-standard) use case, is there a way to tune the consensus calling parameters to favor replacement of N's?

Thanks in advance!

ADD COMMENTlink modified 3.1 years ago • written 3.1 years ago by nickp6030
gravatar for nickp60
3.1 years ago by
nickp6030 wrote:

Well, having heard back nothing, here is the Python workaround I am using. It needs work, but if anyone else finds themselves in a similar situation, it may be a good starting point. This is a VERY crude consensus caller, where N's are replaced when there is uncontested resolution, and reference bases are replaced with mapped sequence when unanimous. Weighting/qual scores are ignored, and only the most cursory handling of indels is implemented. Github Gist for crude mpileup consensus calling. That being said, Any further direction is appreciated!

ADD COMMENTlink written 3.1 years ago by nickp6030
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2289 users visited in the last hour