Question

Generating consensus sequences

1

Entering edit mode

7.4 years ago

Lina F ▴ 200

Hi all,

I am interested in generating consensus sequences from a bwa alignment. I generated the consensus sequences as follows:

samtools mpileup -uf reference.fasta sorted.bam -d 8000 | bcftools call -mv -Oz -o sorted.vcf.gz
tabix sorted.vcf.gz
cat reference.fasta | bcftools consensus sorted.vcf.gz > consensus.fasta

However, I noticed that even for sequences in my reference where none of the reads matched, I got a potential "consensus" sequence back. I assume this is because there were no entries in the VCF file that indicated SNPs, which could mean a perfect alignment.

However, this is misleading in my case, because if nothing mapped to a sequence in the reference set then I don't want to be confused with a consensus sequence. On the other hand, if a lot of reads mapped perfectly to a sequence in the reference set, then of course I'd like to see the correct consensus sequence.

How would you recommend I modify this approach?

Is it better to filter the bam somehow to only keep things that have a certain mapping depth? Or should I filter the VCF?

Thanks in advance!

consensus fasta bwa vcftools • 3.0k views

ADD COMMENT • link updated 7.4 years ago by vmicrobio ▴ 290 • written 7.4 years ago by Lina F ▴ 200

score 0 · Answer 1 · 2016-12-02

0

Entering edit mode

7.4 years ago

vmicrobio ▴ 290

you may try in parallel to remove non-aligned sequences:

samtools view -b -F4 sorted.bam > sorted_align.bam