Question: Generating consensus sequences
gravatar for Lina F
2.2 years ago by
Lina F150
Boston, MA
Lina F150 wrote:

Hi all,

I am interested in generating consensus sequences from a bwa alignment. I generated the consensus sequences as follows:

samtools mpileup -uf reference.fasta sorted.bam -d 8000 | bcftools call -mv -Oz -o sorted.vcf.gz
tabix sorted.vcf.gz
cat reference.fasta | bcftools consensus sorted.vcf.gz > consensus.fasta

However, I noticed that even for sequences in my reference where none of the reads matched, I got a potential "consensus" sequence back. I assume this is because there were no entries in the VCF file that indicated SNPs, which could mean a perfect alignment.

However, this is misleading in my case, because if nothing mapped to a sequence in the reference set then I don't want to be confused with a consensus sequence. On the other hand, if a lot of reads mapped perfectly to a sequence in the reference set, then of course I'd like to see the correct consensus sequence.

How would you recommend I modify this approach?

Is it better to filter the bam somehow to only keep things that have a certain mapping depth? Or should I filter the VCF?

Thanks in advance!

bwa vcftools fasta consensus • 1.3k views
ADD COMMENTlink modified 2.2 years ago by vmicrobio240 • written 2.2 years ago by Lina F150
gravatar for vmicrobio
2.2 years ago by
vmicrobio240 wrote:

you may try in parallel to remove non-aligned sequences:

samtools view -b -F4 sorted.bam > sorted_align.bam
ADD COMMENTlink written 2.2 years ago by vmicrobio240

Thank you for the suggestion, I will try that!

ADD REPLYlink written 2.2 years ago by Lina F150
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1267 users visited in the last hour