Bcftools consensus generates mismatched consensus sequence
0
0
Entering edit mode
9 weeks ago
Duy • 0

Hi everyone,

Recently, I have been using bcftools consensus to generate consensus sequence as the following commands:

bcftools mpileup -Ou -f ref.fa in.bam | bcftools call -Ou -mv  --ploidy 1 | bcftools norm -f ref.fa -Oz -o norm.vcf.gz
bcftools index norm.vcf.gz
bcftools consensus -f ref.fa -o consensus.fa norm.vcf.gz

However, the generated fasta file has several different bases from the vcf file (I've already double-checked with bam file):

      REF ALT
vcf:   A   T
fasta: A   G
bam:   A   T

I've tried different approaches but only vcfutils.pl vcf2fq generated the correct consensus sequence:

bcftools mpileup -f ref.fa in.bam | bcftools call -c  --ploidy 1 | vcfutils.pl vcf2fq > cns.fastq
seqtk seq -aQ64 –q20 -n N  cns.fastq > cns.fasta

But again, I've ran into an issue which is the lowercase bases in the fasta file. From what I've read, the lowercase bases can be putative indel bases or low-quality bases. Instead of only soft-masking indel base, several bases around the indel site also get masked. For example (the indel site is at 567th base) :

562                 580
  AagacaccccccacagttT

The indel is at 567th base For future work, I need to distinguish between indel bases and low-quality bases. Is there anyway I can convert lowercase bases to something else but only at indel site?

Also, it'd be nice if someone could tell me why bcftools consensus and vcfutils.pl vcf2fq generated different fasta file. If you must know, I'm working with mtDNA samples.

Cheers

Consensus sequence • 151 views
ADD COMMENT

Login before adding your answer.

Traffic: 2571 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6