Question

bcftools consensus

0

Entering edit mode

2.4 years ago

Giulia.cosenza ▴ 100

Good morning, I’m new in bioinformatics and I need some help with a task . I’m trying to create a consensus file fasta from a sorted.marked.bam with bcftools. The command I’m using is:

call variants

bcftools mpileup -B -Ou -f reference.fa alignments.bam | bcftools call -mv -M -Oz -o calls.vcf.gz

bcftools index calls.vcf.gz

normalize indels

bcftools norm -f reference.fa calls.vcf.gz -Ob -o calls.norm.bcf

filter adjacent indels within 5bp

bcftools filter --IndelGap 5 calls.norm.bcf - Ob -o calls.norm.flt-indels.bcf

apply variants to create consensus sequence

cat reference.fa | bcftools consensus -H I -M calls.vcf.gz > consensus.fa

My doubt is how can I save information about Heterozygous positions?

If in a specific position there are multiple ALTs with different frequencies which one will be insert in the consensus sequence?

What I mean is that if I have a situation like this

position:100   REF:C  ALT:C,G   count(C):39 count(G):1

I can easily say that the correct nucleotide is C and that will be the nucleotide I see in the consensus.

While if I have a situation like this:

position:100        REF:C  ALT:C,G   count(C):7 count(G):7

It’s probably an Heterozygous and in the consensus I should find the S according to the IUPAC annotation for C or G bases.

But what if I have something like this:

position:100        REF:C  ALT:C,G   count(C):9 count(G):5

How does bcftools Heterozygous position from sequencing errors? Does it come automatically or should I insert some parameters ?

Thank you for your help.

bcftools heterozygous fasta mpileup consensus • 1.8k views

ADD COMMENT • link updated 2.4 years ago by Istvan Albert 100k • written 2.4 years ago by Giulia.cosenza ▴ 100

score 1 · Answer 1 · 2021-11-23

1

Entering edit mode

2.4 years ago

Istvan Albert 100k

If you have a VCF file, it is the job of the VCF file to only contain valid variants filtered by some attributes. The job of the consensus generator to take variants and apply them to the genome.

for more tips on generating consensus see:

VCF to fasta incorporating heterozygous sites

ADD COMMENT • link 2.4 years ago by Istvan Albert 100k