Deletions are missing from the consensus sequence even when they are predominant. Is this common with vcf2fq from vcfutils?
15 months ago
malaya77 • 0

After aligning the reads and converting the output to BAM, I can see a 9-base deletion when I visualize the alignment.

This deletion is also called correctly by bcftools mpileup and bcftools call:

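# Pile up the reads against the reference (uncompressed BCF output)
bcftools mpileup -Ou -f "$ref" xxx.bam -o newbcfMPILE_xxx
# Haploid calls, variant sites only (-v), multiallelic caller (-m), written as VCF
bcftools call newbcfMPILE_xxx --ploidy 1 -mv -Ov -o newbcfMPILE_xxx_haploid.vcf
# Haploid calls at all sites with the consensus caller (-c), converted to a FASTQ consensus
bcftools call newbcfMPILE_xxx --ploidy 1 -c -Ov | vcfutils vcf2fq > cns_xxx.fq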

In the consensus sequence, this region is:

ctagtttgtctAgtttGaagcta  <--consensus from vcf2fq
ctagtttg---------aagcta  <--expected, because reads with the deletion are predominant
...........A....G......  <--mutations in the minority reads without the deletion, which fill the gap in the consensus

ctagtttgtctGgtttTaagcta  <--REF

In the VCF file I do see these deletions, supported by more reads than the non-deleted alleles.

#CHROM      POS     REF         ALT  QUAL     INFO
SARSCOV2    11287   GTCTGGTTTT  G    228.344  DP=224;DP4=27,1,167,29;MQ=54
SARSCOV2    11288   TCTGGTTTTA  T    228.325  DP=205;DP4=15,4,159,27;MQ=54

For the first record, 167+29 = 196 reads out of 224 total show the deletion. The second deletion overlaps the first, shifted by one base, and is supported by a similarly dominant proportion of reads (159+27 = 186 out of 205).
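As a check on that arithmetic, here is a minimal Python sketch that computes the fraction of ALT-supporting reads from the DP4 field of the two records above. It assumes DP4 is ordered as ref-forward, ref-reverse, alt-forward, alt-reverse, which is how bcftools documents it; the function name `alt_fraction` is made up for this illustration.

```python
def alt_fraction(info: str) -> float:
    """Fraction of reads supporting ALT, taken from the DP4 tag of an INFO string."""
    # Split "KEY=VALUE;KEY=VALUE" into a dict, tolerating stray spaces after ';'.
    fields = dict(
        item.strip().split("=", 1)
        for item in info.split(";")
        if "=" in item
    )
    # Assumed DP4 order: ref-forward, ref-reverse, alt-forward, alt-reverse.
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in fields["DP4"].split(","))
    total = ref_fwd + ref_rev + alt_fwd + alt_rev
    return (alt_fwd + alt_rev) / total


# INFO strings copied from the two VCF records in the post.
records = {
    11287: "DP=224;DP4=27,1,167,29;MQ=54",
    11288: "DP=205;DP4=15,4,159,27;MQ=54",
}

for pos, info in records.items():
    print(pos, round(alt_fraction(info), 3))
```

Both records come out well above one half (196/224 and 186/205), so a simple majority rule would favor the deletion at either position.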

Visualization of this region of the BAM file in Tablet. The asterisks are deletions.

Is there a way to generate a consensus with the deleted portion removed (or filled in with ---------) instead of filled in with nucleotides from the minority reads?

I have an alternative Python script that counts bases at each position and calls a consensus, but it does not provide the statistics that samtools/bcftools/vcfutils do.
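To make the expected output concrete, here is a rough Python sketch that applies one deletion record to the REF fragment shown above and writes gap characters over the deleted bases. It is only an illustration, not what vcf2fq or any bcftools command does internally. The function name `mask_deletion` is hypothetical, and the fragment start coordinate (11280) is an assumption inferred from where the REF allele GTCTGGTTTT matches the fragment.

```python
def mask_deletion(fragment, fragment_start, pos, ref_allele, alt_allele, gap="-"):
    """Replace the bases removed by a simple deletion with gap characters.

    fragment_start is the 1-based reference coordinate of fragment[0].
    """
    i = pos - fragment_start  # 0-based index of the record's first base
    if fragment[i:i + len(ref_allele)].upper() != ref_allele.upper():
        raise ValueError("REF allele does not match the fragment at this position")
    if not ref_allele.upper().startswith(alt_allele.upper()):
        raise ValueError("only simple deletions (ALT is a prefix of REF) are handled")
    n_deleted = len(ref_allele) - len(alt_allele)
    keep_until = i + len(alt_allele)  # bases up to here are shared by REF and ALT
    return fragment[:keep_until] + gap * n_deleted + fragment[keep_until + n_deleted:]


# REF fragment from the post; the start coordinate 11280 is an assumption.
ref_fragment = "ctagtttgtctGgtttTaagcta"
print(mask_deletion(ref_fragment, 11280, 11287, "GTCTGGTTTT", "G"))
# prints ctagtttg---------aagcta
```

This handles a single record only. The two overlapping records at 11287 and 11288 would need to be resolved to one call first, for example by keeping the one with more supporting reads.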

vcf2fq consensus bcftools indel • 716 views
