Variant calling on bacterial population using Freebayes gave wrong allele frequency and extremely low QUAL
Entering edit mode
2.0 years ago
xuye.july ▴ 10

I am working on genomic data of 8 evolving bacterial populations. Each one of them is a community of millions non-clonal individuals and was pooled together for sequencing. So I got 8 data back from illumina Mi-seq 300bp PE platform. After a series of quality control steps such as reads trimming, I mapped my reads to the reference genome by Bowtie2 and got 8 sorted bam files for each mapping.

Then I used freebayes (version 1.0.2) to call variants for each of my population. I used population setting "--pooled-continuous" and was looking for the allele frequency from my outout instead of genotype as the genotype didn't make much sense in my pooled population sample.

But in the output (see example below), the AF (allele frequency) is somehow not equal to AO/DP calculation. Furthermore, when I check the QUAL column of the output, only the "fixed" SNP or INDEL (AF = 1, or AO == DP) had high quality (i.e. QUAL > 30). All other SNPs and INDELS with alternate allele count less than the read depth (AO < DP or AF < 1) gave an extremely bad QUAL (either 0 or some extremely small number close to 0). I visualized some of these polymorphisms with "bad" QUAL in IGV and checked some other metrics such as the base quality, the average mapping quality, the strand bias, and all these seemed reasonably good to me. I'm wondering why this happened and should I still apply the filtration on QUAL?

Here is one example of the variant calls that showed bad QUAL:

AL645882.2 57219 . T G 0 PASS AB=0;ABP=0;AC=0;AF=0;AN=1;AO=2;CIGAR=1X;DP=16;DPB=16;DPRA=0;EPP=3.0103;EPPR=5.49198;GTI=0;LEN=1;MEANALT=1;MQM=42;MQMR=42;NS=1;NUMALT=1;ODDS=87.2098;PAIRED=0.5;PAIREDR=0.857143;PAO=0;PQA=0;PQR=0;PRO=0;QA=20;QR=497;RO=14;RPL=1;RPP=3.0103;RPPR=8.59409;RPR=1;RUN=1;SAF=0;SAP=7.35324;SAR=2;SRF=1;SRP=25.3454;SRR=13;TYPE=snp GT:DP:RO:QR:AO:QA:GL 0:16:14:497:2:20:0,-41.8659

As you can see that: First, even though the alternate allele count (AO) is 2 and the total depth(DP) is 16, the alternate allele freq (AF) is not 2/16= 0.125. Second, the QUAL is 0. The mean mapping quality of alternate allele (MQM) is 42, and the mean mapping quality of ref allele (MQMR) is also 42, in terms of Phred score, neither is poor. I am wondering if anything wrong from the beginning of my variant calling and how do people experienced with freebayes interpret results like above? How does the tool calculate the quality of the variant calling?

In addition, what does the final number "-41.8659" in above result mean? I couldn't find any info about it anywhere.



next-gen snp genome • 1.5k views
Entering edit mode
20 months ago
baoyl818 • 0

Hi,dear friend:
I met the similar problem. I use Freebayes to call snp/indel from nanopore data( the bam was generated from minimap2), and then the output vcf file is very odd. The value of QUAL column is extremely low, most of them is nearly zero.
The command line is like this : freebayes -f hg19.fasta --legacy-gls --genotype-qualities sample.bam > sample.freebayes.vcf
the bam's region is : chr6:32006093-32009447

the output vcf is like this : 6 32035018 . GCCAG GCCCAG 0.0675426 . AB=0;ABP=0;AC=0;AF=0;AN=2;AO=2;CIGAR=1M1I4M;DP=2;DPB=2.4;DPRA=0;EPP=7.35324;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=4.15888;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=0;QR=0;RO=0;RPL=2;RPP=7.35324;RPPR=0;RPR=0;RUN=1;SAF=2;SAP=7.35324;SAR=0;SRF=0;SRP=0;SRR=0;TYPE=ins GT:GQ:DP:AD:RO:QR:AO:QA:GL 0/0:18.1158:2:0,2:0:0:2:0:0,-0.60206,0

do you solve your problem ?


Login before adding your answer.

Traffic: 2338 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6