Question: Variant calling on bacterial population using Freebayes gave wrong allele frequency and extremely low QUAL
gravatar for xuye.july
4 weeks ago by
xuye.july0 wrote:

I am working on genomic data of 8 evolving bacterial populations. Each one of them is a community of millions non-clonal individuals and was pooled together for sequencing. So I got 8 data back from illumina Mi-seq 300bp PE platform. After a series of quality control steps such as reads trimming, I mapped my reads to the reference genome by Bowtie2 and got 8 sorted bam files for each mapping.

Then I used freebayes (version 1.0.2) to call variants for each of my population. I used population setting "--pooled-continuous" and was looking for the allele frequency from my outout instead of genotype as the genotype didn't make much sense in my pooled population sample.

But in the output (see example below), the AF (allele frequency) is somehow not equal to AO/DP calculation. Furthermore, when I check the QUAL column of the output, only the "fixed" SNP or INDEL (AF = 1, or AO == DP) had high quality (i.e. QUAL > 30). All other SNPs and INDELS with alternate allele count less than the read depth (AO < DP or AF < 1) gave an extremely bad QUAL (either 0 or some extremely small number close to 0). I visualized some of these polymorphisms with "bad" QUAL in IGV and checked some other metrics such as the base quality, the average mapping quality, the strand bias, and all these seemed reasonably good to me. I'm wondering why this happened and should I still apply the filtration on QUAL?

Here is one example of the variant calls that showed bad QUAL:

AL645882.2 57219 . T G 0 PASS AB=0;ABP=0;AC=0;AF=0;AN=1;AO=2;CIGAR=1X;DP=16;DPB=16;DPRA=0;EPP=3.0103;EPPR=5.49198;GTI=0;LEN=1;MEANALT=1;MQM=42;MQMR=42;NS=1;NUMALT=1;ODDS=87.2098;PAIRED=0.5;PAIREDR=0.857143;PAO=0;PQA=0;PQR=0;PRO=0;QA=20;QR=497;RO=14;RPL=1;RPP=3.0103;RPPR=8.59409;RPR=1;RUN=1;SAF=0;SAP=7.35324;SAR=2;SRF=1;SRP=25.3454;SRR=13;TYPE=snp GT:DP:RO:QR:AO:QA:GL 0:16:14:497:2:20:0,-41.8659

As you can see that: First, even though the alternate allele count (AO) is 2 and the total depth(DP) is 16, the alternate allele freq (AF) is not 2/16= 0.125. Second, the QUAL is 0. The mean mapping quality of alternate allele (MQM) is 42, and the mean mapping quality of ref allele (MQMR) is also 42, in terms of Phred score, neither is poor. I am wondering if anything wrong from the beginning of my variant calling and how do people experienced with freebayes interpret results like above? How does the tool calculate the quality of the variant calling?

In addition, what does the final number "-41.8659" in above result mean? I couldn't find any info about it anywhere.



snp next-gen genome • 118 views
ADD COMMENTlink written 4 weeks ago by xuye.july0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1811 users visited in the last hour