I am working on genomic data of 8 evolving bacterial populations. Each one of them is a community of millions non-clonal individuals and was pooled together for sequencing. So I got 8 data back from illumina Mi-seq 300bp PE platform. After a series of quality control steps such as reads trimming, I mapped my reads to the reference genome by Bowtie2 and got 8 sorted bam files for each mapping.
Then I used freebayes (version 1.0.2) to call variants for each of my population. I used population setting "--pooled-continuous" and was looking for the allele frequency from my outout instead of genotype as the genotype didn't make much sense in my pooled population sample.
But in the output (see example below), the AF (allele frequency) is somehow not equal to AO/DP calculation. Furthermore, when I check the QUAL column of the output, only the "fixed" SNP or INDEL (AF = 1, or AO == DP) had high quality (i.e. QUAL > 30). All other SNPs and INDELS with alternate allele count less than the read depth (AO < DP or AF < 1) gave an extremely bad QUAL (either 0 or some extremely small number close to 0). I visualized some of these polymorphisms with "bad" QUAL in IGV and checked some other metrics such as the base quality, the average mapping quality, the strand bias, and all these seemed reasonably good to me. I'm wondering why this happened and should I still apply the filtration on QUAL?
Here is one example of the variant calls that showed bad QUAL:
AL645882.2 57219 . T G 0 PASS AB=0;ABP=0;AC=0;AF=0;AN=1;AO=2;CIGAR=1X;DP=16;DPB=16;DPRA=0;EPP=3.0103;EPPR=5.49198;GTI=0;LEN=1;MEANALT=1;MQM=42;MQMR=42;NS=1;NUMALT=1;ODDS=87.2098;PAIRED=0.5;PAIREDR=0.857143;PAO=0;PQA=0;PQR=0;PRO=0;QA=20;QR=497;RO=14;RPL=1;RPP=3.0103;RPPR=8.59409;RPR=1;RUN=1;SAF=0;SAP=7.35324;SAR=2;SRF=1;SRP=25.3454;SRR=13;TYPE=snp GT:DP:RO:QR:AO:QA:GL 0:16:14:497:2:20:0,-41.8659
As you can see that: First, even though the alternate allele count (AO) is 2 and the total depth(DP) is 16, the alternate allele freq (AF) is not 2/16= 0.125. Second, the QUAL is 0. The mean mapping quality of alternate allele (MQM) is 42, and the mean mapping quality of ref allele (MQMR) is also 42, in terms of Phred score, neither is poor. I am wondering if anything wrong from the beginning of my variant calling and how do people experienced with freebayes interpret results like above? How does the tool calculate the quality of the variant calling?
In addition, what does the final number "-41.8659" in above result mean? I couldn't find any info about it anywhere.
Thanks!
Ye