Hi Biostars Leaders,
Freebayes (version v1.0.1-1-g683b3cc-dirty) describes AF as "Estimated allele frequency in the range (0,1]", but the values I get are always either 0.5 or 1.0, so they are not the actual observed frequencies.
I have seen the same behaviour with GATK's HaplotypeCaller, and there I calculated the actual frequencies myself from the reference and alternate allele counts in the AD field.
Freebayes does not output the AD field, but it does have the following fields, which I think I can use: RO = "Reference allele observation count, with partial observations recorded fractionally" and AO = "Alternate allele observations, with partial observations recorded fractionally".
Is there any advice on how to calculate actual allele frequencies from Freebayes output?
thanks, gsr
Some variant callers assume the reference is the human genome, with a ploidy of 2, and their heuristics act accordingly, so they give incorrect results for genomes that don't have a ploidy of 2. I have not used Freebayes and I don't know whether it's possible to force it to calculate allele frequencies correctly, but BBMap's CallVariants tool (particularly in conjunction with BBMap for mapping) will correctly calculate and report the variant frequencies for SNPs. Indels are more complicated. CallVariants will still report an approximation of the correct frequency for them, but the accuracy gradually decays as the indel gets longer relative to the read length, and it decays differently for insertions and deletions. The indel calls themselves remain correct; only the reported allele frequency becomes less reliable. For example, 100bp reads would give highly accurate frequencies for 30bp insertions and 100kbp deletions, but not for 100kbp insertions.
Brian, thanks for the reply. However, I think my question is still unanswered.
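For what it's worth, one way to get an observed frequency from the RO and AO fields mentioned in the question is to divide each alternate count by the total number of observations, i.e. AO / (RO + sum of all AO values). Below is a minimal sketch in Python, not anything Freebayes provides itself. It assumes RO holds a single count and AO holds one comma-separated count per alternate allele; the function name and the example INFO string are made up for illustration.

```python
def observed_allele_fractions(info):
    """Return one observed fraction per ALT allele from a VCF INFO string.

    Assumes RO is a single count and AO is a comma-separated list with
    one count per alternate allele (hypothetical helper, for illustration).
    """
    fields = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if sep:  # skip flag-style entries that carry no value
            fields[key] = value

    ref_obs = float(fields["RO"])
    alt_obs = [float(x) for x in fields["AO"].split(",")]

    total = ref_obs + sum(alt_obs)
    if total == 0:
        # No supporting observations at all: report 0.0 rather than divide by zero
        return [0.0 for _ in alt_obs]
    return [a / total for a in alt_obs]


# Made-up INFO string with two alternate alleles
example_info = "AO=3,1;DP=10;RO=6"
print(observed_allele_fractions(example_info))
```

Note that the denominator here is RO plus all AO counts, which is not necessarily the same as DP, so dividing by DP instead may give slightly different numbers. The same calculation could be applied to the per-sample RO and AO values if those are present in the FORMAT column.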