I understand that, to help distinguish correctly mapped variant reads, VCF INFO filters for strand bias and placement bias can be used - in particular SAP (Strand balance probability for the alternate allele) and EPP (End Placement Probability). Both are encoded as Phred-scaled estimates of the probability of the observed deviation from the expected ratio of 0.5, with a suggested cutoff of >20. I am working to identify mutations from 150 bp WGS reads from an intron-rich, GC-rich, haploid eukaryotic alga, using FreeBayes to call variants. My control has a well-supported, sequence-confirmed mutation that is eliminated by these filters: it scores 3.3935 on both measures. So a real, high-quality variant would be removed.
Can you help me understand why this might be the case? Are these filters not appropriate for my data - and if so, why?
In case it helps, the VCF record for the mutation is:
QUAL: 2587.33 . INFO: AB=0;ABP=0;AC=1;AF=1;AN=1;AO=51;CIGAR=1M1D3M;DP=51;DPB=40.8;DPRA=0;EPP=3.3935;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=595.754;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=3052;QR=0;RO=0;RPL=25;RPP=3.05288;RPPR=0;RPR=26;RUN=1;SAF=24;SAP=3.3935;SAR=27;SRF=0;SRP=0;SRR=0;TYPE=del;technology.ILLUMINA=1 GT:DP:AD:RO:QR:AO:QA:GL 1:51:0,51:0:0:51:3052:-261.674,0
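To check that I am reading these scores correctly, here is a minimal Python sketch that recomputes SAP and RPP from the counts in the record above. It assumes the scores are a Phred-scaled Hoeffding-style upper bound of the form 0.5 * exp(-2 * (n * 0.5 - k)^2 / n); that exact form is my assumption (the INFO header descriptions mention Hoeffding's inequality), and the names `hoeffding_phred` and `parse_info` are made up for this example.

```python
import math


def hoeffding_phred(successes, trials, prob=0.5):
    """Phred-scaled Hoeffding-style bound on deviation from prob * trials.

    ASSUMPTION: bound = 0.5 * exp(-2 * (trials * prob - successes)**2 / trials).
    This is my reading of the score, not code taken from FreeBayes.
    """
    if trials == 0:
        return 0.0
    ln_bound = math.log(0.5) - 2.0 * (trials * prob - successes) ** 2 / trials
    # Convert a natural-log probability to the Phred scale: -10 * log10(p).
    return -10.0 * ln_bound / math.log(10)


def parse_info(info):
    """Split a VCF INFO string into a dict of key -> string value."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


# Subset of the INFO field from the record above.
info = "AO=51;RPL=25;RPP=3.05288;RPR=26;SAF=24;SAP=3.3935;SAR=27"
f = parse_info(info)

saf, sar = int(f["SAF"]), int(f["SAR"])
rpl, rpr = int(f["RPL"]), int(f["RPR"])

sap = hoeffding_phred(saf, saf + sar)  # alternate reads on the forward strand
rpp = hoeffding_phred(rpl, rpl + rpr)  # alternate reads placed left

print("SAP recomputed:", sap, "reported:", f["SAP"])
print("RPP recomputed:", rpp, "reported:", f["RPP"])
```

Under that assumed formula, the 24/27 strand split and the 25/26 placement split reproduce the reported SAP and RPP to within rounding, and the score grows as the counts move further from an even split.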
Thank you!