My question is why freebayes reports a variant in the output VCF file with multiple bases and at a seemingly incorrect locus. For instance, the two records below describe the same variant, called from the same sequenced DNA. When I create a fastq file containing only the reads that span the chr4:110541232 locus, freebayes reports the variant at 110541232 as a C to T change. If I call variants from the original fastq file containing all sequenced reads, the variant is reported at 110541228 as a CAGCC to CAGCT change.
Why does this occur? What is the purpose of adding the other bases to the call? And can I control this somehow, so that I only get the single base that is altered?
chr4 110541232 . C T 6083.73 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=262;CIGAR=1X;DP=262;DPB=262;DPRA=0;EPP=563.283;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;
chr4 110541228 . CAGCC CAGCT 133177 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=5693;CIGAR=4M1X;DP=5743;DPB=5743;DPRA=0;EPP=12365.2;EPPR=92.0407;GTI=0;LEN=1;MEANALT=5;MQM=60;MQMR=60;
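Freebayes is giving you extra local phasing information: it is a haplotype-based caller, so it can report the allele as a short haplotype (here CAGCC to CAGCT, CIGAR=4M1X) rather than only the base that changed. Both records describe the same C to T substitution, since 110541228 plus the four matching bases lands on 110541232. Early on, HaplotypeCaller reported variants this way too, but the GATK developers later decided that the conventional representation is more convenient for users.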
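To see that the two records are equivalent, here is a minimal sketch of trimming the bases shared by REF and ALT and shifting POS accordingly. The function `trim_alleles` is a hypothetical helper written for illustration; it is not part of freebayes, and it only handles a single ALT allele.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each.

    Hypothetical helper for illustration; handles one ALT allele only.
    Returns the adjusted (pos, ref, alt).
    """
    # Drop a shared suffix first; POS does not change.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop a shared prefix; each removed base moves POS one to the right.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# The haplotype-style record from the full fastq file
print(trim_alleles(110541228, "CAGCC", "CAGCT"))  # (110541232, 'C', 'T')
```

In practice you would probably not do this by hand. Existing tools such as `vcfallelicprimitives` from vcflib or `bcftools norm` are commonly used to break down or normalize this kind of record, though you should check their documentation for how they treat the INFO fields and genotypes.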