Question: Reason for Freebayes Calling Multi-base Variants
0
gravatar for angrypigeon
3.4 years ago by
angrypigeon120
angrypigeon120 wrote:

The question I have is why freebayes reports variants in an output vcf file as multiple bases and an incorrect locus. For instance, below the variant is the same, and taken from the same originating sequence DNA. However, when I create a fastq file containing only reads that span the chr4:11054232 locus freebayes report the variant at 11054232 and a C to T change. If I call variants from the original fastq file containing all sequenced reads, the variant is reported at 110541228 and a CAGCC to CAGCT change.

So my question is why this occurs. What is the purpose of adding other bases into the call. And can I control this somehow, and only get the single base that is altered.

chr4 110541232 . C T 6083.73 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=262;CIGAR=1X;DP=262;DPB=262;DPRA=0;EPP=563.283;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;

chr4 110541228 . CAGCC CAGCT 133177 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=5693;CIGAR=4M1X;DP=5743;DPB=5743;DPRA=0;EPP=12365.2;EPPR=92.0407;GTI=0;LEN=1;MEANALT=5;MQM=60;MQMR=60;

sequencing alignment • 1.6k views
ADD COMMENTlink modified 3.4 years ago by Daniel Swan13k • written 3.4 years ago by angrypigeon120

Freebayes gives you extra local phasing information. At the beginning, HaplotypeCaller reported variants in this way, too, but later the GATK developers thought the conventional way is more convenient to users.

ADD REPLYlink written 3.4 years ago by lh331k
1
gravatar for Daniel Swan
3.4 years ago by
Daniel Swan13k
Aberdeen, UK
Daniel Swan13k wrote:

If you only call at a single base, you're only going to get a single base change reported. By default Freebayes reports multinucleotide polymorphisms and 'complex events' which span multiple nucleotides. These reflect cases where you see linked variants, phased by their reads. This is surely a desirable thing to know.

If you have adjacent base changes, that span a single codon, then there are situations where calling these as two separate SNPs from the same read and then trying to deduce their independent effects on the protein sequence is just going to be wrong. In this case you need to count the dinucleotide variant as a linked unit.

If you really must do this, check out vcfallelicprimitives : https://github.com/vcflib/vcflib#vcfallelicprimitives

EDIT: Oops I think I misinterpreted your question. Take a look at : https://github.com/ekg/freebayes/issues/161

ADD COMMENTlink modified 3.4 years ago • written 3.4 years ago by Daniel Swan13k

Thanks for the edit. That seems to be the same issue that I am having, although it is unclear what the solution to the problem is.

ADD REPLYlink written 3.4 years ago by angrypigeon120
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 550 users visited in the last hour