Question: VCFs appear inconsistent with view of alignment
0
gravatar for mark.rose
27 days ago by
mark.rose40
United States
mark.rose40 wrote:

Hi All

I've aligned pacbio ccs reads to a small reference sequence using minimap2 and viewed the result in IGV. The alignments look good, with generally very few discrepancies with the reference. There is one region, however where more variations are observed and that is at a string of 13 Cs. This is not unexpected but when viewing it in IGV I'm actually impressed by how minimal it appears to be.

enter image description here

Note that the reads aligned in this snapshot are typical of the rest of the alignment as well and that the center line marks the beginning of the polyC string

Subsequently I attempted to call variants on this bam file using a variety of variant caller, callvariants (BBMAP), freebayes, and HaplotypeCaller (gatk). All call a 1bp deletion of a C at the poly C string.

callvariants

   #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  MIR604-F2-M2-1A.ccs.10pct.fastq-vs-MIR604.5901-10938.fa.minimap2.sort
MIR604.5901-10938       1401    .       Tc      T       23.72   PASS    SN=0;STA=1401;STO=1402;TYP=DEL;R1P=340;R1M=281;R2P=0;R2M=0;AD=621;DP=1000;MCOV=-1;PPC=0;AF=0.6210;RAF=0.6210;LS=3013780;MQS=37260;MQM=60;BQS=21508;BQM=41;EDS=960160;EDM=4471;IDS=619759;IDM=999;NVC=0;FLG=0;CED=601;HMP=6;SB=0.9997  GT:DP:AD:AF:RAF:NVC:FLG:SB:SC:PF        1:1000:621:0.6210:0.6210:0:0:0.9997:23.72:PASS


freebayes

   #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  unknown
MIR604.5901-10938       1401    .       TC      T       20562.3 .       AB=0.578755;ABP=61.8389;AC=2;AF=0.666667;AN=3;AO=632;CIGAR=1M1D;DP=1092;DPRA=0;EPP=10.9266;EPPR=20.4059;GTI=0;HWE=-0;LEN=1;MEANALT=20;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=371.527;PAIRED=0;PAIREDR=0;RO=364;RPP=626.539;RPPR=393.971;RUN=1;SAP=16.2178;SRP=20.4059;TYPE=del;XAI=0.00135563;XAM=0.00142991;XAS=7.42841e-05;XRI=0.00134239;XRM=0.00140067;XRS=5.82842e-05;BVAR  GT:DP:RO:QR:AO:QA       0/1/1:1092:364:32027:632:23740

HaplotypeCaller

   #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  MIR604-F2-M2-1A
MIR604.5901-10938       1401    .       TC      T       2899.01 .       AC=1;AF=1.00;AN=1;BaseQRankSum=0.922;DP=342;FS=5.456;MLEAC=1;MLEAF=1.00;MQ=60.00;MQRankSum=0.000;QD=8.48;ReadPosRankSum=-0.229;SOR=0.463      GT:AD:DP:GQ:PL  1:119,223:342:99:2909,0

All three variant callers call a 1 bp deletion at the homopolymer. What is curious though is the apparent frequency of the alternate allele each of these tools perceive compared to what is visible in IGV. IGV indicates that variants at this position are relatively rare (1-2%) where as each of these callers is reporting the variant is observed in the majority of reads (~60%). Any ideas why?

Thanks

igv bbmap freebayes gatk vcf • 105 views
ADD COMMENTlink modified 27 days ago • written 27 days ago by mark.rose40

Please use the code button (101010) to highlight code and data examples and make sure you use the image button to embed images. You have to post the full image link including the suffix, here that is https://i.ibb.co/DpTp5Vr/poly-C-variants.png which needs to be pasted into the image button field. I did it for you this time. Thanks!

ADD REPLYlink written 27 days ago by ATpoint40k

Actually I tried doing both but they never appeared correctly in the preview window so I must be doing something wrong

ADD REPLYlink written 27 days ago by mark.rose40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1311 users visited in the last hour