Hello,
I want to estimate the heterozygosity of my sample after de novo assembly, so I mapped the reads used for the assembly back to the draft genome and called SNPs with GATK. However, some of the lines in the .vcf output look strange. For example:
scf7180000017060 11256 . G T 577 PASS AC=1;AF=1.00;AN=1;DP=17;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=44.07;QD=33.94;SOR=0.804 GT:AD:DP:GQ:PL 1:0,17:17:99:607,0
As you can see, the allele depth (AD) is 0,17, which means no read supports the reference allele at this site, even though I assembled the reference genome from these same reads. However, the Phred-scaled genotype likelihoods (PL) are 607,0 (haploid genome), which seems to agree with the reference. I am really confused now. Can anyone help?
Thanks
OK, thanks, PL makes sense now, but what about AD? It should be the read count before filtering. In this instance, the reference has a G, but the reads show a count of 0 for G, even though the reference genome was assembled from these reads. Is something wrong with my assembly step or my mapping step?
That does look like a conflict between your de novo assembly and your SNP calling step.
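To see exactly what the record quoted in the question says, here is a minimal Python sketch that splits its FORMAT and sample columns and pairs AD and PL with the alleles. It assumes a haploid call, where PL carries one value per allele, and it relies on PL being normalised so that the most likely genotype has the value 0. The helper name parse_sample is made up for this illustration and is not part of GATK or any library.

```python
# Sketch only: interpret AD and PL for the single haploid record from the question.
# parse_sample is a hypothetical helper, not a GATK or library function.

def parse_sample(format_field, sample_field):
    """Pair FORMAT keys (GT:AD:...) with the sample's values (1:0,17:...)."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))

record = (
    "scf7180000017060 11256 . G T 577 PASS "
    "AC=1;AF=1.00;AN=1;DP=17;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=44.07;QD=33.94;SOR=0.804 "
    "GT:AD:DP:GQ:PL 1:0,17:17:99:607,0"
)

fields = record.split()          # whitespace split works because INFO has no spaces
ref, alt = fields[3], fields[4]
alleles = [ref] + alt.split(",") # index 0 = REF, 1.. = ALT alleles

sample = parse_sample(fields[8], fields[9])
ad = [int(x) for x in sample["AD"].split(",")]
pl = [int(x) for x in sample["PL"].split(",")]

# Assumption: haploid sample, so PL has one entry per allele.
depth_by_allele = dict(zip(alleles, ad))
pl_by_allele = dict(zip(alleles, pl))

# The lowest PL (0 after normalisation) marks the most likely genotype.
best = min(pl_by_allele, key=pl_by_allele.get)

print("read support per allele:", depth_by_allele)
print("PL per allele:", pl_by_allele)
print("most likely allele:", best)
```

For this record, AD and PL point the same way: all 17 reads carry T, and the allele with PL 0 is T, so both disagree with the G in the assembly.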