Question: confusing vcf file
chenyibi93 wrote:

Hello,

I want to estimate the heterozygosity of my sample after de novo assembly, so I mapped the reads used for the assembly back to the draft genome and called SNPs from the alignments with GATK. However, some of the lines in the .vcf output look really odd. For example,

scf7180000017060        11256   .       G       T       577     PASS    AC=1;AF=1.00;AN=1;DP=17;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=44.07;QD=33.94;SOR=0.804 GT:AD:DP:GQ:PL  1:0,17:17:99:607,0

As you can see, the allele depth (AD) is 0,17, which means no read supports the reference allele at this site, even though I assembled the reference genome from these same reads. However, the Phred-scaled genotype likelihoods (PL) are 607,0 (haploid genome), which looks like it agrees with the reference. I am really confused now. Can anyone help?
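For reference, here is a minimal Python sketch of how the FORMAT column and the sample column of the record above pair up. It assumes a single-sample, haploid call (so AD and PL each hold one value per allele) and whitespace-separated columns; the helper name `parse_sample` is only illustrative and is not part of GATK.

```python
# The record quoted above, with columns separated by whitespace as in the post.
record = (
    "scf7180000017060 11256 . G T 577 PASS "
    "AC=1;AF=1.00;AN=1;DP=17;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=44.07;QD=33.94;SOR=0.804 "
    "GT:AD:DP:GQ:PL 1:0,17:17:99:607,0"
)


def parse_sample(line):
    """Return (REF, ALT, {FORMAT key: sample value}) for a single-sample VCF line."""
    cols = line.split()
    ref, alt = cols[3], cols[4]
    keys = cols[8].split(":")     # FORMAT column, e.g. GT:AD:DP:GQ:PL
    values = cols[9].split(":")   # the one sample column
    return ref, alt, dict(zip(keys, values))


ref, alt, sample = parse_sample(record)
alleles = [ref] + alt.split(",")

# Assumption: haploid call, so AD and PL are both ordered REF first, then ALT.
ad = dict(zip(alleles, (int(x) for x in sample["AD"].split(","))))
pl = dict(zip(alleles, (int(x) for x in sample["PL"].split(","))))

print(sample["GT"], ad, pl)
# prints: 1 {'G': 0, 'T': 17} {'G': 607, 'T': 0}
```

Keying the values by allele makes it explicit that both 0 (in AD) and 607 (in PL) belong to the reference base G.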

Thanks

assembly snp gatk
written 15 months ago by chenyibi93
chrchang523 wrote:

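Higher Phred-scaled values mean lower likelihood. A PL of 607 corresponds to a relative likelihood of 10^{-60.7}, so the reference genotype is the unlikely one here; the PL of 0 marks the alternate allele as the most likely call, which is consistent with the AD of 0,17.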
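The Phred relation behind that number can be checked with a few lines of Python. This is only a sketch of the arithmetic (likelihood = 10^(-PL/10), with PL normalized so the most likely genotype gets 0); the function name and the dictionary labels are illustrative.

```python
def pl_to_relative_likelihood(pl):
    """Convert a Phred-scaled likelihood (PL) to a likelihood relative to the best genotype."""
    return 10 ** (-pl / 10)


# PL values from the record in the question (haploid: one value per allele).
pls = {"G (reference)": 607, "T (alternate)": 0}

for genotype, pl in pls.items():
    likelihood = pl_to_relative_likelihood(pl)
    print(f"{genotype}: PL={pl} -> relative likelihood {likelihood:.3g}")
```

A PL of 0 maps to a relative likelihood of 1, and every additional 10 units of PL divides the likelihood by 10.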

written 15 months ago by chrchang523

OK, thanks, PL makes sense now. But what about AD? It should be the read count before filtering. In this instance, the reference has a G but no read supports G, even though the reference genome was assembled from these reads. Is something wrong with my assembly step or my mapping step?

written 15 months ago by chenyibi93

That does look like a conflict between your de novo assembly and your SNP calling step.

written 15 months ago by chrchang523
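One way to see how widespread such conflicts are is to list every call where the reference allele has no supporting reads. The sketch below is a plain-Python illustration, not a GATK feature: it assumes single-sample VCF lines with whitespace-separated columns and an AD field, the function name `zero_ref_support_sites` is made up, and the second record in the example is hypothetical.

```python
def zero_ref_support_sites(vcf_lines):
    """Yield (contig, pos, ref, alt, depth) for calls whose reference allele has AD == 0."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.split()
        if len(cols) < 10:
            continue  # no FORMAT/sample columns
        sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
        depths = sample.get("AD", ".").split(",")
        if "." in depths:
            continue  # AD missing for this call
        depths = [int(d) for d in depths]
        # Reference depth is listed first; the rest are alternate alleles.
        if depths[0] == 0 and sum(depths[1:]) > 0:
            yield cols[0], int(cols[1]), cols[3], cols[4], sum(depths)


lines = [
    "##fileformat=VCFv4.2",
    # Record from the question.
    "scf7180000017060 11256 . G T 577 PASS DP=17 GT:AD:DP:GQ:PL 1:0,17:17:99:607,0",
    # Hypothetical record with some reference support, for contrast.
    "scf_hypothetical 100 . A C 50 PASS DP=20 GT:AD:DP:GQ:PL 1:3,17:20:99:400,0",
]

flagged = list(zero_ref_support_sites(lines))
print(flagged)
# prints: [('scf7180000017060', 11256, 'G', 'T', 17)]
```

If only a handful of sites are flagged, they may be isolated consensus errors in the assembly; many flagged sites would point more strongly at the assembly or mapping step.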