In a VCF created by HaplotypeCaller from reads of two haploid samples, some entries have a mutation in one sample but not the other. There I expect to see a 1 for one sample and a 0 for the other, and indeed I do (only the last two columns shown):
My understanding is that in the above example, the first sample has 3 reads agreeing with the reference and 0 alternate reads, while the second sample has 7 agreeing with the reference and 13 with the alternate allele (yes, this looks kind of heterozygote-ish and I said these were haploids, but let's ignore that for now).
Now sometimes there are no reads in one of the samples, and in these cases it appears that the genotype is encoded by . instead of 0 or 1, for example:
My understanding is that the 0,0 there means no reads at all, so a . for unknown makes sense. All well and good so far.
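To make that reading of the sample columns concrete, here is a minimal Python sketch. It assumes haploid calls and a FORMAT string that contains GT and AD (the actual FORMAT column is not shown above, so "GT:AD" is an assumption); the function names are hypothetical, and the sample values are just the counts quoted in the text.

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys with one sample's values, e.g. 'GT:AD' with '0:3,0'."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    return dict(zip(keys, values))


def describe_call(format_field, sample_field):
    """Return (genotype, ref_reads, alt_reads) for a haploid sample.

    genotype is None when GT is '.', i.e. a no-call.
    """
    fields = parse_sample(format_field, sample_field)
    gt = fields.get("GT", ".")
    # AD lists read counts per allele: reference first, then each alternate.
    ad = fields.get("AD", ".")
    depths = [0 if d == "." else int(d) for d in ad.split(",")]
    ref_reads = depths[0]
    alt_reads = sum(depths[1:])
    # Assumption: haploid GT is a single allele index such as '0' or '1'.
    genotype = None if gt == "." else int(gt)
    return genotype, ref_reads, alt_reads


# The three situations described so far, assuming a GT:AD layout.
for sample in ("0:3,0", "1:7,13", ".:0,0"):
    print(sample, describe_call("GT:AD", sample))
```

A no-call with zero depth comes back as `(None, 0, 0)`, which is the "no reads, so unknown" case.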
But then I see some lines where the genotype is encoded as a . but there are reads! For example:
What the heck is going on here? There are 150 reads supporting the reference allele, but it doesn't call a genotype? I don't get it.
Have a look at this position in IGV: look at the mapping qualities, the SAM flags, the 'clipping' state in that region, etc.
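If you would rather check those properties outside IGV, the same three things (mapping quality, flags, soft clipping) can be read straight from SAM text records. The sketch below is only an illustration: `inspect_sam_line` is a hypothetical helper, the flag bits are the ones defined in the SAM specification, and the example record is made up. Whether any of these properties explains the missing genotype depends on how the caller was run.

```python
import re

# Flag bits defined in the SAM specification.
FLAG_BITS = {
    0x4: "unmapped",
    0x100: "secondary",
    0x200: "failed QC",
    0x400: "duplicate",
    0x800: "supplementary",
}


def inspect_sam_line(line):
    """Summarise MAPQ, notable flags, and soft-clipped bases of one SAM record."""
    cols = line.rstrip("\n").split("\t")
    flag = int(cols[1])   # column 2: bitwise FLAG
    mapq = int(cols[4])   # column 5: mapping quality
    cigar = cols[5]       # column 6: CIGAR string
    flags = [name for bit, name in FLAG_BITS.items() if flag & bit]
    # Sum the lengths of all soft-clip (S) operations in the CIGAR.
    soft_clipped = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op == "S"
    )
    return {"name": cols[0], "mapq": mapq, "flags": flags, "soft_clipped": soft_clipped}


# Made-up record: MAPQ 0, marked as duplicate, 10 soft-clipped bases.
example = "\t".join(
    ["read1", "1040", "chr1", "100", "0", "10S40M", "*", "0", "0", "A" * 50, "I" * 50]
)
print(inspect_sam_line(example))
```

The input lines could come from something like `samtools view` restricted to the region around the site. Many reads with MAPQ 0, duplicate flags, or heavy soft clipping would be a hint that the reads you see piled up are not all being used for genotyping.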