I'm very new to this space, and from what I've read I'm not sure I understand what the process of variant calling looks like. If we're given a reference genome and a specific genome of interest, what makes variants, which are just differences from the reference, so hard to identify?
For example, at a specific location in the sequence, why is it wrong to simply take the majority base among the reads covering it and say "at this position, we are __% confident that individual X has this base"? And since you would expect to see heterozygosity at many loci, what does it even mean if you see "60% A and 40% T" at a given location?
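To make the naive approach concrete, here is a minimal sketch of what I have in mind. The function name `naive_majority_call` and the example pileup string are made up for illustration, and I'm assuming the reads are already aligned so that the bases covering one position can be collected into a single string. It looks only at raw base counts and nothing else.

```python
from collections import Counter


def naive_majority_call(pileup_bases):
    """Return (base, fraction) for the most common base at one position.

    pileup_bases: string of the bases that the aligned reads show at this
    position, one character per read (hypothetical input format).
    """
    counts = Counter(pileup_bases)
    base, n = counts.most_common(1)[0]
    return base, n / len(pileup_bases)


# Hypothetical position covered by 10 reads: 6 show A and 4 show T.
column = "AAAAAATTTT"
base, fraction = naive_majority_call(column)
print(base, fraction)  # A 0.6
```

Here the function reports A with a fraction of 0.6, which is exactly the kind of result I don't know how to interpret.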
It seems like alignment would be the actual tricky part of the process, not the 'calling' part.
I think I'm probably making some faulty assumptions or don't fully understand the problem, so I'd appreciate any explanation.