Hello everyone,
I am updating our variant calling workflow from hg19 to hg38. I noticed that a variant, rs234701 (hg19 chr21:44476759, hg38 chr21:43056649, A->G), which was often called before, disappears in the updated version. I see no difference between the two builds in the 300 bp upstream and downstream of the variant, so I suspect the sequencing fragments are now mapped somewhere else in the hg38 reference genome.
After some searching I found that the interval 21:6448027-6448627 has an almost identical sequence to 21:43056349-43056949 (both in hg38); the only difference is that position 6448327 has a G whereas 43056649 has an A. So that's why all fragments that originally supported rs234701 are now mapped to this upstream region, where they match the reference and support a wildtype genotype. That solves the puzzle. Naturally, it would be interesting to check whether hg19 has a corresponding piece upstream. But when I try to convert hg38 21:6448027-6448627 to hg19, both UCSC hgLiftOver and NCBI Remap redirect me back to chr21:44476459-44477059. Does this mean hg19 does not have such an upstream sequence?
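In case it helps anyone reproduce the comparison, here is a minimal sketch of how two equal-length intervals can be compared base by base once their sequences have been extracted. The function name `find_mismatches`, the toy sequences, and the start coordinates are placeholders for illustration only; they are not the real hg38 sequence or coordinates.

```python
def find_mismatches(seq_a, seq_b, start_a, start_b):
    """Compare two equal-length sequences position by position.

    start_a and start_b are the genomic coordinates of the first base
    of each sequence. Returns a list of (pos_a, base_a, pos_b, base_b)
    tuples, one for every position where the two sequences differ.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (start_a + i, a, start_b + i, b)
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b
    ]


# Toy placeholder sequences and coordinates (NOT real hg38 data).
upstream = "ACGTGCA"
downstream = "ACGTACA"

for pos_a, base_a, pos_b, base_b in find_mismatches(upstream, downstream, 100, 500):
    print(f"{pos_a}:{base_a} vs {pos_b}:{base_b}")
```

With the real interval sequences in place of the toy strings, a single reported mismatch would correspond to the one G/A difference described above.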
Additional question: if two long, distinct regions of the reference genome share highly similar sequences (in this case, only a 1-bp difference between two 600-bp pieces), can we trust the read mapping result and the variant calls based on it?
Thank you!