I have several VCF-format files containing copy number alteration (CNA) data from several sources, all in hg19 coordinates. My pipeline converts them with a liftOver tool and the UCSC hg19-to-hg38 chain file. The problem is that one of the CNA files was generated by shallow whole-genome sequencing, so its segments are long intervals. If I run liftOver with a high minimum ratio of bases that must remap, it drops a large number of these intervals because they do not 'lift'. If I lower that ratio far enough, every interval lifts. However, when I compared the UCSC liftOver tool with the NCBI Remap tool on these CNA intervals, they mapped them to very different coordinates.
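To make the trade-off described above measurable, one option is to run liftOver at several `-minMatch` values and count how many records land in the unmapped file each time. The sketch below assumes the UCSC `liftOver` binary is on `PATH` and that the CNA segments have already been written to a BED file; the function names and output file names are hypothetical. It uses only liftOver's `-minMatch` option (minimum ratio of bases that must remap, default 0.95) and, optionally, `-multiple` (allow an interval to map to several output regions).

```python
import subprocess
from pathlib import Path


def build_liftover_cmd(bed_in, chain, bed_out, unmapped, min_match=0.95, multiple=False):
    """Build the argv for UCSC liftOver (hypothetical helper).

    min_match: minimum ratio of bases that must remap (liftOver's -minMatch).
    multiple: if True, pass -multiple so a split interval can map to several regions.
    """
    if not 0.0 < min_match <= 1.0:
        raise ValueError("min_match must be in (0, 1]")
    cmd = ["liftOver", f"-minMatch={min_match}"]
    if multiple:
        cmd.append("-multiple")
    # Positional arguments: oldFile map.chain newFile unMapped
    return cmd + [str(bed_in), str(chain), str(bed_out), str(unmapped)]


def count_unmapped(unmapped_path):
    """Count failed records; liftOver precedes each one with a '#' reason line."""
    with open(unmapped_path) as fh:
        return sum(1 for line in fh if line.strip() and not line.startswith("#"))


def sweep_min_match(bed_in, chain, ratios, workdir="."):
    """Run liftOver once per ratio and return {ratio: number of dropped intervals}."""
    results = {}
    for r in ratios:
        out = Path(workdir) / f"lifted_{r}.bed"
        unm = Path(workdir) / f"unmapped_{r}.bed"
        subprocess.run(build_liftover_cmd(bed_in, chain, out, unm, r), check=True)
        results[r] = count_unmapped(unm)
    return results
```

Calling `sweep_min_match("cna_hg19.bed", "hg19ToHg38.over.chain.gz", [0.95, 0.5, 0.1])` (file names hypothetical) would show where the drop-off happens, which makes it easier to pick a threshold than flipping between "strict" and "everything lifts".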
Here is an example set:
| Original (hg19) | UCSC liftOver result | NCBI Remap result |
|---|---|---|
| chr7:100949555-100964196 | chr7:100547187-100611118 | chr7:101306274-101320915 |
| chr7:100972001-101018949 | chr7:100612904-100662230 | chr7:101328720-101375668 |
| chrX:1197001-1212723 | chrX:1314890-1331616 | chrX:1096848-1112570 |
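A quick way to compare the two tools on the example set is to parse each `chr:start-end` string and check interval lengths and shifts. The sketch below assumes the coordinates are 1-based and inclusive (as in VCF and UCSC position strings); the helper names are hypothetical, and `to_bed` shows the 0-based half-open conversion liftOver expects for BED input.

```python
import re

# Intervals copied from the example set: (original hg19, UCSC liftOver, NCBI Remap)
ROWS = [
    ("chr7:100949555-100964196", "chr7:100547187-100611118", "chr7:101306274-101320915"),
    ("chr7:100972001-101018949", "chr7:100612904-100662230", "chr7:101328720-101375668"),
    ("chrX:1197001-1212723", "chrX:1314890-1331616", "chrX:1096848-1112570"),
]

REGION_RE = re.compile(r"^(\w+):(\d+)-(\d+)$")


def parse_region(region):
    """Split 'chr:start-end' (commas allowed) into (chrom, start, end)."""
    m = REGION_RE.match(region.replace(",", ""))
    if m is None:
        raise ValueError(f"not a chr:start-end region: {region!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def length(region):
    """Interval length, assuming 1-based inclusive coordinates."""
    _, start, end = parse_region(region)
    return end - start + 1


def to_bed(region):
    """1-based inclusive region -> 0-based half-open BED fields."""
    chrom, start, end = parse_region(region)
    return chrom, start - 1, end


if __name__ == "__main__":
    for orig, ucsc, ncbi in ROWS:
        shift = parse_region(ncbi)[1] - parse_region(orig)[1]
        print(orig, length(orig), "UCSC:", length(ucsc), "NCBI:", length(ncbi), "NCBI shift:", shift)
```

On these three rows the NCBI intervals keep exactly the original length (and both chr7 rows shift by the same +356,719 bp), while the UCSC intervals change length, e.g. 63,932 bp versus 14,642 bp for the first row. That difference may be worth checking against the chain blocks covering those regions.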
So my first question is: is there a better method for converting long CNA intervals, such as those from shallow WGS, without dropping a large fraction of them? Second, what explains the discrepancy between NCBI Remap and UCSC liftOver? It seems to have something to do with how the original pairwise alignments were made, or with how each tool handles an interval that is split across alignment blocks. Or is it simply a difference in how the assemblies are maintained? Last, is there any way to visualize the gaps between hg19 and hg38, to see how and why some intervals do not lift over?
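On the last question, one rough way to see why an interval fails to lift is to read the chain file itself. In the UCSC chain format, each chain starts with a header line (`chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`) followed by lines of `size dt dq`: a gapless block of `size` bases, then a gap of `dt` bases on the target (hg19, the old assembly) and `dq` on the query (hg38). The sketch below parses that format and, for one hg19 interval, reports each overlapping chain, how many bases fall in aligned blocks, and where the unaligned stretches lie. The function names are hypothetical, and the aligned fraction is only a rough proxy for, not a reimplementation of, liftOver's `-minMatch` test.

```python
import gzip
from dataclasses import dataclass


@dataclass
class Block:
    t_start: int  # 0-based start on the target (old assembly, e.g. hg19)
    t_end: int
    q_start: int  # 0-based start on the query (new assembly), in q_strand coordinates
    q_end: int


def open_chain(path):
    """Open a plain or gzip-compressed chain file as text."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path)


def iter_chains(lines):
    """Yield (header_fields, blocks) for each chain in UCSC chain format."""
    header, blocks, t, q = None, [], 0, 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0] == "chain":
            if header is not None:
                yield header, blocks
            header, blocks = fields, []
            t, q = int(fields[5]), int(fields[10])  # tStart, qStart
            continue
        size = int(fields[0])
        blocks.append(Block(t, t + size, q, q + size))
        t, q = t + size, q + size
        if len(fields) == 3:  # gap after this block: dt on target, dq on query
            t += int(fields[1])
            q += int(fields[2])
    if header is not None:
        yield header, blocks


def explain_region(lines, chrom, start, end):
    """For a 0-based half-open target interval, summarize each overlapping chain."""
    report = []
    for h, blocks in iter_chains(lines):
        if h[2] != chrom or int(h[6]) <= start or int(h[5]) >= end:
            continue
        covered, gaps, cursor = 0, [], start
        for b in blocks:
            if b.t_start >= end:
                break
            s, e = max(b.t_start, start), min(b.t_end, end)
            if s >= e:
                continue
            if s > cursor:
                gaps.append((cursor, s))  # target bases with no aligned block in this chain
            covered += e - s
            cursor = e
        if cursor < end:
            gaps.append((cursor, end))
        report.append({"chain_id": h[12], "q_chrom": h[7], "q_strand": h[9],
                       "aligned_bases": covered, "fraction": covered / (end - start),
                       "t_gaps": gaps})
    return report
```

For example, opening `hg19ToHg38.over.chain.gz` with `open_chain` and passing it to `explain_region(..., "chr7", 100949554, 100964196)` (the first interval in the table, converted to a 0-based start) would list every chain that touches it and the hg19 stretches each chain leaves unaligned. Several partially overlapping chains, or a low aligned fraction, would be one plausible reason an interval drops at a strict threshold.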