Question

LiftOver CNA data from Hg19 to Hg38


4.9 years ago

afollette • 0

I have several VCF-format files containing copy number alteration (CNA) data from several sources, all in Hg19 coordinates. My pipeline uses a LiftOver tool with the UCSC hg19 to hg38 chain file. The problem is that one of the CNA files was generated from 'shallow whole genome sequencing' and consists of long intervals. If I run liftOver with a high minimum ratio of bases that must remap, the tool drops a large number of genomic positions because they do not 'lift'. If I lower the minimum ratio far enough, all of the intervals 'lift'. However, when I compared the UCSC liftOver tool and the NCBI remapping tool on these CNA data points, the two tools lifted them to very different coordinates.

Here is an example set:

Original                    UCSC_LiftOver_results        NCBI_Remapper_results

chr7:100949555-100964196    chr7:100547187-100611118    chr7:101306274-101320915
chr7:100972001-101018949    chr7:100612904-100662230    chr7:101328720-101375668
chrX:1197001-1212723        chrX:1314890-1331616        chrX:1096848-1112570
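One way to quantify the discrepancy in the example set is to compare interval widths and start-coordinate shifts per tool. The sketch below is only arithmetic on the three rows above; the function names (`parse`, `summarize`) are hypothetical, and width is taken as `end - start` without assuming a particular coordinate convention.

```python
# Interval strings copied verbatim from the example set in the post.
ROWS = [
    ("chr7:100949555-100964196", "chr7:100547187-100611118", "chr7:101306274-101320915"),
    ("chr7:100972001-101018949", "chr7:100612904-100662230", "chr7:101328720-101375668"),
    ("chrX:1197001-1212723", "chrX:1314890-1331616", "chrX:1096848-1112570"),
]


def parse(region):
    """Split 'chrN:start-end' into (chrom, start, end)."""
    chrom, span = region.split(":")
    start, end = (int(x) for x in span.split("-"))
    return chrom, start, end


def summarize(rows):
    """Return width and start-shift per tool for each (original, UCSC, NCBI) row."""
    summary = []
    for orig, ucsc, ncbi in rows:
        _, o_start, o_end = parse(orig)
        _, u_start, u_end = parse(ucsc)
        _, n_start, n_end = parse(ncbi)
        summary.append({
            "original": orig,
            "orig_width": o_end - o_start,
            "ucsc_width": u_end - u_start,
            "ncbi_width": n_end - n_start,
            "ucsc_shift": u_start - o_start,  # signed change in start coordinate
            "ncbi_shift": n_start - o_start,
        })
    return summary


if __name__ == "__main__":
    for row in summarize(ROWS):
        print(row)
```

On these three rows the NCBI widths equal the original widths, the UCSC widths do not, and the two tools shift the start coordinate in opposite directions. That does not say which result is correct, but it narrows down what to check (for example, which chain file and direction each run actually used).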

So my first question is: is there a better method for converting long intervals of CNA or WGS data without dropping a large number of them? Furthermore, what explains the discrepancy between the NCBI Remapper and UCSC? It seems to have something to do with how each tool did the original pairwise alignment, or how each compensates for a split in the intervals. Or is it simply a difference in how the assemblies are maintained? Last, are there any ways to visualize the gaps between Hg19 and Hg38 to see how and why intervals do not always 'liftOver'?
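As a starting point for inspecting the gaps, the chain file itself can be read directly: each `chain` header line is followed by `size dt dq` lines giving an ungapped block and the gap to the next block in each assembly. The sketch below is a minimal illustration, not a reimplementation of liftOver. `TOY_CHAIN` is made-up data, the function names are hypothetical, and it assumes plus-strand chains only. The "mapped fraction" is only a rough analogue of the minimum-ratio setting.

```python
# Made-up single chain in UCSC chain format (toy coordinates, not real alignments).
TOY_CHAIN = """\
chain 1000 chr7 159138663 + 1000 2000 chr7 159345973 + 5000 6050 1
300 100 150
200 50 50
350
"""


def parse_chain_blocks(text):
    """Return ungapped blocks as (t_name, t_start, t_end, q_name, q_start, q_end).

    Assumes plus-strand chains; minus-strand query coordinates would need
    reverse-complement handling that this sketch omits.
    """
    blocks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "chain":
            t_name, t_pos = fields[2], int(fields[5])
            q_name, q_pos = fields[7], int(fields[10])
            continue
        size = int(fields[0])
        blocks.append((t_name, t_pos, t_pos + size, q_name, q_pos, q_pos + size))
        if len(fields) == 3:  # last line of a chain carries only the block size
            t_pos += size + int(fields[1])
            q_pos += size + int(fields[2])
    return blocks


def gaps_in(blocks, chrom, start, end):
    """Return parts of [start, end) on chrom that no block covers."""
    spans = sorted((b[1], b[2]) for b in blocks
                   if b[0] == chrom and b[2] > start and b[1] < end)
    gaps, cursor = [], start
    for s, e in spans:
        if s > cursor:
            gaps.append((cursor, s))
        cursor = max(cursor, e)
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


def mapped_fraction(blocks, chrom, start, end):
    """Fraction of [start, end) covered by at least one aligned block."""
    uncovered = sum(e - s for s, e in gaps_in(blocks, chrom, start, end))
    return 1 - uncovered / (end - start)


if __name__ == "__main__":
    blocks = parse_chain_blocks(TOY_CHAIN)
    print(gaps_in(blocks, "chr7", 1200, 1700))
    print(mapped_fraction(blocks, "chr7", 1200, 1700))
```

To try this on real data, the text of the hg19 to hg38 chain file could be passed in place of `TOY_CHAIN` (for a gzipped file, read it with `gzip.open(path, "rt")`). For a genome-wide file it would be more practical to filter by chromosome while parsing.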

LiftOver CNA NCBIremapper Hg19 Hg38 • 1.6k views

updated 4.9 years ago by GenoMax 152k • written 4.9 years ago by afollette • 0