Question: Best tools for lifting over genome coordinates for non-model organism (custom) reference genomes
22 months ago by William

What are currently the best tools for lifting over genome coordinates from one custom genome build to another?

I found this 4-year-old post that mentions UCSC liftOver as the top tool, but it seems to be limited to model organisms because you need a chain file from UCSC. I can't find anywhere how to create these chain files yourself: Converting Genome Coordinates From One Genome Version To Another (Ucsc Liftover, Ncbi Remap, Ensembl Api)

Is it possible to create these UCSC liftOver chain files yourself for custom genome builds?

Are there other tools with the same quality and functionality as UCSC liftOver that let you create these "chain" files yourself for custom reference genomes? How difficult, and how time- and resource-consuming, is it to build them?
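For reference, a chain file is plain text: each record starts with a header line `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`, followed by `size dt dq` lines (an ungapped block, then the gap in the target and in the query before the next block) and a final line holding only `size`. Coordinates are 0-based and half-open, and on a `-` strand they count from the end of the sequence. The minimal Python sketch below only *reads* chains; it assumes the usual liftOver convention that the target (t) side is the old build on the `+` strand and the query (q) side is the new build. `chrOld`/`chrNew` and the example chain are made up, and the linear scan is for clarity, not how liftOver or CrossMap are implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Chain:
    t_name: str
    q_name: str
    q_size: int
    q_strand: str
    # Ungapped blocks as (t_block_start, q_block_start, size), 0-based, each on its own strand
    blocks: List[Tuple[int, int, int]] = field(default_factory=list)

def parse_chains(text: str) -> List[Chain]:
    """Parse UCSC chain-format text into Chain objects with explicit block coordinates."""
    chains: List[Chain] = []
    t_pos = q_pos = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank line between chains
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            t_name, t_start = fields[2], int(fields[5])
            q_name, q_size, q_strand, q_start = fields[7], int(fields[8]), fields[9], int(fields[10])
            chains.append(Chain(t_name, q_name, q_size, q_strand))
            t_pos, q_pos = t_start, q_start
            continue
        size = int(fields[0])
        chains[-1].blocks.append((t_pos, q_pos, size))
        t_pos += size
        q_pos += size
        if len(fields) == 3:  # "size dt dq": gaps that follow this block
            t_pos += int(fields[1])
            q_pos += int(fields[2])
    return chains

def lift_position(chains: List[Chain], t_name: str, pos: int) -> Optional[Tuple[str, int, str]]:
    """Map a 0-based position on old sequence t_name to (new name, forward-strand pos, strand)."""
    for chain in chains:
        if chain.t_name != t_name:
            continue
        for t_start, q_start, size in chain.blocks:
            if t_start <= pos < t_start + size:
                q = q_start + (pos - t_start)
                if chain.q_strand == "-":
                    q = chain.q_size - 1 - q  # reverse-strand coordinate -> forward strand
                return chain.q_name, q, chain.q_strand
    return None  # position falls in a chain gap or outside every chain

# Hypothetical example: 100 bp old sequence, 10 bp gap in old vs 15 bp gap in new
EXAMPLE = """chain 1000 chrOld 100 + 0 100 chrNew 120 + 10 115 1
50 10 15
40
"""
```

With this example, `lift_position(parse_chains(EXAMPLE), "chrOld", 70)` gives `("chrNew", 85, "+")`, while position 55 lies in the gap and returns `None`. The hard part of building chains for custom genomes is the whole-genome alignment that produces these blocks, not the file format itself.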

Do these chain files and the liftover process take strand into account? If I have a VCF with an A/T SNP, is it flipped to a T/A SNP when the corresponding region lies on the opposite DNA strand in the new reference genome?
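To make the strand question concrete: when a block maps onto the `-` strand of the new sequence, a strand-aware converter has to turn the reverse-strand coordinate back into a forward-strand one and reverse-complement the alleles. The sketch below shows only that arithmetic for a single ungapped block; `lift_snp` and the block layout `(t_start, q_start, size)` are hypothetical, and whether a particular tool also re-checks REF against the new FASTA should be confirmed in its documentation.

```python
from typing import Optional, Tuple

# Complement table for DNA bases (upper and lower case; N stays N)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def lift_snp(pos: int, ref: str, alt: str,
             block: Tuple[int, int, int], q_size: int, q_strand: str
             ) -> Optional[Tuple[int, str, str]]:
    """Lift a SNP at 0-based old position `pos` through one ungapped block.

    `block` is (t_start, q_start, size); q_start is on strand `q_strand`, so for '-'
    it counts from the end of the new sequence (chain-file convention).
    Returns (forward-strand 0-based position, ref, alt), or None if outside the block.
    """
    t_start, q_start, size = block
    if not t_start <= pos < t_start + size:
        return None
    q = q_start + (pos - t_start)
    if q_strand == "-":
        # Back to forward-strand coordinates, and flip the alleles to the new strand
        return q_size - 1 - q, reverse_complement(ref), reverse_complement(alt)
    return q, ref, alt
```

For example, `lift_snp(5, "A", "T", (0, 0, 10), 100, "-")` returns `(94, "T", "A")`: the A/T SNP from the question becomes T/A at the mirrored position.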

Does liftover support INDEL variants?
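Indels are harder than SNPs for two reasons, sketched below under stated assumptions. First, the whole REF span has to land inside one ungapped block; if it straddles a chain gap there is no clean one-to-one mapping (`lift_interval` simply rejects such cases, `+` strand only). Second, VCF indels carry a leading anchor base shared by REF and ALT; after reverse-complementing onto the `-` strand that anchor ends up as the *last* base, so the record must be re-anchored on the base that precedes it in the new reference (`reanchor_minus_strand_indel`). Both helpers are hypothetical; positions are 0-based, so add 1 for VCF `POS`, and the result may still need left-normalization (e.g. with `bcftools norm`).

```python
from typing import List, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def lift_interval(start: int, end: int,
                  blocks: List[Tuple[int, int, int]]) -> Optional[Tuple[int, int]]:
    """Lift the 0-based half-open old interval [start, end) to the new build (+ strand),
    only if it lies entirely inside one ungapped block (t_start, q_start, size)."""
    for t_start, q_start, size in blocks:
        if t_start <= start and end <= t_start + size:
            shift = q_start - t_start
            return start + shift, end + shift
    return None  # spans a chain gap or is unaligned

def reanchor_minus_strand_indel(new_ref: str, q_start: int,
                                ref: str, alt: str) -> Tuple[int, str, str]:
    """Rewrite a VCF-style indel (REF/ALT share a leading anchor base) lifted onto the
    reverse strand. `q_start` is the 0-based forward-strand start of the span that
    reverse_complement(ref) now occupies in `new_ref`."""
    ref_rc, alt_rc = reverse_complement(ref), reverse_complement(alt)
    if new_ref[q_start:q_start + len(ref_rc)] != ref_rc:
        raise ValueError("lifted REF does not match the new reference")
    if q_start == 0:
        raise ValueError("no preceding base available to re-anchor")
    # The old anchor is now the LAST base: drop it and prepend the base
    # that precedes the span on the new forward strand.
    anchor = new_ref[q_start - 1]
    return q_start - 1, anchor + ref_rc[:-1], anchor + alt_rc[:-1]
```

For instance, a deletion `REF=CAT, ALT=C` that lands on the reverse strand at forward position 3 of a new sequence `GGGATGCC` becomes `(2, "GAT", "G")`.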

CrossMap is a program for convenient conversion of genome coordinates (or annotation files) between different assemblies (such as Human hg18 (NCBI36) <> hg19 (GRCh37), Mouse mm9 (MGSCv37) <> mm10 (GRCm38)). It supports most commonly used file formats including SAM/BAM, Wiggle/BigWig, BED, GFF/GTF, VCF.

CrossMap first determines the correspondence between genome assemblies from a UCSC chain file
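CrossMap's actual code is not reproduced here, but as a rough illustration of what "determining the correspondence" from a chain file involves: once the chains are split into ungapped blocks, they can be indexed per old sequence so each lookup is a binary search instead of a scan. This sketch assumes blocks on the old (target) side do not overlap, which is what a netted liftOver chain file is normally meant to provide, and handles the `+` strand only; `BlockIndex` and the sequence names are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# One ungapped block: (old_start, old_end, new_name, new_start), 0-based half-open, + strand
BlockRec = Tuple[int, int, str, int]

class BlockIndex:
    """Sorted per-sequence block lists searched with bisect (assumes non-overlapping blocks)."""

    def __init__(self, blocks: Dict[str, List[BlockRec]]):
        self._blocks = {name: sorted(recs) for name, recs in blocks.items()}
        self._starts = {name: [r[0] for r in recs] for name, recs in self._blocks.items()}

    def lift(self, old_name: str, pos: int) -> Optional[Tuple[str, int]]:
        starts = self._starts.get(old_name)
        if not starts:
            return None
        i = bisect.bisect_right(starts, pos) - 1  # last block starting at or before pos
        if i < 0:
            return None
        old_start, old_end, new_name, new_start = self._blocks[old_name][i]
        if pos >= old_end:
            return None  # pos falls in a gap after that block
        return new_name, new_start + (pos - old_start)
```

With blocks `{"chrOld": [(0, 50, "chrNew", 10), (60, 100, "chrNew", 75)]}`, `lift("chrOld", 70)` returns `("chrNew", 85)` and `lift("chrOld", 55)` returns `None`.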

CrossMap also needs a UCSC chain file. How do I create one myself for two custom genome builds of the same species?

These protocols are for small genomes only. We really need a way to produce chain files from AGP files, or a clear workflow for creating chains from larger FASTA assemblies (comprising many contigs/scaffolds); the protocols linked above are not suitable for that without expertise I could not find documented anywhere.
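On the AGP idea: if the new build was assembled from the old build's contigs, so that the AGP states exactly where each old sequence range sits in the new scaffolds, a chain can in principle be written straight from the AGP with no alignment at all. A minimal sketch, assuming tab-separated AGP 2.0 lines, components placed without sequence edits, and a `component_sizes` dict for the old sequences (e.g. taken from a `.fai` index); `agp_to_chains` is a hypothetical helper, not an existing tool. It emits one single-block chain per component line, old contig as target and new scaffold as query, with `-` orientation written in reverse-strand query coordinates as the chain format requires.

```python
from collections import defaultdict
from typing import Dict, List

def agp_to_chains(agp_text: str, component_sizes: Dict[str, int]) -> str:
    """Build old-component -> new-object chain records from AGP component lines."""
    rows = []
    object_sizes: Dict[str, int] = defaultdict(int)
    for line in agp_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        obj, obj_beg, obj_end, comp_type = f[0], int(f[1]), int(f[2]), f[4]
        object_sizes[obj] = max(object_sizes[obj], obj_end)  # AGP tiles the whole object
        if comp_type in ("N", "U"):
            continue  # gap line: nothing to map
        comp_id, comp_beg, comp_end, orient = f[5], int(f[6]), int(f[7]), f[8]
        rows.append((obj, obj_beg, obj_end, comp_id, comp_beg, comp_end, orient))

    out: List[str] = []
    for chain_id, (obj, obj_beg, obj_end, comp_id, comp_beg, comp_end, orient) in enumerate(rows, 1):
        size = comp_end - comp_beg + 1
        q_size = object_sizes[obj]
        q_strand = "-" if orient == "-" else "+"  # '?', '0', 'na' treated as '+' here
        if q_strand == "+":
            q_start, q_end = obj_beg - 1, obj_end
        else:  # reverse-strand coordinates counted from the end of the object
            q_start, q_end = q_size - obj_end, q_size - obj_beg + 1
        out.append(f"chain {size} {comp_id} {component_sizes[comp_id]} + {comp_beg - 1} {comp_end} "
                   f"{obj} {q_size} {q_strand} {q_start} {q_end} {chain_id}")
        out.append(str(size))  # single ungapped block
        out.append("")
    return "\n".join(out)
```

If the sequences were edited between builds (e.g. polished), this shortcut no longer holds and alignment-based chains are needed; either way it is worth validating the output by lifting a few positions whose location in both builds is known.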

Has anyone created chain files from FASTA for genomes/assemblies larger than 1 Gb?

