Tool: Converting Genome Coordinates From One Genome Version To Another (UCSC liftOver, NCBI Remap, Ensembl API)
95
4.7 years ago by
Washington University School of Medicine, St. Louis, USA
Malachi Griffith16k wrote:

Some recent posts reminded me that it might be useful for us to review the options for converting between genome coordinate systems.

This comes up in several contexts. Probably the most common is that you have some coordinates for a particular version of a reference genome and you want to determine the corresponding coordinates on a different version of the reference genome for that species. For example, you have a BED file with exon coordinates for human build GRCh37 (hg19) and wish to update to GRCh38 (hg38). By the way, for a nice summary of genome versions and their release names, refer to the Assembly Releases and Versions FAQ.

Or perhaps you have coordinates of a gene and wish to determine the corresponding coordinates in another species. For example, you have coordinates of a gene in human GRCh38 and wish to determine corresponding coordinates in mouse mm10.

Finally you may wish to convert coordinates between coordinate systems within a single assembly. For example, you have the coordinates of a series of exons and you want to determine the position of these exons with respect to the transcript, gene, contig, or entire chromosome.
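To make the within-assembly case concrete, here is a minimal Python sketch that converts a genomic position into an offset along the spliced transcript. It is only an illustration of the arithmetic, not the Ensembl API: the function name `genome_to_transcript`, the `EXONS` list, and the choice of 0-based half-open intervals are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical exons of one transcript, as 0-based half-open genomic intervals.
EXONS: List[Tuple[int, int]] = [(100, 200), (300, 350), (500, 600)]


def genome_to_transcript(
    exons: List[Tuple[int, int]], strand: str, pos: int
) -> Optional[int]:
    """Return the 0-based offset of genomic position `pos` within the spliced
    transcript, or None if `pos` is intronic or outside the transcript."""
    # Walk exons in transcription order: ascending for '+', descending for '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    offset = 0  # number of transcript bases in the exons already passed
    for start, end in ordered:
        if start <= pos < end:
            if strand == "+":
                return offset + (pos - start)
            # On the minus strand the transcript starts at the exon's last base.
            return offset + (end - 1 - pos)
        offset += end - start
    return None


print(genome_to_transcript(EXONS, "+", 310))  # 110
print(genome_to_transcript(EXONS, "-", 310))  # 139
print(genome_to_transcript(EXONS, "+", 250))  # None (intronic)
```

The same walk, run in reverse, gives transcript-to-genome conversion. Real annotation adds details this sketch ignores, such as UTRs and multiple transcripts per gene.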

There are now several well known tools that can help you with these kinds of tasks:

  1. UCSC liftOver. This tool is available through a simple web interface or it can be downloaded as a standalone executable. To use the executable you will also need to download the appropriate chain file. Each chain file describes conversions between a pair of genome assemblies. liftOver can also be used through Galaxy. There is also a Python implementation of liftOver called pyliftover, which converts point coordinates only.

  2. NCBI Remap. This tool is conceptually similar to liftOver in that it manages conversions between a pair of genome assemblies, but it uses different methods to achieve these mappings. It is also available through a simple web interface or you can use the API for NCBI Remap.

  3. The Ensembl API. The final example I described above (converting between coordinate systems within a single genome assembly) can be accomplished with the Ensembl core API. Many examples are provided within the installation, overview, tutorial and documentation sections of the Ensembl API project. In particular, refer to these sections of the tutorial: 'Coordinates', 'Coordinate systems', 'Transform', and 'Transfer'.

  4. Assembly Converter. Ensembl also offers their own simple web interface for coordinate conversions called the Assembly Converter.

  5. Bioconductor rtracklayer package. For R users, Bioconductor has an implementation of UCSC liftOver in the rtracklayer package. To see documentation on how to use it, open an R session and run the following commands.

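    # biocLite() has been retired; BiocManager is the current Bioconductor installer
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("rtracklayer")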
    library(rtracklayer)
    ?liftOver

  6. CrossMap. A standalone open-source program for converting genome coordinates (or annotation files) between different assemblies. It supports most commonly used file formats, including SAM/BAM, Wiggle/BigWig, BED, GFF/GTF, and VCF. CrossMap lifts coordinates over between assemblies; it is not a program for aligning sequences to a reference genome, and it is not recommended for converting genome coordinates between species.

  7. Flo. A liftover pipeline for different reference genome builds of the same species. It describes the process as follows: "align the new assembly with the old one, process the alignment data to define how a coordinate or coordinate range on the old assembly should be transformed to the new assembly, transform the coordinates."
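
For readers curious about what a chain file actually encodes, the following Python sketch parses a tiny, made-up chain in the UCSC chain format and lifts single positions through it. It is not how liftOver, pyliftover, or CrossMap are implemented: the names `TOY_CHAIN`, `parse_chain`, and `lift_point` are hypothetical, the chain values are invented for illustration, and only plus-strand chains are handled.

```python
from typing import List, Optional, Tuple

# A made-up chain (not from a real chain file).
# Header: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
# Body:   "size dt dq" = ungapped block length, then the gap to the next block
#         on the old (target) and new (query) assembly; the last line has size only.
TOY_CHAIN = """\
chain 1000 chr1 1000 + 100 400 chr1 1200 + 150 470 1
100 50 70
150
"""

# One ungapped block: (old_chrom, old_start, old_end, new_chrom, new_start), 0-based.
Block = Tuple[str, int, int, str, int]


def parse_chain(text: str) -> List[Block]:
    """Turn chain text into a flat list of ungapped aligned blocks."""
    blocks: List[Block] = []
    t_name = q_name = ""
    t_pos = q_pos = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "chain":
            t_name, t_strand, t_pos = fields[2], fields[4], int(fields[5])
            q_name, q_strand, q_pos = fields[7], fields[9], int(fields[10])
            if t_strand != "+" or q_strand != "+":
                raise ValueError("this sketch only handles + strand chains")
        else:
            size = int(fields[0])
            blocks.append((t_name, t_pos, t_pos + size, q_name, q_pos))
            t_pos += size
            q_pos += size
            if len(fields) == 3:  # skip the gap before the next block
                t_pos += int(fields[1])
                q_pos += int(fields[2])
    return blocks


def lift_point(blocks: List[Block], chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a 0-based position on the old assembly to the new one, or return
    None if the position falls in a gap or outside every chain."""
    for t_name, t_start, t_end, q_name, q_start in blocks:
        if t_name == chrom and t_start <= pos < t_end:
            return q_name, q_start + (pos - t_start)
    return None


blocks = parse_chain(TOY_CHAIN)
print(lift_point(blocks, "chr1", 120))  # ('chr1', 170)
print(lift_point(blocks, "chr1", 300))  # ('chr1', 370)
print(lift_point(blocks, "chr1", 220))  # None: inside a gap, so unmappable
```

The `None` case is why real tools report unmapped features: a position that sits in a gap of the alignment has no counterpart on the other assembly. Production tools also handle minus-strand chains, intervals that span several blocks, and fast lookup over millions of blocks, none of which this sketch attempts.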

modified 9 hours ago by mgdias.jose0 • written 4.7 years ago by Malachi Griffith16k
4

By default I use liftOver and haven't really ever considered using the other offerings, so thanks for the summary. I wonder whether anyone has tested, at least for common genomes like hg19 or mm9, whether any of the tools outperform the others. I know UCSC uses "chains", but presumably the other methods differ.

written 4.7 years ago by Ian5.0k
4

CrossMap is a program for convenient conversion of genome coordinates between assemblies. It supports most commonly used file formats including SAM/BAM, Wiggle/BigWig, BED, GFF/GTF, VCF.

http://crossmap.sourceforge.net/

written 4.0 years ago by wangliguo7860

neat tool, should add this to my toolbelt

written 4.0 years ago by Istvan Albert ♦♦ 74k
3

Thanks for summarizing these. We need to start linking to this post when this question pops up again.

modified 4.7 years ago • written 4.7 years ago by Istvan Albert ♦♦ 74k
2

I'll just add that for R/Bioc users, the rtracklayer package has an implementation of liftOver that is native to R, so the UCSC liftOver tool is not needed directly. The Bioc version is said to be faster than the UCSC version, but I have not tested this myself.

modified 4.7 years ago • written 4.7 years ago by Sean Davis23k
2

Thanks. I have now added a brief intro to this in the original post.

written 4.7 years ago by Malachi Griffith16k

Thanks for the informative post.

modified 17 months ago • written 2.6 years ago by morteza.mahmoudisaber30

rtracklayer's description seems incomplete.

written 21 months ago by Anurag Priyam40

With the release of hg38, the contents of this post need revisiting! :)

written 20 months ago by manojkumar_bhosale60

I have updated the example, but it should be noted that the majority of these tools generically support all major builds for multiple species (not just human). When a new build comes out, the "chain" files that explain how to convert to/from that build are usually released soon after.

written 10 weeks ago by Malachi Griffith16k

Thanks for the informative post. I want to convert a SNP file for maize from V2 to V3. How can I create the chain file to perform this conversion?

written 19 months ago by Medhat7.1k

Any updates? I'm trying to find a reliable tool for cross-species mapping. Unfortunately I can't find a benchmark of the tools that are out there. Suggestions welcome.

written 5 months ago by kennethcondon2007930
13
3.9 years ago by
London, UK
Giovanni M Dall'Olio25k wrote:

This new tool seems interesting: CrossMap. It can convert many formats, such as SAM and Wiggle.

written 3.9 years ago by Giovanni M Dall'Olio25k
3

In fact, Ensembl's tool uses CrossMap: http://www.ensembl.org/Homo_sapiens/Tools/AssemblyConverter?db=core

written 2.2 years ago by Eli Korvigo90
0
4 weeks ago by
India
devikaparvathy10 wrote:

Can I use liftOver to map coordinates between bacterial subspecies? My aim is to do an integrative analysis of public RNA-seq data available for a particular bacterial species, S. aureus, but each experiment was done in a different strain/subspecies.

What I plan to do is align the reads to their respective reference genomes and then, for further analysis, create an annotation file (GFF/GTF) based on one selected subspecies (the chosen "target" for liftover), combined with the mapped annotation of the other subspecies (the "source" for liftover).

Is this procedure right? Or are there any other alternatives? I do not wish to do all the RNA-seq analyses separately and then simply compare the resulting lists of differentially expressed genes.

written 4 weeks ago by devikaparvathy10
0
9 hours ago by
mgdias.jose0 wrote:

Hello, thank you for the amazing work compiling all these different tools.

From what I understand, most of these tools work well between different assembly versions (say 37 vs 38), but if I want to compare data within the same assembly across different releases, do you have any suggestion for the best approach? I need to compare data from mouse assembly 38, Ensembl release 73, with the latest release, Ensembl 90. I have transposon data with the coordinates of each hit and information about the genomic region it hit (gene X/intergenic). Thank you!

written 9 hours ago by mgdias.jose0

Converting coordinates across assemblies is the easy bit. That is the basic function of most of these tools.

written 2 hours ago by kennethcondon2007930