Question: Translating Genomic Locations Between Species
1
6.9 years ago by
Germany
Click downvote670 wrote:

Hi--

My question comes first, followed by my reason for asking it. Reading the "reason" part may be unnecessary.

I need to find a way to convert the genomic location chromosome X, position Y in the rat genome into a corresponding position in the mouse genome (if one is known to exist). How do I do that?


The reason I am asking is that I want to find out whether some piRNA clusters are conserved between species. As the piRNA reads themselves are known not to be conserved, I need a roundabout method: I will check whether a cluster in the rat genome has a cluster at the corresponding position in the mouse genome.

As input, I have two lists, one from the rat genome and one from the mouse genome, that show piRNA clusters and their locations. Since the rat and mouse genomes differ by many insertions and deletions, I can't merely check whether the cluster found at chromosome X, position Y in the rat genome has a corresponding cluster at chromosome X, position Y in the mouse genome. Hence, I need a way to translate genomic locations between these two species.
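Once the rat coordinates have been translated (by whichever tool), the comparison described above reduces to an interval-overlap check. The following is a minimal sketch of that step only, assuming clusters are held as (chrom, start, end) tuples with half-open, BED-style coordinates; the function names and the example data are hypothetical.

```python
from collections import defaultdict


def index_by_chrom(clusters):
    """Group (chrom, start, end) tuples by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end in clusters:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    return by_chrom


def has_overlap(index, chrom, start, end):
    """True if [start, end) overlaps any indexed interval on chrom."""
    return any(s < end and start < e for s, e in index.get(chrom, []))


# Hypothetical data: rat clusters already translated to mouse coordinates.
lifted_rat = [("chr2", 1000, 5000), ("chr7", 200, 900)]
mouse = [("chr2", 4500, 9000), ("chr7", 5000, 6000)]

mouse_index = index_by_chrom(mouse)
conserved = [c for c in lifted_rat if has_overlap(mouse_index, *c)]
print(conserved)
```

A linear scan per query is fine for a few thousand clusters; for larger lists an interval tree or a sorted sweep would be the usual replacement.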

conservation alignment • 5.1k views
modified 5.1 years ago by biotinker0 • written 6.9 years ago by Click downvote670

Any updates? I'm trying to find a reliable tool for cross-species mapping. Unfortunately, I can't find a benchmark of the tools that are out there. Suggestions welcome.

written 2.8 years ago by YaGalbi1.5k
6
6.9 years ago by
Ido Tamir5.0k
Austria
Ido Tamir5.0k wrote:

There are a number of tools from the UCSC browser project that use liftOver chain files.

liftOver has a web interface and is easy to use, and with the parameter -minMatch you can easily control how selective the coordinate mapping should be.
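For the command-line version, a call with -minMatch can be sketched as below. This is only an illustration: it assumes the UCSC liftOver binary is on PATH, the file names (including the chain file name rn5ToMm10.over.chain.gz) are placeholders, and the value 0.1 is an arbitrary starting point to tune rather than a recommendation. liftOver's own default for -minMatch is 0.95.

```python
import subprocess


def liftover_command(in_bed, chain, out_bed, unmapped, min_match=0.1):
    """Build the argument list: liftOver [options] oldFile map.chain newFile unMapped."""
    return ["liftOver", f"-minMatch={min_match}", in_bed, chain, out_bed, unmapped]


def run_liftover(*args, **kwargs):
    """Run liftOver; requires the UCSC binary to be installed and on PATH."""
    subprocess.run(liftover_command(*args, **kwargs), check=True)


# Placeholder file names; nothing is executed here, the command is only printed.
cmd = liftover_command(
    "rat_clusters.bed",
    "rn5ToMm10.over.chain.gz",
    "rat_in_mouse.bed",
    "unmapped.bed",
    min_match=0.1,
)
print(" ".join(cmd))
```

Records that cannot be mapped at the chosen threshold are written to the last file (unmapped.bed here) instead of the output.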

pslMap is another UCSC tool that can use chain files to transfer coordinates between species and within a genome assembly. It has no web interface and takes PSL input rather than BED. I have no personal experience with this program.

pslMap - map PSLs alignments to new targets using alignments of
the old target to the new target.  Given inPsl and mapPsl, where
the target of inPsl is the query of mapPsl, create a new PSL
with the query of inPsl aligned to all the targets of mapPsl.
If inPsl is a protein to nucleotide alignment and mapPsl is a
nucleotide to nucleotide alignment, the resulting alignment is
nucleotide to nucleotide alignment of a hypothetical mRNA that
would code for the protein.  This is useful as it gives base
alignments of spliced codons.  A chain file may be used instead
mapPsl.

orthoMap (also from UCSC) also uses chain files to map between organisms.

Map items from one organism to another. Must
specify one type of item using the -itemFile or -itemTable
flags. OrthoMap simply maps over the genomic coordinates discarding
query inserts, mismatches, etc.

All three use the same chain files, with varying levels of detail when mapping. I think most people use liftOver because that's the most visible one, and people stick to things that work even if they might be suboptimal (and because everybody does it).

modified 6.9 years ago • written 6.9 years ago by Ido Tamir5.0k
2

WARNING: liftOver was only designed to work between different assemblies of the same organism. It may not do what you want if you are lifting between different organisms. If there has been a rearrangement in one of the species, the size of the region being mapped may change dramatically after mapping. (from the liftover tool description)
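One way to act on this warning is to compare each region's length before and after mapping and flag large changes. The sketch below assumes regions are matched by a name (for example, the BED name column) and stored as (start, end) pairs; the function names, the 0.5x to 2x thresholds, and the data are all hypothetical.

```python
def size_ratio(original, lifted):
    """Ratio of lifted length to original length for (start, end) pairs."""
    return (lifted[1] - lifted[0]) / (original[1] - original[0])


def flag_size_changes(originals, lifteds, low=0.5, high=2.0):
    """Return names of regions whose lifted size is outside [low, high] times the original."""
    flagged = []
    for name, lifted in lifteds.items():
        ratio = size_ratio(originals[name], lifted)
        if not (low <= ratio <= high):
            flagged.append(name)
    return flagged


# Hypothetical regions keyed by name: (start, end) before and after mapping.
originals = {"clusterA": (1000, 6000), "clusterB": (20000, 21000)}
lifteds = {"clusterA": (3000, 8200), "clusterB": (50000, 95000)}

print(flag_size_changes(originals, lifteds))
```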

modified 6.9 years ago • written 6.9 years ago by Click downvote670

To nitpick: in your question you were talking about positions, not big regions that can get split up. From experience with TFBS data (which are rather small regions), I get most of them mapped between mouse and human, and IIRC there is an explanation for the ones that don't get mapped (something like "split in destination" vs. not existing in destination). I will add a different answer. But I see that current versions of liftOver even have -multiple for output. Maybe you could post your experience with liftOver vs. pslMap vs. hand-crafted alignment usage?
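The explanations mentioned here appear in liftOver's unmapped output as a line starting with "#" ahead of each failed record, so they can be tallied with a few lines of code. This is a rough sketch; the sample text below is made up for illustration, and the exact reason strings depend on the liftOver version.

```python
from collections import Counter


def count_unmapped_reasons(lines):
    """Tally the '#' reason line that precedes each record in an unmapped file."""
    reasons = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#"):
            reasons[line.lstrip("#").strip()] += 1
    return reasons


# Made-up sample in the shape of an unmapped file: reason line, then the record.
sample = """#Deleted in new
chr1\t100\t200\tsite1
#Split in new
chr3\t500\t900\tsite2
#Deleted in new
chr5\t40\t80\tsite3
""".splitlines()

reasons = count_unmapped_reasons(sample)
for reason, n in reasons.most_common():
    print(f"{n}\t{reason}")
```

With a real file, pass the open file handle instead of the sample list.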

modified 6.9 years ago • written 6.9 years ago by Ido Tamir5.0k

I have checked it between different species, and it indeed does not map to the right locations. For example, if you take the coordinates of the gene Six1 from chicken and liftOver them to mouse, they map to a completely unrelated location nowhere near Six1, so I second the warning!

written 5.9 years ago by Diana800

I just checked: chicken RefSeq Six1 and Six4 get translated to mouse Six1 and Six4.

At a glance, this is exactly the syntenic region in mouse. It seems that the first exon of Gallus Six1 is either wrongly annotated in RefSeq (using the first exon of Six4) or really much, much further away than in other species. The region encompasses Six4 in chicken and also in mouse.

While it's always good to be cautious when using tools, this is not an example of the tool being wrong.

written 5.9 years ago by Ido Tamir5.0k
3
6.9 years ago by
Germany
Johanna Schott390 wrote:

The UCSC Genome Browser has a tool that addresses your question:

http://genome.ucsc.edu/cgi-bin/hgLiftOver

written 6.9 years ago by Johanna Schott390

I thought that liftOver was for converting coordinates between assemblies of the same species. Are you sure that it can be used to convert coordinates from one species to another?

written 6.9 years ago by Giovanni M Dall'Olio26k
1

The tool allows you to do it; UCSC provides chain files for it. However, it seems that the conservation track might be better suited according to this reply: https://lists.soe.ucsc.edu/pipermail/genome/2009-August/019843.html

written 6.9 years ago by Click downvote670
2
6.9 years ago by
Germany
Click downvote670 wrote:

This page explains how to do it: http://genome.sph.umich.edu/wiki/LiftOver#Lift_genome_positions

Will update this reply later if I learn more.

I'm not certain the abstruse net/chain files are meant to be read by humans. It seems that the liftOver tool is meant to do the job for you.

More good info on setting up/using the tool: http://manuelcorpas.com/tag/liftover/

modified 6.9 years ago • written 6.9 years ago by Click downvote670
0
5.1 years ago by
biotinker0
United States
biotinker0 wrote:

I had to do the same thing as mentioned here for my research, and my solution was to use the .axt files provided by UCSC: https://genome.ucsc.edu/goldenPath/help/axt.html

They are derived from the .chain/.net files and represent the best alignment at each position, whereas .chain/.net files can contain several differently weighted alignments.

.axt files are available for download for many species. For example: http://hgdownload.cse.ucsc.edu/goldenPath/rn5/vsMm10/
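To show how a single position can be read off an .axt block, here is a small sketch based on the format description linked above: a summary line (alignment number, primary chromosome, start, end, aligned chromosome, start, end, strand, score) followed by the two aligned sequences, with 1-based coordinates. The function name and the example block are made up; this is not the exact code I used.

```python
def map_position(header, primary_seq, aligned_seq, pos):
    """Map a 1-based primary-genome position to the aligned genome within one AXT block.

    Returns (chrom, position, strand), or None if pos lies outside the block
    or sits opposite a gap in the aligned sequence.
    """
    fields = header.split()
    p_start, p_end = int(fields[2]), int(fields[3])
    a_chrom, a_start, strand = fields[4], int(fields[5]), fields[7]
    if not p_start <= pos <= p_end:
        return None
    p, a = p_start - 1, a_start - 1  # last base consumed in each genome
    for p_base, a_base in zip(primary_seq, aligned_seq):
        if p_base != "-":
            p += 1
        if a_base != "-":
            a += 1
        if p_base != "-" and p == pos:
            return None if a_base == "-" else (a_chrom, a, strand)
    return None


# Made-up block: 8 primary bases (101-108) aligned to 8 bases (501-508).
header = "0 chr1 101 108 chr4 501 508 + 3500"
primary = "ACGTAC-GT"
aligned = "ACG-ACTGT"

print(map_position(header, primary, aligned, 105))
print(map_position(header, primary, aligned, 104))
```

Note that when the strand field is "-", the aligned coordinates are given relative to the reverse-complemented chromosome, so converting them to forward-strand positions additionally needs the chromosome length.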

written 5.1 years ago by biotinker0

Based on the blastz scores, how do you determine the conserved sequences, i.e., what would be the conservation cutoff?

written 4.5 years ago by Aishwarya Kulkarni70