Translating Genomic Locations Between Species
9.3 years ago

Hi--

Below is my question, followed by my reason for asking it. Reading the "reason" part may be unnecessary.

I need a way to convert a genomic location (chromosome X, position Y) in the rat genome into the corresponding position in the mouse genome, if one is known to exist. How do I do that?

The reason I am asking is that I want to find out whether some piRNA clusters are conserved between species. Because the piRNA reads themselves are known not to be conserved, I need an indirect method: I will check whether a cluster in the rat genome has a cluster at the corresponding position in the mouse genome.

As input, I have two lists, one from the rat genome and one from the mouse genome, giving piRNA clusters and their locations. Since the rat and mouse genomes differ by many insertions and deletions, I can't simply check whether the cluster at chromosome X, position Y in the rat genome has a matching cluster at chromosome X, position Y in the mouse genome. Hence, I need a way to translate genomic locations between these two species.
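Once the rat clusters have been translated into mouse coordinates (with whatever mapping tool you end up using), the comparison itself is just an interval-overlap check. A minimal sketch, assuming both lists are BED-style half-open intervals; the `Interval` type and the function names are hypothetical:

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def conserved_clusters(mapped_rat: List[Interval],
                       mouse: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Pair each rat cluster (already in mouse coordinates) with every overlapping mouse cluster."""
    hits = []
    for r in mapped_rat:
        for m in mouse:
            if overlaps(r, m):
                hits.append((r, m))
    return hits
```

The nested loop is quadratic, which is fine for a few thousand clusters; for much larger lists you would sort by chromosome and start, or use an interval tree or `bedtools intersect`.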


Any updates? I'm trying to find a reliable tool for cross-species mapping. Unfortunately I can't find a benchmark of the tools that are out there. Suggestions welcome.

9.3 years ago
Ido Tamir

There are a number of tools from the UCSC Genome Browser project that use liftOver chain files.

liftOver has a web interface, is easy to use, and with the -minMatch parameter you can easily control how strict the coordinate mapping should be.
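For batch use, the command-line liftOver takes its arguments as `oldFile map.chain newFile unMapped`, and -minMatch is the minimum fraction of bases that must remap (default 0.95, which is usually too strict between species). A small Python wrapper as a sketch; the chain file name `rn6ToMm10.over.chain.gz` and the BED file names are placeholders, so check the UCSC download area for the chain that matches your assemblies:

```python
import subprocess
from typing import List


def liftover_command(bed_in: str, chain_file: str, bed_out: str,
                     unmapped_out: str, min_match: float = 0.1,
                     binary: str = "liftOver") -> List[str]:
    """Build an argv list for the UCSC liftOver command-line tool.

    Positional order is: oldFile map.chain newFile unMapped.
    min_match is passed as -minMatch (fraction of bases that must remap).
    """
    if not 0.0 < min_match <= 1.0:
        raise ValueError("min_match must be in (0, 1]")
    return [binary, f"-minMatch={min_match}",
            bed_in, chain_file, bed_out, unmapped_out]


def run_liftover(*args, **kwargs) -> None:
    """Run liftOver; needs the binary on PATH and a downloaded chain file."""
    subprocess.run(liftover_command(*args, **kwargs), check=True)
```

A low value such as 0.1 lets short, diverged regions through; raising it makes the mapping more selective at the cost of more records ending up in the unmapped file.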

pslMap is another UCSC tool that can take chain files as input to transfer coordinates between species and between assemblies of the same genome. It has no web interface and requires a different input format from BED. I have no personal experience with this program.

pslMap - map PSLs alignments to new targets using alignments of
the old target to the new target.  Given inPsl and mapPsl, where
the target of inPsl is the query of mapPsl, create a new PSL
with the query of inPsl aligned to all the targets of mapPsl.
If inPsl is a protein to nucleotide alignment and mapPsl is a
nucleotide to nucleotide alignment, the resulting alignment is
nucleotide to nucleotide alignment of a hypothetical mRNA that
would code for the protein.  This is useful as it gives base
alignments of spliced codons.  A chain file may be used instead
mapPsl.


orthoMap (also from UCSC) also uses chain files to map between organisms.

Map items from one organism to another. Must
specify one type of item using the -itemFile or -itemTable
flags. OrthoMap simply maps over the genomic coordinates discarding
query inserts, mismatches, etc.


All of them use the same chain files, with varying levels of detail when mapping. I think most people use liftOver because it's the most visible one, and people stick to things that work even if they might be suboptimal (and because everybody does it).
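To see what these tools are doing with a chain file, here is a rough sketch of mapping a single position through chains, following the UCSC chain format (a `chain` header line, then `size dt dq` lines, ending with a lone `size`). It only handles single 0-based positions and ignores scores, -minMatch and region splitting, so it is an illustration of the idea and not a reimplementation of liftOver; the `Chain` class and function names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class Chain:
    t_name: str
    q_name: str
    q_size: int
    q_strand: str
    # Ungapped aligned blocks as (target_start, query_start, size), 0-based.
    blocks: List[Tuple[int, int, int]] = field(default_factory=list)


def parse_chains(lines: Iterable[str]) -> List[Chain]:
    """Parse chain format into ungapped blocks."""
    chains: List[Chain] = []
    t_pos = q_pos = 0
    for raw in lines:
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            chains.append(Chain(t_name=fields[2], q_name=fields[7],
                                q_size=int(fields[8]), q_strand=fields[9]))
            t_pos, q_pos = int(fields[5]), int(fields[10])
            continue
        size = int(fields[0])
        chains[-1].blocks.append((t_pos, q_pos, size))
        if len(fields) == 3:  # gap before the next block: dt in target, dq in query
            t_pos += size + int(fields[1])
            q_pos += size + int(fields[2])
    return chains


def map_position(chains: List[Chain], chrom: str, pos: int) -> List[Tuple[str, int, str]]:
    """Map a 0-based target position; returns one (query chrom, + strand pos, strand) per chain hit."""
    hits = []
    for chain in chains:
        if chain.t_name != chrom:
            continue
        for t_start, q_start, size in chain.blocks:
            if t_start <= pos < t_start + size:
                q = q_start + (pos - t_start)
                if chain.q_strand == "-":
                    # '-' chains count query positions along the reverse strand.
                    q = chain.q_size - 1 - q
                hits.append((chain.q_name, q, chain.q_strand))
                break
    return hits
```

A position that falls in a gap between blocks returns no hit, which is the single-base analogue of liftOver reporting a region as deleted.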


WARNING: liftOver was only designed to work between different assemblies of the same organism. It may not do what you want if you are lifting between different organisms. If there has been a rearrangement in one of the species, the size of the region being mapped may change dramatically after mapping. (from the liftover tool description)
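One cheap way to act on this warning is to compare each region's length before and after mapping and flag large changes. A sketch, assuming regions are keyed by a name shared between the input and output BED files; the 2-fold threshold and the function name are arbitrary choices for illustration:

```python
from typing import Dict, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open


def size_change_flags(original: Dict[str, Region], mapped: Dict[str, Region],
                      max_fold: float = 2.0) -> Dict[str, float]:
    """Return {name: mapped_length / original_length} for regions whose
    length changed by more than max_fold in either direction."""
    flagged = {}
    for name, (_, start, end) in original.items():
        if name not in mapped:
            continue  # skip regions that failed to map at all
        _, m_start, m_end = mapped[name]
        ratio = (m_end - m_start) / (end - start)
        if ratio > max_fold or ratio < 1.0 / max_fold:
            flagged[name] = ratio
    return flagged
```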


To nitpick: your question was about positions, not large regions that can get split up. From experience with TFBS data (which are rather small regions), I get most of them mapped between mouse and human, and IIRC there is an explanation for the ones that don't get mapped (something like "split in destination" vs. "not existing in destination"). I will add a different answer. I also see that current versions of liftOver even have a -multiple output option. Maybe you could post your experience with liftOver vs. pslMap vs. hand-crafted alignments?
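Those explanations end up in liftOver's unMapped output file. A quick sketch for tallying them, assuming each failed record is preceded by a comment line giving the reason (for example `#Deleted in new` or `#Split in new`); the function name is hypothetical:

```python
from collections import Counter
from typing import Iterable


def unmapped_reasons(lines: Iterable[str]) -> Counter:
    """Count failure reasons in a liftOver unMapped file.

    Each '#reason' comment line is attributed to the record that follows it.
    """
    counts: Counter = Counter()
    reason = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            reason = line.lstrip("#").strip()
        elif reason is not None:
            counts[reason] += 1
            reason = None
    return counts
```

Usage would be something like `unmapped_reasons(open("unmapped.bed"))`, which tells you whether failures are mostly regions absent in the other genome or regions that were split.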


I have checked it between different species and it indeed does not map to the right locations. For example, if you take the coordinates of the gene Six1 in chicken and liftOver them to mouse, they map to a completely unrelated location where Six1 is nowhere near, so I second the warning!


I just checked: chicken RefSeq Six1 and Six4 get translated to mouse Six1 and Six4.

At a glance, this is exactly the syntenic region in mouse. It seems that the first exon of chicken Six1 is either wrongly annotated in RefSeq (using the first exon of Six4) or really much, much further away than in other species. The region encompasses Six4 in both chicken and mouse.

While it's always good to be cautious when using tools, this is not an example of this tool being wrong.

9.3 years ago

The UCSC Genome Browser has a tool that answers your question:

http://genome.ucsc.edu/cgi-bin/hgLiftOver


I thought that liftOver was for converting coordinates between assemblies of the same species. Are you sure that it can be used to convert coordinates from one species to another?


The tool allows you to do it, and UCSC provides chain files for it. However, according to this reply, the conservation track might be better suited: https://lists.soe.ucsc.edu/pipermail/genome/2009-August/019843.html

9.3 years ago

I'm not certain the abstruse net/chain files are meant to be read by humans. It seems that the liftOver tool is meant to do that job for you.

More good info on setting up and using the tool: http://manuelcorpas.com/tag/liftover/

7.4 years ago
biotinker

I had to do the same thing for my research, and my solution was to use the .axt files provided by UCSC: https://genome.ucsc.edu/goldenPath/help/axt.html

They are derived from the .chain/.net files and represent the best alignment for each position, whereas .chain/.net files can contain several differently weighted alignments.
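Because an .axt block stores the aligned bases themselves, mapping a position means walking the two sequences column by column and counting non-gap characters. A sketch, assuming the layout described on the UCSC axt help page (a header line `number tName tStart tEnd qName qStart qEnd qStrand score` with 1-based inclusive coordinates, then the target and query sequence lines, then a blank line); the function names are hypothetical:

```python
from typing import Iterable, Iterator, Optional, Tuple


def iter_axt(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, target_seq, query_seq); blank and '#' lines are skipped."""
    body = [l.strip() for l in lines if l.strip() and not l.startswith("#")]
    for i in range(0, len(body) - 2, 3):
        yield body[i], body[i + 1], body[i + 2]


def map_in_axt_block(header: str, t_seq: str, q_seq: str,
                     pos: int) -> Optional[Tuple[str, int, str]]:
    """Map a 1-based target position through one axt block.

    Returns None if pos is outside the block or aligned to a query gap.
    For '-' blocks the position is on the reverse-complemented query chromosome.
    """
    f = header.split()
    t_start, t_end = int(f[2]), int(f[3])
    q_name, q_start, q_strand = f[4], int(f[5]), f[7]
    if not t_start <= pos <= t_end:
        return None
    t_cur, q_cur = t_start - 1, q_start - 1
    for t_base, q_base in zip(t_seq, q_seq):
        if q_base != "-":
            q_cur += 1
        if t_base == "-":
            continue  # query insertion relative to the target
        t_cur += 1
        if t_cur == pos:
            return None if q_base == "-" else (q_name, q_cur, q_strand)
    return None
```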


Based on the blastz scores, how do you determine the conserved sequences, i.e., what would the conservation cutoff be?