Resources for converting between UCSC <-> Gencode <-> Ensembl chromosome names
7.7 years ago

We're currently using a variety of versions of reference genomes for a variety of organisms and often run into the need to convert between chromosome naming conventions (e.g., when someone wants data aligned against the hg19 reference from Ensembl and a GENCODE GTF file to be used). This is often as simple as a quick add/remove of "chr", but not always (e.g., who would know that JH806595.1 in GENCODE is HG1441_PATCH in Ensembl?). So, does anyone know of a nice resource somewhere that provides the mappings?
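For the easy cases the conversion really is just a prefix toggle. Below is a minimal sketch of that naive rule. The function names are hypothetical. It also shows where the rule breaks: nothing in a prefix rule can tell you that GENCODE's JH806595.1 is Ensembl's HG1441_PATCH, which is why a lookup table is needed.

```python
def strip_chr(name: str) -> str:
    """Naive GENCODE/UCSC -> Ensembl rule: drop a leading 'chr' if present."""
    return name[3:] if name.startswith("chr") else name


def add_chr(name: str) -> str:
    """Naive Ensembl -> GENCODE/UCSC rule: prepend 'chr' if it is missing."""
    return name if name.startswith("chr") else "chr" + name


print(strip_chr("chr1"))        # 1
print(add_chr("1"))             # chr1
# The prefix rule leaves the scaffold name untouched, so it is still not
# the name Ensembl uses (HG1441_PATCH):
print(strip_chr("JH806595.1"))  # JH806595.1
```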

At the end of the day, I just need a tab-separated file with the name mappings. I've already written a little Python script to perform the conversion itself (a trivial task), but making the mapping files is proving to be a PITA and I assume someone else has already done this.
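The original script isn't posted here. What follows is only a hypothetical sketch of that kind of conversion. It assumes a two-column, tab-separated mapping file (old name, then new name) and GTF/BED input in which the contig name is the first tab-separated column. It also assumes that a mapping line with an empty second column means "no counterpart". Lines on contigs missing from the mapping are dropped and reported. That is a design choice made here for illustration; keeping them unchanged would be equally reasonable.

```python
from typing import Dict, Iterable, Iterator, Set, TextIO


def load_mapping(handle: Iterable[str]) -> Dict[str, str]:
    """Read '<old name>\t<new name>' lines into a dict."""
    mapping: Dict[str, str] = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Assumption: a name without a counterpart has an empty/missing 2nd column.
        if len(fields) < 2 or not fields[1]:
            continue
        mapping[fields[0]] = fields[1]
    return mapping


def rename_contigs(lines: Iterable[str], mapping: Dict[str, str],
                   unmapped: Set[str]) -> Iterator[str]:
    """Rewrite column 1 of GTF/BED lines; drop (and record) unmapped contigs."""
    for line in lines:
        # Pass comments, BED track/browser lines and blank lines through untouched.
        if line.startswith(("#", "track", "browser")) or not line.strip():
            yield line
            continue
        contig, sep, rest = line.partition("\t")
        new_name = mapping.get(contig)
        if new_name is None:
            unmapped.add(contig)
            continue
        yield new_name + sep + rest


def convert_file(mapping_path: str, in_path: str, out: TextIO) -> Set[str]:
    """Convert one GTF/BED file; returns the contigs that had no mapping."""
    with open(mapping_path) as fh:
        mapping = load_mapping(fh)
    unmapped: Set[str] = set()
    with open(in_path) as fh:
        out.writelines(rename_contigs(fh, mapping, unmapped))
    return unmapped
```

For example, `convert_file("GRCh38_ensembl2UCSC.txt", "annotation.gtf", sys.stdout)` would write the renamed annotation to standard output ("annotation.gtf" is a placeholder). Check the returned set before trusting the result.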

Edit: BTW, if I have to make the mapping files myself I'll put them on github. It's absurd for that to ever need to be repeated by anyone.

Edit: For what it's worth, at least some of the hg19 GENCODE<->Ensembl mappings are here.

UCSC Gencode Ensembl • 9.0k views

It wouldn't be so useful in this case, since I need this for GTF/BED/etc. files. Though it's good to know about that tool!

7.7 years ago

Should someone ever need this sort of thing in the future, I've started a github repository with a few conversions here. Everyone is encouraged to add additional conversions or fix any errors they see in those already there. Just submit a pull request.

I'll likely add more of these over time as we actually need them (I still need to add some for the fruit fly genome).

Hey,

where did you get all this information, for example for GRCh38_ensembl2UCSC.txt? It's very useful. Why do GRCm38_UCSC2ensembl.txt and GRCm38_ensembl2UCSC.txt exist with different content? Is the mapping not bijective (1:1)?
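To see whether a pair of directional files is 1:1, look for two things: target names shared by several source names, and names that don't come back to themselves after a round trip. Below is a minimal sketch over two plain dictionaries; the function names are hypothetical. You could load each dictionary with any two-column TSV reader. The sketch only detects such entries. It does not explain why the real GRCm38 files differ.

```python
from collections import defaultdict
from typing import Dict, List


def non_injective(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Target names that more than one source name maps to."""
    sources_by_target: Dict[str, List[str]] = defaultdict(list)
    for src, dst in mapping.items():
        sources_by_target[dst].append(src)
    return {dst: sorted(srcs) for dst, srcs in sources_by_target.items()
            if len(srcs) > 1}


def unpaired(forward: Dict[str, str], reverse: Dict[str, str]) -> List[str]:
    """Names in `forward` that do not map back to themselves via `reverse`."""
    return sorted(k for k, v in forward.items() if reverse.get(v) != k)
```

If both functions return empty results for a pair of files, the two files are exact inverses of each other.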

The original information comes from the genome assemblies deposited in NCBI. There are multiple names for each contig therein. The trick is simply to figure out who uses which column (sometimes they like to modify them further).
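Here is a rough illustration of that approach. The sketch reads an NCBI assembly report (the `*_assembly_report.txt` file published with an assembly) and pairs two of its naming columns. The column names used below (`Sequence-Name`, `GenBank-Accn`, `RefSeq-Accn`, `UCSC-style-name`, ...) are assumed from the report's `# Sequence-Name` header line, and missing values are assumed to be written as `na`. The function names are hypothetical. Two things remain your job: working out which column a given provider actually uses, and accounting for names a provider has modified further. Treat the output as a starting point.

```python
from typing import Dict, Iterable, List


def read_assembly_report(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse an NCBI assembly report into one dict per sequence."""
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            # Assumption: the column header is the comment line "# Sequence-Name\t...".
            if line.startswith("# Sequence-Name"):
                header = line[2:].split("\t")
            continue
        if not line or not header:
            continue
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


def build_mapping(rows: List[Dict[str, str]], from_col: str,
                  to_col: str) -> Dict[str, str]:
    """Pair two naming columns, skipping sequences where either is 'na'."""
    return {r[from_col]: r[to_col] for r in rows
            if r.get(from_col, "na") != "na" and r.get(to_col, "na") != "na"}
```

For instance, `build_mapping(rows, "GenBank-Accn", "UCSC-style-name")` pairs accessions with UCSC-style names. Swapping the two arguments gives the reverse direction.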

7.7 years ago
Emily 23k

Ensembl=Gencode

That's unfortunately not the case.

The naming might be different but the data in the GTFs are the same. The Ensembl geneset is the Gencode geneset.

True, unfortunately some users complain if things aren't processed exactly as requested, even if just using Ensembl (my preferred solution!) produces the same results.

Hi Emily, I understand that, but why is there a difference in gene counts between GENCODE and Ensembl? Could you please have a look at this question I recently posted?

6.6 years ago
CAnna ▴ 20

Hi,

I went to your GitHub repository to access these conversion tables. Thank you, this is very useful.

I am very new to bioinformatics, and I am currently trying to convert the Ensembl chromosome names in a GTF annotation file to UCSC chromosome names so that I can build a STAR index (the STAR manual specifies that the chromosome names in the FASTA file and the GTF file must be the same).

So the only thing I have to do is replace the names in the GTF annotation file with their UCSC equivalents, and then I can run the indexing?

It's a trivial question, but I am really new to all of this. Thank you very much,

Camille

You could save yourself time and download the sequence/annotation/index bundles from the iGenomes site (though you would need to create your own STAR indexes).

Download the fasta file from Ensembl instead and save yourself the hassle. You can get the whole bundle from iGenomes, but that'll be a larger download.
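Whichever route you take, you can cheaply confirm the STAR manual's naming requirement before building the index. Below is a minimal sketch that lists GTF contigs with no matching FASTA record. It assumes the sequence name is the first whitespace-separated word of each `>` header line. The function names are hypothetical.

```python
from typing import Iterable, Set


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Sequence names: first word after '>' on each FASTA header line."""
    return {line[1:].split()[0] for line in lines
            if line.startswith(">") and line[1:].split()}


def gtf_contigs(lines: Iterable[str]) -> Set[str]:
    """Contig names from column 1 of non-comment GTF lines."""
    return {line.split("\t", 1)[0] for line in lines
            if line.strip() and not line.startswith("#")}


def missing_from_fasta(fasta_lines: Iterable[str],
                       gtf_lines: Iterable[str]) -> Set[str]:
    """Contigs used in the GTF that have no sequence in the FASTA."""
    return gtf_contigs(gtf_lines) - fasta_names(fasta_lines)
```

Call it with open file handles, e.g. `missing_from_fasta(open("genome.fa"), open("annotation.gtf"))`, where both file names are placeholders. An empty set means every annotated contig has a sequence.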

OK, thank you for your advice!
