Question

Ensembl gene ID conversion to gene name

8.5 years ago

Annika Forsingdal ▴ 250

Hi,

In our RNA-seq pipeline we have a step after mapping that converts Ensembl gene IDs (e.g., ENSMUSG00000096126) to gene names. When the pipeline was built, we mapped read files to older versions of the mouse genome.

What happens when new samples mapped to the newest version of the murine genome are run through the pipeline? Will we just miss the gene names of the transcripts that were not included in the old build of the murine genome? Are there any additional consequences?

Thank you for your time,

Annika

RNA-Seq reference-genome • 3.9k views

Updated 20 months ago by Ram 43k • written 8.5 years ago by Annika Forsingdal ▴ 250

I am not sure I understand the problem here. If you have transcript or gene IDs from a given Ensembl version, use that same version to retrieve the gene names: for example, if you mapped your reads against Ensembl v82, use the v82 BioMart or API to retrieve the corresponding gene names. If instead you are trying to use transcript or gene IDs from an older version to find gene names in a newer one, you may indeed run into problems, such as IDs that are no longer present in the new version, but I don't think you should be doing this. If accurately identifying genes in a new version of Ensembl is critical, you should probably remap all your data to that version.
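One way to keep the ID-to-name table tied to the release you mapped against is to derive it from the same annotation GTF that the aligner or quantifier used, rather than from a separately maintained lookup. The following is a minimal sketch under that assumption: it expects Ensembl-style GTF lines with `gene_id` and `gene_name` attributes, and the function names `gene_names_from_gtf` and `split_mapped` are made up for illustration.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
_ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_names_from_gtf(lines):
    """Build {gene_id: gene_name} from the 'gene' records of a GTF.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    mapping = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header / comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # skip malformed lines and non-gene features
        attrs = dict(_ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id:
            # Fall back to the ID itself when the record has no gene_name.
            mapping[gene_id] = attrs.get("gene_name", gene_id)
    return mapping


def split_mapped(gene_ids, mapping):
    """Separate IDs that have an entry in `mapping` from those that do not."""
    named = {g: mapping[g] for g in gene_ids if g in mapping}
    unmapped = [g for g in gene_ids if g not in mapping]
    return named, unmapped
```

Running `split_mapped` on the IDs from new samples against a table built from the old annotation would list exactly the IDs that the old table cannot name, which is the situation described above.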

8.5 years ago by Jean-Karim Heriche 27k

Ram · Answer 1 · 2015-11-04

You should always make sure that the versions of the genome build/assembly and the annotation are consistent. I don't exactly understand which part of your pipeline has not been updated or why, but I suggest updating either everything or nothing. For a comparative analysis I would re-map everything against the latest assembly and gene models.

Also, I wouldn't use gene names (by which I guess you mean gene symbols) except as additional final annotation. Ensembl gene IDs are unique and mostly don't change (although IDs can be deleted and new ones added), while gene names are ambiguous and may change.
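To illustrate keeping the Ensembl ID as the key and attaching the symbol only as an extra column, here is a small sketch. The helper names `annotate_counts` and `ambiguous_symbols` are hypothetical, and the inputs are assumed to be plain dictionaries (`{gene_id: count}` and `{gene_id: symbol}`) rather than any particular pipeline's format.

```python
from collections import defaultdict


def annotate_counts(counts, id_to_symbol):
    """Return rows (gene_id, symbol, count), ordered by Ensembl gene ID.

    The ID stays the key; IDs without a known symbol get an empty string.
    """
    return [(gid, id_to_symbol.get(gid, ""), n) for gid, n in sorted(counts.items())]


def ambiguous_symbols(id_to_symbol):
    """Return {symbol: sorted gene IDs} for symbols shared by several IDs."""
    by_symbol = defaultdict(list)
    for gid, symbol in id_to_symbol.items():
        by_symbol[symbol].append(gid)
    return {s: sorted(ids) for s, ids in by_symbol.items() if len(ids) > 1}
```

If `ambiguous_symbols` returns anything, collapsing the table by symbol would merge distinct genes, which is one reason to keep the ID as the primary key.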

Ram · Answer 2 · 2015-11-06

8.5 years ago

Abdullah ▴ 100

Have a look at http://mygene.info/. It is an efficient way to convert between gene ID formats, and it can be called from inside a pipeline as well.
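As a rough sketch of how such a call could look from a pipeline step, the snippet below posts a batch of Ensembl gene IDs to the MyGene.info v3 `query` endpoint using only the standard library. The helper names (`build_query`, `parse_hits`, `fetch_symbols`) are made up here, and the assumed response shape (a JSON list of objects with `query`, `symbol`, and `notfound` keys) should be checked against the service documentation. Note also that the service is assumed to return its current annotation, not necessarily the release your reads were mapped against.

```python
import json
import urllib.parse
import urllib.request

MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_query(ensembl_ids, species="mouse"):
    """Encode the form body for a batch query by Ensembl gene ID."""
    params = {
        "q": ",".join(ensembl_ids),
        "scopes": "ensembl.gene",
        "fields": "symbol",
        "species": species,
    }
    return urllib.parse.urlencode(params).encode("utf-8")


def parse_hits(hits):
    """Turn the decoded JSON list into {queried_id: symbol or None}."""
    symbols = {}
    for hit in hits:
        query = hit.get("query")
        if query is None:
            continue
        if hit.get("notfound"):
            symbols.setdefault(query, None)
        elif symbols.get(query) is None:
            # Keep the first hit that carries a symbol for this ID.
            symbols[query] = hit.get("symbol")
    return symbols


def fetch_symbols(ensembl_ids, species="mouse", timeout=30):
    """POST the batch to the service and parse the reply (needs network)."""
    request = urllib.request.Request(
        MYGENE_QUERY_URL, data=build_query(ensembl_ids, species)
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_hits(json.load(response))
```

Keeping `parse_hits` separate from the network call makes it possible to test the conversion logic offline and to log the IDs that came back without a symbol.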

Updated 4.4 years ago by Ram 43k • written 8.5 years ago by Abdullah ▴ 100