How to convert current Ensembl Gene ID into older versions
7.8 years ago
znh.bing.liu ▴ 10

Hi there, I am currently analyzing a signaling pathway with lncRNA2Function, but some of the lncRNAs cannot be recognized by the database. It showed this message: "The following lncRNAs that you input could not be found in the Ensembl v70 (GENCODE v15) and were excluded from the enrichment analysis". My question is whether the Ensembl Gene ID differs between the old Ensembl version and the current version. If it does, how can I convert IDs from the current version into the old version? Thanks!

gene Ensembl • 4.1k views

You are comparing two distant Ensembl datasets: release 70 (January 2013) and the current release 84 (March 2016). Ensembl Gene IDs can (and will) differ between older and current versions of Ensembl. This happens whenever the structure of the lincRNA gene model has changed, even if the changes were minor (e.g. extending the first or last exon). There are a few scenarios here:

  • Gene structure that is exactly the same between v84 and v70. The ENSG ID will be the same. Conversion: not needed. Happy days.

  • More genes in v84 due to newer annotation in the latest assembly. This means these are absent from v70. Conversion: not possible. Oh well, that's life days.

  • Genes in v70 whose annotation was not confirmed/supported in v84. These old genes have been deprecated (e.g. ENSG00000232274). Conversion: not needed. Worrying days. Should one trust this old annotation when a newer, more up-to-date version is available?

  • Gene present in v84 and v70 but with slightly different structures. They will have different ENSG IDs. Conversion: mandatory. Busy days.
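The four scenarios above can be told apart, at least roughly, by checking which release each ID appears in. This is a minimal sketch: `classify_ids` and the IDs in the usage lines are hypothetical, and it assumes you have already collected the unversioned gene IDs of each release into two sets (for example from the two GTF files). An ID found only in the new release cannot be assigned to "new gene" or "changed model" by set membership alone; that needs the name or coordinate lookup.

```python
def classify_ids(query_ids, old_ids, new_ids):
    """Label each query ID by the release(s) it appears in.

    old_ids / new_ids: sets of unversioned gene IDs from the old and new release.
    """
    labels = {}
    for gene_id in query_ids:
        in_old = gene_id in old_ids
        in_new = gene_id in new_ids
        if in_old and in_new:
            labels[gene_id] = "both releases: no conversion needed"
        elif in_new:
            labels[gene_id] = "new release only: new gene or changed model"
        elif in_old:
            labels[gene_id] = "old release only: deprecated since"
        else:
            labels[gene_id] = "not found in either release"
    return labels


# Hypothetical placeholder IDs, for illustration only.
old_release = {"ENSG99999999901", "ENSG99999999902"}
new_release = {"ENSG99999999901", "ENSG99999999903"}
labels = classify_ids(
    ["ENSG99999999901", "ENSG99999999902", "ENSG99999999903"],
    old_release,
    new_release,
)
```

Only the IDs labelled "new release only" need further work.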

Let's focus on the last scenario (4): in addition to the gene symbols suggested by @EagleEye, I'd suggest looking at gene names to catch those lincRNAs that do not have an HGNC symbol but rather a clone-based name (e.g. RP11-506F3). You can get both HGNC symbols and clone names from the Ensembl GTF. You could also try converting the coordinates of the lincRNA genes from GRCh38 (v84) to GRCh37 (v70).
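The name-based matching could be sketched roughly as follows. It assumes both GTF files carry `gene_id` and `gene_name` in the attribute column; the helper names and the example records are hypothetical. It reads the attributes from every feature line rather than only `gene` lines, since older GTF files may not contain `gene` records. A shared name does not guarantee the same locus, so ambiguous or empty hits should be checked by coordinates.

```python
import re

# Captures key "value" pairs from the GTF attribute column.
ATTR = re.compile(r'(\S+) "([^"]*)"')


def ids_by_gene_name(gtf_lines):
    """Map gene_name -> set of unversioned gene_ids found in GTF lines."""
    by_name = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR.findall(fields[8]))
        gene_id = attrs.get("gene_id", "").split(".")[0]
        name = attrs.get("gene_name")
        if gene_id and name:
            by_name.setdefault(name, set()).add(gene_id)
    return by_name


def map_new_to_old(new_gtf_lines, old_gtf_lines):
    """For each new-release gene ID, list old-release IDs sharing its gene name."""
    old_by_name = ids_by_gene_name(old_gtf_lines)
    mapping = {}
    for name, new_ids in ids_by_gene_name(new_gtf_lines).items():
        for new_id in new_ids:
            mapping[new_id] = sorted(old_by_name.get(name, set()))
    return mapping


def gtf_line(feature, gene_id, gene_name):
    """Build a minimal tab-separated GTF record (hypothetical data)."""
    attrs = f'gene_id "{gene_id}"; gene_name "{gene_name}";'
    return "\t".join(["1", "example", feature, "100", "200", ".", "+", ".", attrs])


# Hypothetical placeholder IDs and names, for illustration only.
old_gtf = [gtf_line("exon", "ENSG99999999901", "LNC-EXAMPLE-1")]
new_gtf = [
    gtf_line("gene", "ENSG99999999902.1", "LNC-EXAMPLE-1"),
    gtf_line("gene", "ENSG99999999903.1", "LNC-EXAMPLE-2"),
]
mapping = map_new_to_old(new_gtf, old_gtf)
```

An empty list in `mapping` means no gene of that name exists in the old file, which corresponds to the "conversion not possible" scenario.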

7.8 years ago
EagleEye 7.5k
  1. Have you tried with gene symbols? It might work well with those. Also try removing the version suffix at the end of the gene ID, for example 'ENSG00000228630' instead of 'ENSG00000228630.2'.

  2. Since lncRNA2function study was based on Gencode v15, there is a possibility that it does not include the new lncRNAs. http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-16-S3-S2
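For the version-suffix suggestion in point 1, a minimal sketch is shown here. `strip_version` is a hypothetical helper, and it assumes IDs follow the usual pattern of an `ENS` prefix, 11 digits and an optional `.N` suffix. Anything that does not match, such as a clone name, is returned unchanged.

```python
import re

# Ensembl-style stable ID with an optional ".<digits>" version suffix.
VERSIONED_ID = re.compile(r"^(ENS[A-Z]*\d{11})(?:\.\d+)?$")


def strip_version(gene_id):
    """Return the ID without its version suffix; leave other strings as they are."""
    gene_id = gene_id.strip()
    match = VERSIONED_ID.match(gene_id)
    return match.group(1) if match else gene_id


ids = ["ENSG00000228630.2", "ENSG00000228630", "RP11-506F3"]
cleaned = [strip_version(i) for i in ids]
```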

The number of annotated lncRNAs keeps growing, and there is a huge difference between v15 and the current versions, or even the latest version based on hg19 (GENCODE v19). This means that any RNA lncRNA2Function reports as not matching was simply not annotated when they carried out the study. http://www.gencodegenes.org/releases/

Example: the lincRNA class has 6,458 entries in GENCODE v15 and 7,114 in v19 (other classes also show large differences).

http://www.gencodegenes.org/stats/archive.html#a15

http://www.gencodegenes.org/stats/archive.html#a19

I also could not find any annotation update history on the lncRNA2Function website. The predictions are still based on v15.
