Question: How to convert current Ensembl Gene ID into older versions
znh.bing.liu wrote (17 months ago):

Hi there, I am currently analyzing signaling pathways with lncRNA2Function, but some of the lncRNAs are not recognized by the database. It shows this message: "The following lncRNAs that you input could not be found in the Ensembl v70 (GENCODE v15) and were excluded from the enrichment analysis". My concern is whether the Ensembl Gene IDs differ between the old Ensembl version and the current one. If they do, how can I convert the current IDs into the old version? Thanks!


You are comparing distantly related Ensembl datasets: release 70 (January 2013) versus the current release 84 (March 2016). The Ensembl Gene ID can (and often will) differ between older and current versions of Ensembl. This is expected when the structure of the lincRNA gene model has changed, even if the change was minor (e.g. an extended first or last exon). There are a few scenarios here:

  • Genes whose structure is exactly the same in v84 and v70. The ENSG ID will be the same. Conversion: not needed. Happy days.

  • Genes that are new in v84, added through newer annotation on the latest assembly. These are absent from v70. Conversion: not possible. Oh well, that's-life days.

  • Genes in v70 whose annotation was not confirmed/supported in v84. These old genes have been deprecated (e.g. ENSG00000232274). Conversion: not needed. Worrying days: should one trust this old annotation when a newer, more up-to-date version is available?

  • Genes present in both v84 and v70 but with slightly different structures. They can end up with different ENSG IDs (for small changes, often the same ID with a new version number). Conversion: mandatory. Busy days.
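To see which scenario each of your IDs falls into, you could compare your list against the gene IDs of both releases. The sketch below assumes you have already collected unversioned ENSG IDs from a v70 and a v84 annotation (for example from the two GTF files) into Python sets; the function name `classify_ids` and its labels are illustrative only, not part of any Ensembl tool. IDs seen in only one release may still be scenario 4 cases that received a new ID.

```python
from typing import Dict, Iterable, Set


def classify_ids(query_ids: Iterable[str],
                 old_ids: Set[str],
                 new_ids: Set[str]) -> Dict[str, str]:
    """Label each query ID by the release(s) it appears in.

    All IDs are assumed to be unversioned, e.g. 'ENSG00000228630'.
    """
    labels: Dict[str, str] = {}
    for gid in query_ids:
        in_old, in_new = gid in old_ids, gid in new_ids
        if in_old and in_new:
            # Scenario 1 (or a version-only change): same stable ID in both
            labels[gid] = "both"
        elif in_new:
            # Scenario 2, or scenario 4 if the gene was given a new ID
            labels[gid] = "new only"
        elif in_old:
            # Scenario 3, or scenario 4 if the gene was given a new ID
            labels[gid] = "old only"
        else:
            # In neither set: check the ID itself (typo, version suffix, species)
            labels[gid] = "neither"
    return labels


# Toy example; ENSG00000000001 is a hypothetical placeholder ID
v70 = {"ENSG00000228630", "ENSG00000232274"}
v84 = {"ENSG00000228630", "ENSG00000000001"}
print(classify_ids(["ENSG00000228630", "ENSG00000232274", "ENSG00000000001"], v70, v84))
```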

Let's focus on the fourth scenario: in addition to the gene symbols suggested by @EagleEye, I'd suggest looking at gene names, to catch those lincRNAs that have a clone name (e.g. RP11-506F3) rather than an HGNC symbol. You can get both HGNC symbols and clone names from the Ensembl GTF. You could also try converting the coordinates of the lincRNA genes from GRCh38 (v84) to GRCh37 (v70).
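The gene-name matching can be scripted by reading the `gene_id` and `gene_name` attributes from the ninth column of two Ensembl GTFs (one per release) and joining on the name. This is a minimal sketch: the helper names are hypothetical, it reads attributes from every row because older Ensembl GTFs may not contain separate `gene` rows, and it will miss genes whose name also changed between releases.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Matches key "value" pairs in the GTF attribute column (column 9)
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def name_to_ids(gtf_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map gene_name -> set of unversioned gene_id values seen in a GTF."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_name" in attrs:
            # Drop a trailing ".N" version, if the GTF includes one
            gene_id = attrs["gene_id"].split(".")[0]
            mapping[attrs["gene_name"]].add(gene_id)
    return mapping


def map_by_name(current_ids: Iterable[str],
                new_gtf: Iterable[str],
                old_gtf: Iterable[str]) -> Dict[str, List[str]]:
    """For each current ID, list old-release IDs that share its gene_name."""
    new_map = name_to_ids(new_gtf)
    old_map = name_to_ids(old_gtf)
    # Invert the new-release map so names can be looked up by ID
    id_to_names: Dict[str, Set[str]] = defaultdict(set)
    for name, ids in new_map.items():
        for gid in ids:
            id_to_names[gid].add(name)
    result: Dict[str, List[str]] = {}
    for gid in current_ids:
        old_hits: Set[str] = set()
        for name in id_to_names.get(gid.split(".")[0], set()):
            old_hits |= old_map.get(name, set())
        result[gid] = sorted(old_hits)
    return result
```

Ensembl distributes its GTFs gzip-compressed, so the two line iterables could come from `gzip.open(path, "rt")`. An empty list in the result means no old-release gene shares that name, which is where the coordinate-conversion route becomes useful.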

Written 17 months ago by Denise - Open Targets
EagleEye (Sweden) wrote (17 months ago):
  1. Have you tried gene symbols? It might work better with those. Also try removing the version numbers at the end of the gene IDs, e.g. 'ENSG00000228630' instead of 'ENSG00000228630.2'.

  2. Since the lncRNA2Function study was based on GENCODE v15, it may not include the newer lncRNAs. http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-16-S3-S2

The number of annotated lncRNAs keeps growing, and there is a huge difference between v15 and the current versions, or even the latest hg19-based version (GENCODE v19). So any lncRNA that lncRNA2Function reports as not found was most likely not annotated when the study was carried out. http://www.gencodegenes.org/releases/

For example, the number of lincRNAs is 6,458 in GENCODE v15 and 7,114 in v19 (other classes also differ considerably).

http://www.gencodegenes.org/stats/archive.html#a15

http://www.gencodegenes.org/stats/archive.html#a19
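If you want to check such counts yourself, something like the following tallies distinct `gene_id` values with a given `gene_type` in a GENCODE GTF. It is only a sketch: it assumes the GENCODE attribute name `gene_type` (Ensembl's own GTFs use `gene_biotype`) and the biotype label `lincRNA` used by those releases, and the result may not match the published statistics exactly if those are computed differently.

```python
import re
from typing import Iterable, Set

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')
GENE_TYPE_RE = re.compile(r'gene_type "([^"]+)"')


def count_genes_of_type(gtf_lines: Iterable[str], gene_type: str = "lincRNA") -> int:
    """Count distinct gene_id values whose gene_type equals `gene_type`."""
    genes: Set[str] = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        type_match = GENE_TYPE_RE.search(fields[8])
        id_match = GENE_ID_RE.search(fields[8])
        if type_match and id_match and type_match.group(1) == gene_type:
            # gene, transcript and exon rows repeat the gene_id; the set dedupes them
            genes.add(id_match.group(1))
    return len(genes)
```

Running it once per release (e.g. on `gzip.open(path, "rt")` for the v15 and v19 GTFs) gives a like-for-like comparison between the two annotations.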

I also could not find any annotation update history on the lncRNA2Function website; the predictions are still based on v15.
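Point 1 (dropping the version suffix) and the "could not be found" check can be approximated locally before uploading, assuming you have a set of unversioned gene IDs taken from the GENCODE v15 annotation. The function names below are hypothetical, and this only mimics the outcome, not whatever matching lncRNA2Function does internally.

```python
import re
from typing import Iterable, List, Set, Tuple


def strip_version(gene_id: str) -> str:
    """'ENSG00000228630.2' -> 'ENSG00000228630'; unversioned IDs pass through."""
    return re.sub(r"\.\d+$", "", gene_id.strip())


def split_found_missing(query_ids: Iterable[str],
                        reference_ids: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (found, missing) after dropping version suffixes and duplicates."""
    found: List[str] = []
    missing: List[str] = []
    seen: Set[str] = set()
    for raw in query_ids:
        gid = strip_version(raw)
        if not gid or gid in seen:
            continue  # skip blank lines and repeated IDs
        seen.add(gid)
        if gid in reference_ids:
            found.append(gid)
        else:
            missing.append(gid)
    return found, missing
```

The `missing` list is then the set of genes worth checking by symbol, gene name or coordinates as discussed above.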
