Hi there, I am currently analyzing a signaling pathway with lncRNA2Function, but some of the lncRNAs are not recognized by the database. It shows this message: "The following lncRNAs that you input could not be found in the Ensembl v70 (GENCODE v15) and were excluded from the enrichment analysis". My concern is whether the Ensembl Gene ID differs between the old Ensembl version and the current version. If it does, how can I convert the current IDs into the old ones? Thanks!
You are comparing distantly related Ensembl datasets: release 70 (January 2013) versus the current release 84 (March 2016). The Ensembl Gene ID can (and will) differ between older and current versions of Ensembl. A new ID is assigned when the structure of the lincRNA gene model has changed, even if the change was minor (e.g. extending the first or last exon). There are a few scenarios here:
1. Gene structure that is exactly the same between v84 and v70. The ENSG ID will be the same. Conversion: not needed. Happy days.
2. More genes in v84 due to newer annotation in the latest assembly. These genes are absent from v70. Conversion: not possible. Oh well, that's life days.
3. Genes in v70 whose annotation was not confirmed/supported in v84. These old genes have been deprecated (e.g. ENSG00000232274). Conversion: not needed. Worrying days. Should one trust this old annotation when a newer, more up-to-date version is available?
4. Genes present in both v84 and v70 but with slightly different structures. They will have different ENSG IDs. Conversion: mandatory. Busy days.
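As a first pass at sorting your IDs into these scenarios, you can compare the two ID lists with plain set operations. This is only a rough sketch: the function name `partition_ids` and the IDs in the usage note are made up for illustration, and IDs alone cannot tell a genuinely new or deprecated gene apart from one that was re-annotated under a different ID.

```python
def partition_ids(current_ids, old_ids):
    """Split Ensembl gene IDs by which release(s) they appear in.

    current_ids: gene IDs from the newer release (e.g. v84)
    old_ids:     gene IDs from the older release (e.g. v70)
    """
    current, old = set(current_ids), set(old_ids)
    return {
        # Same ID in both releases: no conversion needed.
        "shared": sorted(current & old),
        # Only in the newer release: either new genes or re-annotated ones.
        "only_current": sorted(current - old),
        # Only in the older release: either deprecated or re-annotated ones.
        "only_old": sorted(old - current),
    }
```

For example, `partition_ids(["ENSG_A", "ENSG_B"], ["ENSG_B", "ENSG_C"])` puts `ENSG_B` in `shared`, `ENSG_A` in `only_current` and `ENSG_C` in `only_old` (all placeholder IDs). The `only_current` group is the one that needs a name-based or coordinate-based lookup.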
Let's focus on point (4): in addition to the gene symbols suggested by @EagleEye, I'd suggest looking at gene names to catch those lincRNAs that do not have an HGNC symbol but rather a clone name (e.g. RP11-506F3). You can get both HGNC symbols and clone names from the Ensembl GTF files. You could also try to convert the coordinates of the lincRNA genes from GRCh38 (v84) to GRCh37 (v70).
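The name-based lookup could be sketched roughly as below, assuming you have downloaded the GTF files for both releases and that each line carries `gene_id` and `gene_name` attributes in the usual `key "value";` form. The function names are hypothetical, and a shared name is only a hint that two IDs refer to the same gene, so check the hits by hand.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def gene_names_from_gtf(lines):
    """Return {gene_id: gene_name} collected from GTF lines."""
    mapping = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_name" in attrs:
            mapping[attrs["gene_id"]] = attrs["gene_name"]
    return mapping


def map_ids_by_name(current, old):
    """For each current gene ID, list old gene IDs sharing its gene name.

    current, old: {gene_id: gene_name} dicts, one per release.
    An empty list means no gene with that name exists in the old release.
    """
    old_by_name = {}
    for gene_id, name in old.items():
        old_by_name.setdefault(name, []).append(gene_id)
    return {gene_id: sorted(old_by_name.get(name, []))
            for gene_id, name in current.items()}
```

You would call `gene_names_from_gtf(open(path))` once per release (the path is whatever you saved each GTF as) and pass the two dicts to `map_ids_by_name`. Current IDs that come back with an empty list are candidates for the coordinate-based conversion instead.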