5.8 years ago
JourneyToAbyss
I have many lncRNA gene symbols that I am trying to convert to Ensembl IDs. Much of the difficulty stems from not knowing where these gene symbols come from. As a result, my attempts with bioDBnet and biomaRt in R have been unsuccessful. Yet the Ensembl website is able to identify these gene symbols when I search for them.
Example:
RP11-23J18.1
RP4-616B8.4
CTD-3184A7.4
ILF3-AS1
RP11-410L14.2
I was hoping someone could point me in the right direction to understand where and how these gene symbols originated, and what the best practice is for converting them to Ensembl IDs.
Thank you!
What code have you tried so far? I have also just tried to map these using biomaRt but could not do it. The genes that you list are predicted ncRNAs, and some are pseudogenes. They may not even be included in the databases accessed by biomaRt. They do have assigned Ensembl IDs, though. If you simply search for them in UCSC, for example, you can access information on them.
Their IDs are also accessible from GENCODE's 'comprehensive' annotation:
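As a rough illustration of how the GENCODE annotation could be used for this, here is a minimal Python sketch that reads the `gene` records of a GTF file and builds a symbol-to-Ensembl-ID lookup from the `gene_id` and `gene_name` attributes. The function names `symbol_to_ensembl` and `convert` are made up for this example, and it assumes you have downloaded a GENCODE release whose `gene_name` values actually match your symbols (which release that is would need checking).

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def symbol_to_ensembl(gtf_lines, strip_version=True):
    """Build {gene_name: [gene_id, ...]} from the 'gene' records of a GTF.

    gtf_lines: any iterable of text lines (an open file handle works).
    strip_version: drop the ".N" version suffix from each gene_id.
    A symbol can map to more than one ID, so values are lists.
    """
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records are needed
        attrs = dict(_ATTR.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        name = attrs.get("gene_name")
        if not gene_id or not name:
            continue
        if strip_version:
            gene_id = gene_id.split(".")[0]
        ids = mapping.setdefault(name, [])
        if gene_id not in ids:
            ids.append(gene_id)
    return mapping


def convert(symbols, mapping):
    """Look up each symbol; unmatched symbols get an empty list."""
    return {s: mapping.get(s, []) for s in symbols}


# Example usage (the file name below is a placeholder, not a real path):
#   import gzip
#   with gzip.open("gencode.annotation.gtf.gz", "rt") as fh:
#       lookup = symbol_to_ensembl(fh)
#   print(convert(["ILF3-AS1", "RP11-23J18.1"], lookup))
```

Symbols that come back with an empty list are the ones worth checking by hand, for instance against a different annotation release.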
A logical question: do you even need these genes for your downstream work? Information on these types of genes is scant or non-existent, so even if you convert them to Ensembl IDs, you will likely have to exclude them at your next step. It depends on what you want to do with them.
The following website might help: https://genealacart.genecards.org/Query. You need to create an account. Alternatively, if you have only a small set of genes, you can go directly to the GeneCards website and type each gene symbol into the search box.