Refseq IDs conversion to gene symbol
7 weeks ago
rgrindle • 0

Hi All,

I am building a pipeline for identifying orthologs from given transcriptomes. One of the tools in the pipeline outputs a RefSeq predicted-protein ID for each sequence. The IDs look like:

61622.XP_010357577.1, 
8479.XP_005311777.2, 
61622.XP_010357577.1, 
10036.XP_005068815.1,

I am now stumped on how to convert these IDs to gene names. I understand that BioMart can filter on RefSeq IDs, but that would require selecting the correct Mart dataset for each RefSeq species, and the species do not map directly onto Ensembl datasets. Example: 8479.XP_005311777.2 (Species = Emydinae, ID = XP_005311777.2) has no Emydinae dataset in Ensembl. Does anyone know a way to convert these IDs to something more helpful? Ensembl IDs would work as an intermediate.
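For reference, a minimal sketch of splitting such an ID into its two parts before any lookup. It assumes the text before the first dot is a numeric taxon identifier and the remainder is the RefSeq protein accession with its version; `split_id` is a hypothetical helper name, not part of any tool mentioned here.

```shell
#!/usr/bin/env bash
# Split an ID such as "8479.XP_005311777.2, " into prefix and accession.
# Assumption: the prefix before the first dot identifies the taxon.
split_id() {
    local raw="${1//[[:space:]]/}"   # drop stray whitespace
    raw="${raw%,}"                   # drop a trailing comma, if any
    local prefix="${raw%%.*}"        # text before the first dot
    local accession="${raw#*.}"      # text after the first dot
    printf '%s\t%s\n' "$prefix" "$accession"
}

split_id "8479.XP_005311777.2, "
```

The accession column can then be passed to whichever lookup service is used, independent of the prefix.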

Ensembl Orthology Symbol Refseq Gene • 221 views

I'm not sure where these IDs came from, but if you ignore the numeric prefix, you can use Entrez Direct to get the Entrez Gene ID. Since these are predicted proteins, there may be no linked gene name or symbol, so this may not work in all cases (it seems to work for only one ID from your list).

$ esearch -db protein -query XP_005311777 | elink -target gene | esummary | xtract -pattern DocumentSummary -element Id,Name,ScientificName
101949809   RHBDD3  Chrysemys picta
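To run the same lookup over a whole list, something along these lines might work. It is only a sketch: `unique_accessions` and `lookup_genes` are hypothetical helper names, the loop assumes Entrez Direct is on `PATH` with network access, and accessions with no linked gene are reported as `NA`. The version suffix is dropped to match the query above.

```shell
#!/usr/bin/env bash
# Reduce prefixed IDs (one per line on stdin) to unique, versionless accessions.
unique_accessions() {
    sed -E 's/[[:space:],]+$//; s/^[0-9]+\.//; s/\.[0-9]+$//' | grep -v '^$' | sort -u
}

# Run the esearch/elink/esummary/xtract pipeline once per accession read from stdin.
# Not called automatically: it needs Entrez Direct installed and network access.
lookup_genes() {
    local acc hit
    while read -r acc; do
        # < /dev/null keeps esearch from consuming the loop's input
        hit=$(esearch -db protein -query "$acc" < /dev/null |
              elink -target gene | esummary |
              xtract -pattern DocumentSummary -element Id,Name,ScientificName)
        printf '%s\t%s\n' "$acc" "${hit:-NA}"
    done
}
```

Usage, assuming the IDs are stored one per line in a file called `ids.txt` (a placeholder name): `unique_accessions < ids.txt | lookup_genes > gene_table.tsv`.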