How to convert UCSC transcript (isoform) IDs to Ensembl transcript IDs?
jack, 8.2 years ago

Hi all,

I have about 73,000 UCSC isoform IDs and I want to convert them to Ensembl transcript IDs (ENST000....). I've tried UCSC and BioMart, but I was not successful. I think these IDs belong to genome assembly hg19. Could someone help me with that? Here are some of the UCSC isoform IDs:

"531"   "uc002ldw.3"
"532"   "uc002ldx.3"
"533"   "uc002hnk.2"
"534"   "uc002hnl.2"
"535"   "uc002hnm.2"
"536"   "uc002hnn.2"
"537"   "uc002hno.2"
"538"   "uc002hnp.1"
"539"   "uc002hnq.2"
"540"   "uc002hnr.1"
"541"   "uc010cuy.2"
"542"   "uc010cuz.2"
"543"   "uc010wdb.1"
"544"   "uc010wdc.1"
"545"   "uc001tob.2"
"546"   "uc001toc.2"
"547"   "uc001tod.2"
"548"   "uc010sxl.1"
"549"   "uc010sxm.1"
"550"   "uc001tso.3"
"551"   "uc001tsp.2"
"552"   "uc001tsq.2"
"553"   "uc001tsr.2"
"554"   "uc001tss.1"
"555"   "uc009zvw.2"
"556"   "uc009zvx.2"
"557"   "uc003eov.3"
"558"   "uc003eoy.2"
"559"   "uc011blr.1"
"560"   "uc001qhk.2"
"561"   "uc001qhl.2"
"562"   "uc009zdc.2"
"563"   "uc009zde.1"
"564"   "uc010sco.1"
"565"   "uc010scp.1"
"566"   "uc010scq.1"
"567"   "uc010scr.1"
"568"   "uc003ela.3"
"569"   "uc003elb.2"
"570"   "uc003elc.1"
"571"   "uc003eld.1"
"572"   "uc003ele.2"
"573"   "uc010hsw.1"
"574"   "uc011bks.1"
"575"   "uc002vdz.3"
"576"   "uc010zjg.1"
"577"   "uc001dgw.3"
"578"   "uc001dgx.3"
"579"   "uc009wbp.2"
"580"   "uc009wbr.2"
"581"   "uc009wbs.1"
"582"   "uc010orc.1"
"583"   "uc010ord.1"
"584"   "uc010ore.1"
"585"   "uc010orf.1"
"586"   "uc010org.1"
"587"   "uc001lhb.2"
"588"   "uc010qub.1"
"589"   "uc001tza.3"
"590"   "uc001tzb.3"
"591"   "uc010szl.1"
"592"   "uc002gev.2"
"593"   "uc002gew.2"
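The list above looks like it was exported from R: a quoted row index followed by the quoted UCSC ID. Assuming that whitespace-separated layout (it is inferred from the excerpt, not stated), a minimal Python sketch for pulling out the bare IDs before any conversion:

```python
def read_ucsc_ids(text):
    """Extract UCSC IDs from lines such as '"531"   "uc002ldw.3"'.

    Assumes whitespace-separated, double-quoted fields with the row
    index first and the UCSC ID second (the layout shown above).
    """
    ids = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        ids.append(fields[1].strip('"'))  # drop the surrounding quotes
    return ids


sample = '"531"   "uc002ldw.3"\n"532"   "uc002ldx.3"\n'
print(read_ucsc_ids(sample))
```

Running it on the two sample lines yields the plain IDs `uc002ldw.3` and `uc002ldx.3`, which is the form the UCSC and BioMart tools expect.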
ivivek_ngs, 8.2 years ago

Are you sure they are from human, and do you know whether they are hg19, hg18 or even the latest assembly? You need to be sure which assembly they come from, otherwise this is not a correct way to proceed. In any case, you can download the whole gene list from the UCSC browser. Take a look at this link, or the wonderful mysql one-liner mentioned in the link. Download all of them, and then you can simply use merge in R to map your isoforms to the corresponding transcript IDs.
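As a Python stand-in for the R `merge` step, here is a sketch that assumes the downloaded table is tab-separated with the UCSC ID in column 1 and the Ensembl transcript ID in column 2. For hg19, the UCSC `knownToEnsembl` table (columns `name`, `value`) is assumed to be a suitable source, but check the header of whatever you actually download. Because one UCSC ID can map to several Ensembl IDs, the mapping keeps a list per ID.

```python
import csv


def load_mapping(lines):
    """Build {UCSC ID: [Ensembl transcript IDs]} from tab-separated lines.

    Assumes column 1 = UCSC ID (e.g. uc010ajn.1), column 2 = Ensembl ID.
    Rows whose first field starts with '#' (Table Browser header style)
    or that have fewer than two columns are skipped.
    """
    mapping = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        mapping.setdefault(row[0], []).append(row[1])
    return mapping


def merge_ids(ucsc_ids, mapping):
    """Left join: keep every input ID, pairing it with None if unmapped."""
    return [(uid, ens) for uid in ucsc_ids for ens in mapping.get(uid, [None])]
```

`load_mapping` accepts any iterable of lines, so an open file works directly, e.g. `load_mapping(open(path, newline=""))` with `path` pointing at your download. Keeping unmapped IDs as `None` (a left join, like `merge(..., all.x = TRUE)` in R) makes it obvious how many of the ~73,000 IDs failed to convert.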

These are human transcriptome IDs.

Then do you know which assembly: hg18, hg19 or GRCh38? If you do, the link I gave above shows different examples of downloading the entire set of RefSeq IDs together with transcript and other IDs. You can then merge your transcript IDs with the downloaded file in R and retrieve all the relevant information.

It's hg19, but that doesn't seem to work.

Please be specific about what does not work and what you are doing that gives an unexpected result. Did you download the tab-delimited file from UCSC, either through the browser or with the mysql command, and then run R merge on both files? Can you show the output of the download and the command you used for merging? Otherwise it will be difficult for us to debug. Both files have to be in the same format; if they are not, put the IDs into vectors in R and then match on those columns.
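A frequent reason two ID columns fail to match is formatting rather than content: stray quotes or whitespace left over from export. A minimal Python sketch of the "put them in vectors and match" idea, normalising both sides first and then reporting which query IDs matched (the function names are illustrative):

```python
def normalise(ids):
    """Strip whitespace and surrounding double quotes: '"uc002ldw.3" ' -> 'uc002ldw.3'."""
    return [i.strip().strip('"') for i in ids]


def match_report(query_ids, table_ids):
    """Split normalised query IDs into (matched, unmatched), keeping input order."""
    table = set(normalise(table_ids))
    matched, unmatched = [], []
    for uid in normalise(query_ids):
        (matched if uid in table else unmatched).append(uid)
    return matched, unmatched
```

Posting the size of `unmatched` together with a few of its entries is exactly the kind of detail that makes the problem debuggable.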

I have a similar problem. I downloaded level 3 isoform-level data from TCGA, and it uses UCSC IDs. I tried BioMart, DAVID and some other online conversion tools. The problem is that most of the IDs do not match any Ensembl ID. I also tried looking it up the other way round online (i.e. taking my transcript of interest and searching for its corresponding UCSC ID), but could not find it. Am I missing something here?

BioMart can convert UCSC IDs to Ensembl IDs: 'uc010ajn.1' gets converted to ENSG00000211814 (gene) and ENST00000390462 (transcript). However, the IDs from your list do not seem to work. These same IDs cannot be found in UCSC either when using its Table Browser, whereas 'uc010ajn.1' can.
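One thing that may be worth checking (an assumption, not a confirmed cause) is whether the IDs fail only because of their version suffix (`.1`, `.2`, `.3`), since a versioned UCSC ID in an older dataset may not exist in the table you are matching against. A Python sketch that tries an exact match first and falls back to the unversioned accession, assuming `mapping` is a dict of versioned UCSC ID to a list of Ensembl IDs:

```python
def strip_version(uid):
    """'uc002ldw.3' -> 'uc002ldw' (drop a trailing .N version if present)."""
    return uid.rsplit(".", 1)[0]


def match_ignoring_version(query_ids, mapping):
    """Return {query: (how, ensembl_ids)} with how in {'exact', 'unversioned', 'none'}."""
    by_base = {}
    for uid, ens in mapping.items():
        by_base.setdefault(strip_version(uid), []).extend(ens)
    result = {}
    for q in query_ids:
        if q in mapping:
            result[q] = ("exact", mapping[q])
        elif strip_version(q) in by_base:
            result[q] = ("unversioned", by_base[strip_version(q)])
        else:
            result[q] = ("none", [])
    return result
```

Treat `unversioned` hits only as candidates: a different version can mean a changed transcript model, so those pairs should be checked (e.g. by coordinates) before use.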

So the input source is not correct, then; the OP needs to know what the proper IDs are. The online BioMart should also be able to do the same thing, as you suggested, but I was just giving a feel for trying both the browser and the programmatic route. Unless the OP gets the correct IDs, none of the above suggestions will work.

Note: Know where your source comes from. Keep a record of all the sources that will be used as inputs for downstream work. It makes life easier to know what you are using and what you intend to do, so that even if something breaks, people can give suggestions or debug it.