Question: How to convert UCSC transcript (isoform) IDs to Ensembl transcript IDs?
2.9 years ago by
jack750 wrote:

Hi all,

I have UCSC isoform IDs (~73,000) and I want to convert them to Ensembl transcript IDs (ENST000...). I've tried UCSC and BioMart, but without success. I think these IDs belong to genome assembly hg19. Could someone help me with that? Here are some of the UCSC isoform IDs:

"531"   "uc002ldw.3"
"532"   "uc002ldx.3"
"533"   "uc002hnk.2"
"534"   "uc002hnl.2"
"535"   "uc002hnm.2"
"536"   "uc002hnn.2"
"537"   "uc002hno.2"
"538"   "uc002hnp.1"
"539"   "uc002hnq.2"
"540"   "uc002hnr.1"
"541"   "uc010cuy.2"
"542"   "uc010cuz.2"
"543"   "uc010wdb.1"
"544"   "uc010wdc.1"
"545"   "uc001tob.2"
"546"   "uc001toc.2"
"547"   "uc001tod.2"
"548"   "uc010sxl.1"
"549"   "uc010sxm.1"
"550"   "uc001tso.3"
"551"   "uc001tsp.2"
"552"   "uc001tsq.2"
"553"   "uc001tsr.2"
"554"   "uc001tss.1"
"555"   "uc009zvw.2"
"556"   "uc009zvx.2"
"557"   "uc003eov.3"
"558"   "uc003eoy.2"
"559"   "uc011blr.1"
"560"   "uc001qhk.2"
"561"   "uc001qhl.2"
"562"   "uc009zdc.2"
"563"   "uc009zde.1"
"564"   "uc010sco.1"
"565"   "uc010scp.1"
"566"   "uc010scq.1"
"567"   "uc010scr.1"
"568"   "uc003ela.3"
"569"   "uc003elb.2"
"570"   "uc003elc.1"
"571"   "uc003eld.1"
"572"   "uc003ele.2"
"573"   "uc010hsw.1"
"574"   "uc011bks.1"
"575"   "uc002vdz.3"
"576"   "uc010zjg.1"
"577"   "uc001dgw.3"
"578"   "uc001dgx.3"
"579"   "uc009wbp.2"
"580"   "uc009wbr.2"
"581"   "uc009wbs.1"
"582"   "uc010orc.1"
"583"   "uc010ord.1"
"584"   "uc010ore.1"
"585"   "uc010orf.1"
"586"   "uc010org.1"
"587"   "uc001lhb.2"
"588"   "uc010qub.1"
"589"   "uc001tza.3"
"590"   "uc001tzb.3"
"591"   "uc010szl.1"
"592"   "uc002gev.2"
"593"   "uc002gew.2"
modified 2.9 years ago by Emily_Ensembl18k • written 2.9 years ago by jack750
2.9 years ago by
Seattle,WA, USA
ivivek_ngs4.8k wrote:

Are you sure they are from human, and do you know whether they are from hg19, hg18, or the latest assembly? You need to be sure which assembly they come from; otherwise this is not a correct way to proceed. In any case, you can download the full gene list from the UCSC browser. Take a look at this link, or at the wonderful MySQL one-liner mentioned in it, and download all of them. Then you can use merge in R to map your isoforms to the corresponding transcript IDs.
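
The download-then-merge idea above can be sketched in Python as well as in R. This is only a minimal illustration, assuming the downloaded file is a two-column, tab-separated table of UCSC ID and Ensembl transcript ID (UCSC's knownToEnsembl table is one candidate, but that table name is an assumption and is not confirmed in this thread). The names `MAPPING_TSV`, `load_mapping` and `merge_ids` are made up for the example, and the only mapping row used is the pair quoted elsewhere in this thread.

```python
import csv
import io

# Hypothetical stand-in for a downloaded UCSC-to-Ensembl mapping table
# (assumed layout: UCSC ID <tab> Ensembl transcript ID). The single data
# row is the pair quoted elsewhere in this thread.
MAPPING_TSV = "#name\tvalue\nuc010ajn.1\tENST00000390462\n"


def load_mapping(handle):
    """Read a two-column TSV into a dict {ucsc_id: ensembl_transcript_id}."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, header/comment lines, and short rows.
        if not row or row[0].startswith("#") or len(row) < 2:
            continue
        mapping[row[0]] = row[1]
    return mapping


def merge_ids(ucsc_ids, mapping):
    """Left join: keep every input ID, with None where no mapping exists."""
    return [(ucsc_id, mapping.get(ucsc_id)) for ucsc_id in ucsc_ids]


mapping = load_mapping(io.StringIO(MAPPING_TSV))
# The second ID comes from the question; it is simply absent from this toy table.
merged = merge_ids(["uc010ajn.1", "uc002ldw.3"], mapping)
for ucsc_id, enst in merged:
    print(ucsc_id, enst if enst else "NA", sep="\t")
```

Keeping unmatched IDs in the output (a left join) makes it easy to see how many of the input IDs found no partner, which is the first thing to check when a merge "does not work".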

written 2.9 years ago by ivivek_ngs4.8k

These are human transcriptome IDs.

written 2.9 years ago by jack750

Then do you know which assembly: hg18, hg19, or GRCh38? If you do, the link I gave above shows different examples of downloading the entire set of RefSeq IDs together with transcript and other IDs. You can then merge your transcript IDs with the downloaded file in R and retrieve all the relevant information.

written 2.9 years ago by ivivek_ngs4.8k

It's hg19, but that doesn't seem to work.

written 2.9 years ago by jack750

Please be specific about what does not work and what you are doing that gives an unexpected result. Did you download the tab-delimited file from UCSC, either from the browser or with the mysql command, and then run R merge on both files? Can you show the output of the download and the command you used for merging? Otherwise it will be difficult for us to debug. Both files must have the same format; alternatively, put them into vectors in R and then match on the columns.
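
To make the "same format" point concrete, here is a rough Python sketch that reads lines in the quoted layout shown in the question and sorts the IDs into exact matches, IDs whose base name matches but whose version suffix differs, and IDs that are absent. A differing version suffix is only one possible cause of a failed match and is an assumption here, not something established in this thread. `REFERENCE_IDS` is invented purely to exercise the code, and the helper names are hypothetical.

```python
import shlex

# Two lines copied from the question; a real run would read the whole file.
RAW_LINES = ['"531"   "uc002ldw.3"', '"532"   "uc002ldx.3"']

# HYPOTHETICAL reference IDs, invented only to exercise the code below.
# They are not taken from any real UCSC table.
REFERENCE_IDS = {"uc002ldw.2", "uc999zzz.1"}


def parse_line(line):
    """Return the UCSC ID from a line like '"531"   "uc002ldw.3"'."""
    fields = shlex.split(line)  # removes the quotes and splits on whitespace
    return fields[-1]


def strip_version(ucsc_id):
    """'uc002ldw.3' -> 'uc002ldw' (drops the part after the last dot)."""
    return ucsc_id.rsplit(".", 1)[0]


def classify(ids, reference_ids):
    """Split IDs into exact matches, version-only mismatches, and absent."""
    reference_bases = {strip_version(ref) for ref in reference_ids}
    result = {"exact": [], "version_differs": [], "absent": []}
    for ucsc_id in ids:
        if ucsc_id in reference_ids:
            result["exact"].append(ucsc_id)
        elif strip_version(ucsc_id) in reference_bases:
            result["version_differs"].append(ucsc_id)
        else:
            result["absent"].append(ucsc_id)
    return result


ids = [parse_line(line) for line in RAW_LINES]
summary = classify(ids, REFERENCE_IDS)
for label, members in summary.items():
    print(label, len(members), members)
```

Posting counts like these, together with the first few lines of both files, would give readers enough to see where the merge goes wrong.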

written 2.9 years ago by ivivek_ngs4.8k

I have a similar problem. I downloaded level 3 isoform-level data from TCGA, and it uses UCSC IDs. I tried BioMart, DAVID, and some other online conversion tools. The problem is that most of the IDs do not match any Ensembl ID. I also tried looking the other way round online (i.e., starting from my transcript of interest and searching for its corresponding UCSC ID), but could not find it. Am I missing something here?

written 2.1 years ago by snishtala0310

BioMart can convert UCSC IDs to Ensembl IDs. 'uc010ajn.1' gets converted to ENSG00000211814 (gene) and ENST00000390462 (transcript). However, the IDs from your list do not seem to work. These same IDs cannot be found in UCSC either when using their Table Browser, whereas 'uc010ajn.1' can.

modified 2.9 years ago • written 2.9 years ago by Denise - Open Targets4.9k

So the input source is not correct, then; the OP needs to find out what the proper IDs are. The online BioMart should also be able to do the same thing as you suggested; I was just giving an idea of how to do it both through the browser and programmatically. Unless the OP gets the corrected IDs, none of the above suggestions will work.

Note: Get to know where your source data come from. Try to keep a record of all sources that will be used as inputs for downstream work. It makes life easier to know what you are using and what you intend to do, so that even if something is broken, people can give suggestions or help debug it.

written 2.9 years ago by ivivek_ngs4.8k