Question

How To Map Transcript Isoform IDs To Entrez IDs?

10.2 years ago

jack ▴ 520

Hi,

I've got RNA-seq data from TCGA, with expression levels at both the gene level and the isoform level. I want to know how I can map the isoform IDs of the transcripts to Entrez gene IDs.

My isoform IDs look like this:

isoform_id    normalized_count
uc011lsn.1    0.0000
uc010unu.1    20.1848
uc010uoa.1    7.1561
uc002bgz.2    36.1698
uc002bic.2    0.0000
uc010zzl.1    188.5822
uc001jiu.2    1085.9445
uc010qhg.1

genomics tcga ngs

updated 10.2 years ago by Neilfws 49k • written 10.2 years ago by jack ▴ 520

I would normally recommend either BioMart or the UCSC Table Browser for this task. But before we go any further: are those isoform IDs valid? None of them appear to be. I found some corresponding Entrez IDs on this mailing list, and those IDs are not valid either; they have since been replaced.

10.2 years ago by Neilfws 49k

score 2 · Answer 1 · 2014-02-16

Those are UCSC isoform IDs. You can either get the corresponding table from the UCSC Genome Browser by selecting track=UCSC Genes and table=knownToKeggEntrez, then use that table as a dictionary to remap the IDs, or paste the list into a gene ID conversion tool such as DAVID.
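
The "use the table as a dictionary" step could look roughly like the sketch below. It assumes the table was exported as tab-separated text with the UCSC ID in the first column, the Entrez ID in the second, and header lines starting with '#'; the function names and the ENTREZ_1/ENTREZ_2 values in the usage example are hypothetical placeholders, not real mappings.

```python
def load_mapping(text):
    """Parse a tab-separated UCSC ID -> Entrez ID table into a dict.

    Assumption: column 1 is the UCSC isoform ID, column 2 the Entrez ID,
    and header/comment lines start with '#'. If an ID appears on several
    rows, the last row wins.
    """
    mapping = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows
        mapping[fields[0]] = fields[1]
    return mapping


def remap_isoforms(expression_text, mapping):
    """Return (isoform_id, entrez_id or None, count or None) tuples.

    Expects the question's layout: a header line, then
    'isoform_id  normalized_count' rows separated by whitespace.
    """
    rows = []
    for line in expression_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if not fields:
            continue
        isoform_id = fields[0]
        count = float(fields[1]) if len(fields) > 1 else None  # row may lack a count
        rows.append((isoform_id, mapping.get(isoform_id), count))
    return rows


if __name__ == "__main__":
    # Hypothetical mapping rows, for illustration only.
    mapping_text = "#name\tvalue\nuc010unu.1\tENTREZ_1\nuc002bgz.2\tENTREZ_2\n"
    expression_text = (
        "isoform_id\tnormalized_count\n"
        "uc010unu.1\t20.1848\n"
        "uc001jiu.2\t1085.9445\n"
    )
    for row in remap_isoforms(expression_text, load_mapping(mapping_text)):
        print(row)
```

Isoforms with no entry in the table come back with None rather than being dropped, so you can see how many IDs failed to map.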

I'm not sure whether you strictly need a mapping to Entrez IDs, or just want to group isoforms by gene. In the latter case, I recommend switching to RefSeq IDs and using the RefSeq track to get gene names. The table for this conversion can be obtained by selecting track=RefSeq Genes and table=kgXref.
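
If the goal is only to group isoforms by gene, a minimal sketch could look like this. It assumes a tab-separated export whose header line names the columns kgID and geneSymbol, and it looks the columns up by name rather than position. The function names, the UNMAPPED label, and the GENE_A/GENE_B symbols in the usage example are hypothetical.

```python
from collections import defaultdict


def load_kgxref(text, id_col="kgID", gene_col="geneSymbol"):
    """Map UCSC isoform IDs to gene symbols from a tab-separated export.

    Assumption: the first non-empty line is a header such as
    '#kgID<TAB>mRNA<TAB>...<TAB>geneSymbol...'; columns are found by name.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].lstrip("#").split("\t")
    id_idx, gene_idx = header.index(id_col), header.index(gene_col)
    mapping = {}
    for line in lines[1:]:
        fields = line.split("\t")
        mapping[fields[id_idx]] = fields[gene_idx]
    return mapping


def group_by_gene(isoform_ids, id_to_gene, missing="UNMAPPED"):
    """Group isoform IDs under their gene symbol; unknown IDs go under `missing`."""
    groups = defaultdict(list)
    for isoform_id in isoform_ids:
        groups[id_to_gene.get(isoform_id, missing)].append(isoform_id)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical export: the gene symbols are placeholders, not real annotations.
    kgxref_text = (
        "#kgID\tmRNA\tgeneSymbol\n"
        "uc010unu.1\tX1\tGENE_A\n"
        "uc002bgz.2\tX2\tGENE_A\n"
        "uc001jiu.2\tX3\tGENE_B\n"
    )
    ids = ["uc010unu.1", "uc002bgz.2", "uc001jiu.2", "uc010qhg.1"]
    print(group_by_gene(ids, load_kgxref(kgxref_text)))
```

Whether you then sum the isoform counts within each group, or analyse them separately, is a choice this sketch leaves open.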