Question

How to convert UniProt IDs to Ensembl gene/transcript IDs

12.8 years ago

learnerforever ▴ 520

I have ~3000 UniProt IDs, and some of them carry a "-" suffix, which I'm guessing marks some sort of isoform.

A0AVT1
A0FGR8-4
A0M8Q6
A0MZ66-2
A1L0T0
A1L4H1-2
A1X283
A2A2D0
A2A2Z9
A2ACR1

What's the most reliable way to convert these to the corresponding Ensembl gene/transcript IDs? I tried BioMart, but the hyphenated IDs failed to resolve. The same happened with bioDBnet.

Thanks everyone!

uniprot • 15k views

updated 21 months ago by Ram 45k • written 12.8 years ago by learnerforever ▴ 520

Ram · Answer 1 · 2012-10-01

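Pierre's answer is cool. Here is another one, using a MySQL query against the UCSC public database. Note that the isoform suffixes are stripped from the accessions in the REGEXP, so every isoform of each protein is returned: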

mysql \
  --user=genomep \
  --password=password \
  --host=genome-mysql.cse.ucsc.edu \
  -A \
  -D hg19 \
  -e 'SELECT knownGene.name,knownGene.proteinID,ensGene.name,ensGene.name2
    FROM knownGene,knownToEnsembl,ensGene
    WHERE knownGene.name=knownToEnsembl.name AND knownToEnsembl.value=ensGene.name
      AND knownGene.proteinID REGEXP "A0AVT1|A0FGR8|A0M8Q6|A0MZ66|A1L0T0|A1L4H1|A1X283|A2A2D0|A2A2Z9|A2ACR1"'

Result:

+------------+-----------+-----------------+-----------------+
| name       | proteinID | name            | name2           |
+------------+-----------+-----------------+-----------------+
| uc002qlg.4 | A1L4H1    | ENST00000389623 | ENSG00000179954 |
| uc002nam.3 | A1L0T0    | ENST00000263383 | ENSG00000105135 |
| uc021pzk.1 | A0MZ66-3  | ENST00000392903 | ENSG00000187164 |
| uc001lcy.4 | A0MZ66-6  | ENST00000497044 | ENSG00000187164 |
| uc009xyw.3 | A0MZ66    | ENST00000355371 | ENSG00000187164 |
| uc001lcz.4 | A0MZ66-2  | ENST00000355371 | ENSG00000187164 |
| uc010qso.2 | A0MZ66    | ENST00000355371 | ENSG00000187164 |
| uc010qsp.1 | A0MZ66    | ENST00000392903 | ENSG00000187164 |
| uc010qsq.1 | A0MZ66    | ENST00000392901 | ENSG00000187164 |
| uc003woa.1 | A0FGR8-2  | ENST00000421679 | ENSG00000117868 |
| uc003wob.1 | A0FGR8-2  | ENST00000251527 | ENSG00000117868 |
| uc003woc.1 | A0FGR8    | ENST00000275418 | ENSG00000117868 |
| uc003wod.1 | A0FGR8-2  | ENST00000429474 | ENSG00000117868 |
| uc003mbr.3 | A1X283    | ENST00000311601 | ENSG00000174705 |
| uc003hdg.4 | A0AVT1    | ENST00000322244 | ENSG00000033178 |
| uc003hdi.3 | A0AVT1-3  | ENST00000420827 | ENSG00000033178 |
| uc003hdj.2 | A0AVT1-4  | ENST00000429659 | ENSG00000033178 |
+------------+-----------+-----------------+-----------------+
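
For a list of ~3000 IDs, typing the REGEXP alternation by hand is not practical. Here is a minimal sketch of how the pattern could be built, assuming an isoform suffix is always a hyphen followed by digits at the end of the accession. The helper name `build_regexp` is made up for illustration and is not part of any UCSC or UniProt tool.

```python
import re

# Assumption: an isoform suffix is "-<digits>" at the end, e.g. "A0FGR8-4".
ISOFORM_SUFFIX = re.compile(r"-\d+$")


def build_regexp(ids):
    """Join base accessions with '|' for use in a MySQL REGEXP clause.

    Isoform suffixes are removed and duplicates dropped, keeping the
    order in which each base accession first appears.
    """
    bases = (ISOFORM_SUFFIX.sub("", raw.strip()) for raw in ids)
    unique = dict.fromkeys(base for base in bases if base)
    return "|".join(unique)


# The ten IDs from the question.
ids = ["A0AVT1", "A0FGR8-4", "A0M8Q6", "A0MZ66-2", "A1L0T0",
       "A1L4H1-2", "A1X283", "A2A2D0", "A2A2Z9", "A2ACR1"]
pattern = build_regexp(ids)
print(pattern)
```

For the ten IDs in the question this reproduces the pattern used in the query above. Like that pattern, it is unanchored, so it also matches suffixed values such as A0MZ66-3 in knownGene.proteinID.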

Ram · Answer 2 · 2012-10-02

12.8 years ago

Jerven ▴ 660

You should use the mapping service at uniprot.org: http://www.uniprot.org/mapping/

Select "UniProt ac" in the From column and "Ensembl Transcript" in the second (To) column. Upload a file of UniProt identifiers or paste a list, and you get a mapped list that you can download. Help is at http://www.uniprot.org/help/mapping

UniProt does not maintain a mapping from Ensembl transcript IDs to UniProt isoform IDs. For that, the UCSC answer would be better, but do pay attention to the time lag, which can leave the data out of date.
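
Once the mapped list is downloaded, it usually needs to be grouped per accession, and the IDs that did not map need to be found. Below is a rough sketch, assuming the download is a tab-separated file with a "From"/"To" header row (check your own file, the layout is an assumption here). The function name `group_mapping` is hypothetical, and the sample rows are just pairs that appear elsewhere in this thread, not real output from the service.

```python
import csv
import io
from collections import defaultdict


def group_mapping(tsv_text):
    """Map each 'From' accession to its list of 'To' IDs, in file order."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the assumed "From<TAB>To" header row
    grouped = defaultdict(list)
    for row in reader:
        if len(row) >= 2:  # ignore blank or malformed lines
            grouped[row[0]].append(row[1])
    return dict(grouped)


# Illustrative input only (assumed format, pairs taken from this thread).
sample = (
    "From\tTo\n"
    "A0AVT1\tENST00000322244\n"
    "A0AVT1\tENST00000420827\n"
    "A1L0T0\tENST00000263383\n"
)
mapping = group_mapping(sample)

# Accessions that were submitted but never appear in the "From" column.
submitted = ["A0AVT1", "A1L0T0", "A0M8Q6"]
unmapped = [acc for acc in submitted if acc not in mapping]
print(mapping)
print(unmapped)
```

Keeping the unmapped list around makes it easy to retry those accessions with one of the other approaches in this thread.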


Now you can enter, for example, Q96EY1-1, and the UniProt mapping will return only the transcript mapped to that isoform. Q96EY1 will still return all transcripts of the gene.

updated 5.7 years ago by Ram 45k • written 10.6 years ago by Michi ▴ 990

Ram · Answer 3 · 2012-10-01

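You could use the following XSLT stylesheet, which prints the Ensembl cross-references (transcript, protein, and gene IDs) found in a UniProt XML entry: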

<?xml version='1.0'  encoding="ISO-8859-1"?>
<xsl:stylesheet
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:u="http://uniprot.org/uniprot"
    version='1.1'
    >
<xsl:param name="query"></xsl:param>
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:for-each select="/u:uniprot/u:entry/u:dbReference[@type='Ensembl']">
<xsl:value-of select="$query"/>
<xsl:text> </xsl:text>
<xsl:value-of select="../u:accession"/>
<xsl:text> </xsl:text>
<xsl:value-of select="@id"/>
<xsl:text> </xsl:text>
<xsl:value-of select="u:property[@type='protein sequence ID']/@value"/>
<xsl:text> </xsl:text>
<xsl:value-of select="u:property[@type='gene ID']/@value"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Example:

$ for A in A0AVT1 A0FGR8-4 A0M8Q6 A0MZ66-2 A1L0T0 A1L4H1-2 A1X283 A2A2D0 A2A2Z9 A2ACR1; do xsltproc --novalid --stringparam query "$A" stylesheet.xsl "http://www.uniprot.org/uniprot/${A}.xml" ; done
A0AVT1 A0AVT1 ENST00000322244 ENSP00000313454 ENSG00000033178
A0AVT1 A0AVT1 ENST00000420827 ENSP00000399234 ENSG00000033178
A0FGR8-4 A0FGR8 ENST00000251527 ENSP00000251527 ENSG00000117868
A0MZ66-2 A0MZ66 ENST00000260777 ENSP00000260777 ENSG00000187164
A0MZ66-2 A0MZ66 ENST00000355371 ENSP00000347532 ENSG00000187164
A0MZ66-2 A0MZ66 ENST00000392903 ENSP00000376636 ENSG00000187164
A1L0T0 A1L0T0 ENST00000263383 ENSP00000263383 ENSG00000105135
A1L4H1-2 A1L4H1 ENST00000389623 ENSP00000374274 ENSG00000179954
A1X283 A1X283 ENST00000311601 ENSP00000309714 ENSG00000174705
A2A2D0 A2A2D0 ENST00000446334 ENSP00000407567 ENSG00000117632
A2A2Z9 A2A2Z9 ENST00000290943 ENSP00000290943 ENSG00000230453
A2ACR1 A2ACR1 ENST00000395330 ENSP00000378739 ENSG00000240065
A2ACR1 A2ACR1 ENST00000399371 ENSP00000382305 ENSG00000240118
A2ACR1 A2ACR1 ENST00000399607 ENSP00000382516 ENSG00000243594
A2ACR1 A2ACR1 ENST00000420182 ENSP00000404847 ENSG00000239836
A2ACR1 A2ACR1 ENST00000451862 ENSP00000401221 ENSG00000243067
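
If xsltproc is not at hand, the same extraction can be sketched with Python's standard library. This is only an approximation of the stylesheet above, not a replacement for it. The function name `ensembl_rows` is made up, and the XML is a heavily trimmed, hand-written sample that follows the paths used in the stylesheet and reuses one output row from above. A real UniProt entry contains far more.

```python
import xml.etree.ElementTree as ET

# Same namespace URI as the "u" prefix in the stylesheet.
NS = {"u": "http://uniprot.org/uniprot"}


def ensembl_rows(xml_text, query):
    """Return (query, accession, transcript, protein, gene) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.findall("u:entry", NS):
        # First accession of the entry, like value-of in XSLT 1.0.
        accession = entry.findtext("u:accession", default="", namespaces=NS)
        for ref in entry.findall("u:dbReference[@type='Ensembl']", NS):
            props = {p.get("type"): p.get("value")
                     for p in ref.findall("u:property", NS)}
            rows.append((query, accession, ref.get("id"),
                         props.get("protein sequence ID", ""),
                         props.get("gene ID", "")))
    return rows


# Hand-written sample (assumption: real entries follow this layout).
sample = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>A1L0T0</accession>
    <dbReference type="Ensembl" id="ENST00000263383">
      <property type="protein sequence ID" value="ENSP00000263383"/>
      <property type="gene ID" value="ENSG00000105135"/>
    </dbReference>
    <dbReference type="Other" id="ignored"/>
  </entry>
</uniprot>"""

rows = ensembl_rows(sample, "A1L0T0")
for row in rows:
    print(" ".join(row))
```

Only dbReference elements of type "Ensembl" are kept, so other cross-references in the entry are skipped, as in the stylesheet.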