Question: How To Convert UniProt IDs To Ensembl Gene/Transcript IDs
7.0 years ago by
learnerforever520 wrote:

I have ~3000 UniProt IDs, and some of them carry a "-" suffix, which I'm guessing marks some sort of isoform.

A0AVT1
A0FGR8-4
A0M8Q6
A0MZ66-2
A1L0T0
A1L4H1-2
A1X283
A2A2D0
A2A2Z9
A2ACR1

What's the most reliable way to convert these to the corresponding Ensembl gene/transcript IDs? I tried BioMart, but the hyphenated IDs failed to resolve. The same happened with bioDBnet.

Thanks everyone!

7.0 years ago by
Washington University School of Medicine, St. Louis, USA
Malachi Griffith17k wrote:

Pierre's answer is cool. Here is another approach, using a MySQL query against the UCSC public database:

mysql --user=genomep  --password=password --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'SELECT knownGene.name,knownGene.proteinID,ensGene.name,ensGene.name2 from knownGene,knownToEnsembl,ensGene WHERE knownGene.name=knownToEnsembl.name AND knownToEnsembl.value=ensGene.name AND knownGene.proteinID REGEXP "A0AVT1|A0FGR8|A0M8Q6|A0MZ66|A1L0T0|A1L4H1|A1X283|A2A2D0|A2A2Z9|A2ACR1"'

Result:

+------------+-----------+-----------------+-----------------+
| name       | proteinID | name            | name2           |
+------------+-----------+-----------------+-----------------+
| uc002qlg.4 | A1L4H1    | ENST00000389623 | ENSG00000179954 |
| uc002nam.3 | A1L0T0    | ENST00000263383 | ENSG00000105135 |
| uc021pzk.1 | A0MZ66-3  | ENST00000392903 | ENSG00000187164 |
| uc001lcy.4 | A0MZ66-6  | ENST00000497044 | ENSG00000187164 |
| uc009xyw.3 | A0MZ66    | ENST00000355371 | ENSG00000187164 |
| uc001lcz.4 | A0MZ66-2  | ENST00000355371 | ENSG00000187164 |
| uc010qso.2 | A0MZ66    | ENST00000355371 | ENSG00000187164 |
| uc010qsp.1 | A0MZ66    | ENST00000392903 | ENSG00000187164 |
| uc010qsq.1 | A0MZ66    | ENST00000392901 | ENSG00000187164 |
| uc003woa.1 | A0FGR8-2  | ENST00000421679 | ENSG00000117868 |
| uc003wob.1 | A0FGR8-2  | ENST00000251527 | ENSG00000117868 |
| uc003woc.1 | A0FGR8    | ENST00000275418 | ENSG00000117868 |
| uc003wod.1 | A0FGR8-2  | ENST00000429474 | ENSG00000117868 |
| uc003mbr.3 | A1X283    | ENST00000311601 | ENSG00000174705 |
| uc003hdg.4 | A0AVT1    | ENST00000322244 | ENSG00000033178 |
| uc003hdi.3 | A0AVT1-3  | ENST00000420827 | ENSG00000033178 |
| uc003hdj.2 | A0AVT1-4  | ENST00000429659 | ENSG00000033178 |
+------------+-----------+-----------------+-----------------+
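With ~3000 IDs, typing the REGEXP alternation by hand is not practical. Here is a minimal sketch that builds the same query from a list of IDs. It assumes the table and column names used in the query above; the helper names (`base_accession`, `build_ucsc_query`) are made up for illustration. Unlike the original query, the pattern is anchored so that an accession cannot match as a substring of a longer proteinID.

```python
import re

# Rough shape of a UniProt accession with an optional isoform suffix ("-2").
# This is a simplification: it checks the overall shape, not the full rules.
ACCESSION_RE = re.compile(r"^([A-Z0-9]{6,10})(?:-(\d+))?$")


def base_accession(uniprot_id):
    """Return the accession without its isoform suffix (A0FGR8-4 -> A0FGR8)."""
    match = ACCESSION_RE.match(uniprot_id.strip())
    if match is None:
        raise ValueError(f"not a UniProt accession: {uniprot_id!r}")
    return match.group(1)


def build_ucsc_query(uniprot_ids):
    """Build the knownGene / knownToEnsembl / ensGene join for a list of IDs."""
    # Strip isoform suffixes and drop duplicates while keeping the input order.
    bases = list(dict.fromkeys(base_accession(i) for i in uniprot_ids))
    # Match the bare accession or any of its isoforms, and nothing else.
    pattern = "^(" + "|".join(bases) + ")(-[0-9]+)?$"
    return (
        "SELECT knownGene.name, knownGene.proteinID, ensGene.name, ensGene.name2 "
        "FROM knownGene, knownToEnsembl, ensGene "
        "WHERE knownGene.name = knownToEnsembl.name "
        "AND knownToEnsembl.value = ensGene.name "
        f'AND knownGene.proteinID REGEXP "{pattern}"'
    )


if __name__ == "__main__":
    print(build_ucsc_query(["A0AVT1", "A0FGR8-4", "A0MZ66-2"]))
```

The printed string can be passed to `mysql ... -e`. Note that the query returns every isoform of each accession, so the rows still need filtering if only one specific isoform is wanted.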

Could you give an example of translating Ensembl transcripts to UniProt IDs, please?

Reply written 2.5 years ago by Fill70
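One possible way to go in the reverse direction is to keep the same three-table join and filter on the Ensembl transcript column instead of the protein ID. This is only a sketch: it assumes the same hg19 tables and columns as the query in the answer above, and `build_reverse_query` is a hypothetical helper name.

```python
def build_reverse_query(transcript_ids):
    """Build a query listing the UniProt ID recorded for each Ensembl transcript."""
    for tid in transcript_ids:
        # Accept only plain human transcript IDs (ENST + digits) so that the
        # values can be quoted into the SQL text without further escaping.
        if not (tid.startswith("ENST") and tid[4:].isdigit()):
            raise ValueError(f"not an Ensembl transcript ID: {tid!r}")
    in_list = ", ".join(f"'{tid}'" for tid in transcript_ids)
    return (
        "SELECT ensGene.name, ensGene.name2, knownGene.proteinID "
        "FROM knownGene, knownToEnsembl, ensGene "
        "WHERE knownGene.name = knownToEnsembl.name "
        "AND knownToEnsembl.value = ensGene.name "
        f"AND ensGene.name IN ({in_list})"
    )


if __name__ == "__main__":
    print(build_reverse_query(["ENST00000389623", "ENST00000263383"]))
```

As in the result table above, one transcript can come back with more than one proteinID (for example a canonical accession and an isoform), so expect multiple rows per transcript.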
7.0 years ago by
Jerven640 wrote:

You should use the ID mapping service at uniprot.org: http://www.uniprot.org/mapping/

Select "UniProt AC" in the "From" column and "Ensembl Transcript" in the "To" column. Paste a list of UniProt identifiers or upload them as a file, and you get a mapped list that you can download. Help is at http://www.uniprot.org/help/mapping

UniProt does not maintain a list mapping Ensembl transcript IDs to UniProt isoform IDs. For that, the UCSC answer is the better option, but watch out for the time lag between releases, which can leave the UCSC data out of date.
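Once the mapped list is downloaded, it usually needs to be regrouped per accession and checked for IDs that did not map. A minimal sketch, assuming the download is a two-column tab-separated file with a `From`/`To` header line (check your own file); the function names are made up, and the sample rows are taken from the output shown elsewhere in this thread.

```python
import csv
import io
from collections import defaultdict


def read_mapping(handle):
    """Group the 'To' identifiers by their 'From' identifier."""
    mapping = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the assumed "From<TAB>To" header line.
        if len(row) < 2 or row[0] == "From":
            continue
        mapping[row[0]].append(row[1])
    return dict(mapping)


def unmapped(requested, mapping):
    """Return the requested IDs that have no entry in the mapping."""
    return [uid for uid in requested if uid not in mapping]


# Illustrative stand-in for a downloaded file.
SAMPLE = (
    "From\tTo\n"
    "A0AVT1\tENST00000322244\n"
    "A0AVT1\tENST00000420827\n"
    "A1L0T0\tENST00000263383\n"
)

if __name__ == "__main__":
    result = read_mapping(io.StringIO(SAMPLE))
    print(result)
    print(unmapped(["A0AVT1", "A0M8Q6", "A1L0T0"], result))
```

For a real file, replace `io.StringIO(SAMPLE)` with `open("mapping.tab", newline="")`, where the file name is whatever you saved the download as.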


You can now enter an isoform ID, e.g. Q96EY1-1, and the UniProt mapping will return only the transcript mapped to that isoform. Entering Q96EY1 still returns all transcripts of the gene.

Reply written 4.8 years ago by Michi950
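Since isoform IDs and plain accessions behave differently in the mapping, it can help to separate them before submitting. A small sketch, assuming the only hyphen in an ID is the one before the isoform number; `split_isoform` and `partition_ids` are hypothetical names.

```python
def split_isoform(uniprot_id):
    """Split 'Q96EY1-1' into ('Q96EY1', 1); a plain accession gives (acc, None)."""
    accession, _, isoform = uniprot_id.strip().partition("-")
    return accession, (int(isoform) if isoform else None)


def partition_ids(uniprot_ids):
    """Separate plain accessions from isoform-specific IDs, keeping input order."""
    canonical, isoforms = [], []
    for uid in uniprot_ids:
        _, isoform = split_isoform(uid)
        target = canonical if isoform is None else isoforms
        target.append(uid.strip())
    return canonical, isoforms


if __name__ == "__main__":
    plain, specific = partition_ids(["Q96EY1-1", "Q96EY1", "A0FGR8-4"])
    print("all transcripts expected for:", plain)
    print("single isoform expected for:", specific)
```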
7.0 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum123k wrote:

You could use the following XSLT stylesheet:


<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:u="http://uniprot.org/uniprot"
    version='1.1'
    >
<xsl:param name="query"></xsl:param>
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:for-each select="/u:uniprot/u:entry/u:dbReference[@type='Ensembl']">
<xsl:value-of select="$query"/>
<xsl:text> </xsl:text>
<xsl:value-of select="../u:accession"/>
<xsl:text> </xsl:text>
<xsl:value-of select="@id"/>
<xsl:text> </xsl:text>
<xsl:value-of select="u:property[@type='protein sequence ID']/@value"/>
<xsl:text> </xsl:text>
<xsl:value-of select="u:property[@type='gene ID']/@value"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>



</xsl:stylesheet>

example:

$ for A in A0AVT1 A0FGR8-4 A0M8Q6 A0MZ66-2 A1L0T0 A1L4H1-2 A1X283 A2A2D0 A2A2Z9 A2ACR1; do xsltproc --novalid --stringparam query $A stylesheet.xsl "http://www.uniprot.org/uniprot/${A}.xml" ; done
A0AVT1 A0AVT1 ENST00000322244 ENSP00000313454 ENSG00000033178
A0AVT1 A0AVT1 ENST00000420827 ENSP00000399234 ENSG00000033178
A0FGR8-4 A0FGR8 ENST00000251527 ENSP00000251527 ENSG00000117868
A0MZ66-2 A0MZ66 ENST00000260777 ENSP00000260777 ENSG00000187164
A0MZ66-2 A0MZ66 ENST00000355371 ENSP00000347532 ENSG00000187164
A0MZ66-2 A0MZ66 ENST00000392903 ENSP00000376636 ENSG00000187164
A1L0T0 A1L0T0 ENST00000263383 ENSP00000263383 ENSG00000105135
A1L4H1-2 A1L4H1 ENST00000389623 ENSP00000374274 ENSG00000179954
A1X283 A1X283 ENST00000311601 ENSP00000309714 ENSG00000174705
A2A2D0 A2A2D0 ENST00000446334 ENSP00000407567 ENSG00000117632
A2A2Z9 A2A2Z9 ENST00000290943 ENSP00000290943 ENSG00000230453
A2ACR1 A2ACR1 ENST00000395330 ENSP00000378739 ENSG00000240065
A2ACR1 A2ACR1 ENST00000399371 ENSP00000382305 ENSG00000240118
A2ACR1 A2ACR1 ENST00000399607 ENSP00000382516 ENSG00000243594
A2ACR1 A2ACR1 ENST00000420182 ENSP00000404847 ENSG00000239836
A2ACR1 A2ACR1 ENST00000451862 ENSP00000401221 ENSG00000243067
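For anyone without xsltproc, roughly the same extraction can be sketched with Python's standard library. This assumes the entry XML uses the `http://uniprot.org/uniprot` namespace and the `dbReference`/`property` layout that the stylesheet above relies on. The sample document is a hand-written stand-in built from one row of the output above, not a real download, and its second `dbReference` is a placeholder.

```python
import xml.etree.ElementTree as ET

# Same namespace the stylesheet binds to the "u" prefix.
NS = {"u": "http://uniprot.org/uniprot"}


def ensembl_refs(xml_text):
    """Return (accession, transcript, protein, gene) for each Ensembl cross-reference."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.findall("u:entry", NS):
        # Like xsl:value-of, take the first accession of the entry.
        accession = entry.findtext("u:accession", default="", namespaces=NS)
        for ref in entry.findall("u:dbReference[@type='Ensembl']", NS):
            props = {
                p.get("type"): p.get("value")
                for p in ref.findall("u:property", NS)
            }
            rows.append((
                accession,
                ref.get("id"),
                props.get("protein sequence ID", ""),
                props.get("gene ID", ""),
            ))
    return rows


# Hand-written stand-in for a UniProt entry (assumed layout).
SAMPLE = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>A1L0T0</accession>
    <dbReference type="Ensembl" id="ENST00000263383">
      <property type="protein sequence ID" value="ENSP00000263383"/>
      <property type="gene ID" value="ENSG00000105135"/>
    </dbReference>
    <dbReference type="Other" id="placeholder"/>
  </entry>
</uniprot>"""

if __name__ == "__main__":
    for row in ensembl_refs(SAMPLE):
        print(" ".join(row))
```

Working from XML files that are already saved locally keeps repeated runs from hitting the UniProt servers again.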

While nice, I don't think this is a good answer: it eats way too much bandwidth on the UniProt servers. Use the ID mapping service at uniprot.org/mapping instead.

Reply written 7.0 years ago by Jerven640