Question: How to map UCSC transcripts to gene symbol?
wenbinm (USA) wrote, 2.1 years ago:

Hi there,

I would like to map UCSC transcript IDs (mouse genome mm10; I downloaded the transcript IDs from http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/refMrna.fa.gz) to gene symbols. I have a list of transcript IDs such as 'NR_046233 2' and want to get a list of the corresponding gene symbols.

Does anyone know how to map each transcript ID to a gene symbol?

Thank you!

rna-seq assembly genome • 2.1k views

Answer by h.mon (Brazil), 2.1 years ago:

One option: use the DAVID conversion tool at https://david.ncifcrf.gov/conversion.jsp and select OFFICIAL_GENE_SYMBOL as the target identifier.

Another option: use R, with the AnnotationDbi and org.Mm.eg.db packages:

library( AnnotationDbi )
library( org.Mm.eg.db )
geneSymbol <- select( org.Mm.eg.db, keys = "NR_000002",
                      columns = "SYMBOL",  keytype = "REFSEQ" )

Reply by wenbinm, 2.1 years ago: I ended up finding a RefSeq annotation file called "refMrna.fa.gz" on the UCSC website, which gives me a mapping from transcript IDs to names. In any case, thank you for your reply!

Answer by Alex Reynolds (Seattle, WA, USA), 2.1 years ago:

Another option is to use MyGene (modified from this excellent answer):

#!/usr/bin/env python

import sys
import mygene

# Read one RefSeq accession per line, skipping blank lines.
ids = set()
with open('genes.txt') as f:
    for line in f:
        accession = line.strip()
        if accession:
            ids.add(accession)

m = mygene.MyGeneInfo()
r = m.querymany(list(ids),
                scopes='refseq',
                fields='symbol',
                species='mouse',
                as_dataframe=False)

for e in r:
    # Queries with no match come back without a 'symbol' key, so fall back to 'NA'.
    sys.stdout.write("%s\t%s\n" % (e['query'], e.get('symbol', 'NA')))

Given a test file called genes.txt containing:

NR_046233

The output looks like:

NR_046233       Rn45s
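
The IDs in the question look like 'NR_046233 2', while the lookups in this thread use the bare accession (NR_046233). Below is a minimal sketch for stripping the trailing number before querying. It assumes that the trailing number is a version and that IDs take the form "ACCESSION VERSION", optionally with a leading ">" as in a FASTA header; the function names parse_refseq_id and unique_accessions are made up for illustration and are not part of any library.

```python
import re

# Assumed ID shape: ">ACCESSION VERSION", e.g. ">NR_046233 2".
# The dotted form "NR_046233.2" and a bare "NR_046233" are accepted too.
_ID_PATTERN = re.compile(r"^>?\s*([A-Z]{2}_\d+)(?:[.\s]+(\d+))?")


def parse_refseq_id(text):
    """Return (accession, version) for one ID string; version may be None."""
    match = _ID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError("not a RefSeq-style ID: %r" % text)
    accession, version = match.groups()
    return accession, (int(version) if version is not None else None)


def unique_accessions(lines):
    """Collect bare accessions from ID or header lines, in first-seen order."""
    seen = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        accession, _ = parse_refseq_id(line)
        seen.setdefault(accession, None)
    return list(seen)
```

Writing the result of unique_accessions, one accession per line, to genes.txt would give an input file in the form the script above reads.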