Question: How to map UCSC transcripts to gene symbol?
wenbinm (USA) wrote, 16 months ago:

Hi there,

I would like to map UCSC transcript IDs to gene symbols (mouse genome mm10; I downloaded the transcripts from http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/refMrna.fa.gz). I have a list of transcript IDs like 'NR_046233 2' and want to get the list of corresponding gene symbols.

Does anyone know how to map each transcript ID to a gene symbol?

Thank you!

rna-seq assembly genome • 1.4k views
h.mon (Brazil) wrote, 16 months ago:

One option: use the DAVID conversion tool (https://david.ncifcrf.gov/conversion.jsp) and select OFFICIAL_GENE_SYMBOL.

Another option: use R, with the AnnotationDbi and org.Mm.eg.db packages:

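library( AnnotationDbi )
library( org.Mm.eg.db )
# REFSEQ keys are bare accessions, without a version suffix.
# The AnnotationDbi:: prefix avoids a clash with dplyr::select if dplyr is loaded.
geneSymbol <- AnnotationDbi::select( org.Mm.eg.db, keys = "NR_000002",
                      columns = "SYMBOL",  keytype = "REFSEQ" )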
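The IDs in the question carry a version after the accession ('NR_046233 2'), while the lookup above takes a bare accession. A minimal sketch of stripping the version before the lookup, assuming it is separated from the accession by whitespace or a dot; the helper name strip_refseq_version is made up for illustration and is not part of any package mentioned in this thread:

```python
import re

# Hypothetical helper (not from the thread): drop a trailing version
# such as " 2" or ".2" from a RefSeq-style transcript ID.
_VERSION_SUFFIX = re.compile(r"[.\s]+\d+$")


def strip_refseq_version(transcript_id: str) -> str:
    """Return the bare accession, e.g. 'NR_046233 2' -> 'NR_046233'."""
    return _VERSION_SUFFIX.sub("", transcript_id.strip())


if __name__ == "__main__":
    for raw in ["NR_046233 2", "NR_046233.2", "NR_046233"]:
        print(strip_refseq_version(raw))
```

An ID that has no version is returned unchanged, so the helper can be applied to a mixed list.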

I ended up finding a RefSeq annotation file called "refMrna.fa.gz" on the UCSC website, which gives me the mapping from transcripts to names. Thank you for your reply all the same!

(reply written 15 months ago by wenbinm)
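For anyone who wants to pull the transcript IDs straight out of that file, here is a rough sketch. It assumes each FASTA header starts with '>' followed by the accession, as in the question's example 'NR_046233 2'; what else the headers contain is not shown in this thread, so only the accession is extracted. The function name read_fasta_ids is made up for illustration.

```python
import gzip


def read_fasta_ids(path):
    """Yield the first whitespace-delimited token of every FASTA header.

    Assumption: headers look like '>NR_046233 2', i.e. the accession
    comes first. Sequence lines are skipped.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    yield fields[0]
```

Something like `for accession in read_fasta_ids("refMrna.fa.gz"): print(accession)` would then produce a one-ID-per-line list that can be fed to the other answers in this thread.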
Alex Reynolds (Seattle, WA, USA) wrote, 16 months ago:

Another option is to use MyGene (modified from this excellent answer):

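#!/usr/bin/env python

import sys
import mygene

# Read one RefSeq accession per line, skipping blank lines and duplicates
ids = set()
with open('genes.txt', 'r') as f:
    for line in f:
        accession = line.strip()
        if accession:
            ids.add(accession)

m = mygene.MyGeneInfo()
r = m.querymany(sorted(ids),
                scopes='refseq',
                fields='symbol',
                species='mouse',
                as_dataframe=False)

# Unmatched queries come back without a 'symbol' key, so fall back to NA
for e in r:
    sys.stdout.write("%s\t%s\n" % (e['query'], e.get('symbol', 'NA')))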

Given a test file called genes.txt containing:

NR_046233

The output looks like:

NR_046233       Rn45s
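With a longer list, some IDs may not match anything and others may come back with more than one hit. A small sketch of grouping the results per query, assuming the list-of-dicts shape used in the script above, where unmatched queries carry a 'notfound' flag instead of a 'symbol'. The function name collect_symbols and the ID NR_000000 are placeholders for illustration only.

```python
from collections import defaultdict


def collect_symbols(results):
    """Group symbols by query ID; queries without a symbol map to []."""
    symbols = defaultdict(set)
    for hit in results:
        query = hit["query"]
        symbols[query]  # register the query even if it has no symbol
        if "symbol" in hit:
            symbols[query].add(hit["symbol"])
    return {query: sorted(found) for query, found in symbols.items()}


if __name__ == "__main__":
    # Hand-written records mimicking the result shape; NR_000000 is a
    # placeholder ID standing in for a query that was not found.
    example = [
        {"query": "NR_046233", "symbol": "Rn45s"},
        {"query": "NR_000000", "notfound": True},
    ]
    print(collect_symbols(example))
```

Keeping the unmatched IDs in the output makes it easy to see which transcripts still need a different lookup.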