Convert Refseq Id To Gene Name
11.4 years ago

I'm trying to convert a list of RefSeq IDs to the Gene Symbol. I can do it for Ensembl using http://genome.ucsc.edu/cgi-bin/hgTables [Track -> Ensembl Genes : Table -> ensemblToGeneName]

I can import a list like:

ENSMUST00000000219
ENSMUST00000000450
ENSMUST00000001156
ENSMUST00000001319
ENSMUST00000001559

and get a table that looks like this:

ENSMUST00000000219    Th
ENSMUST00000000450    Pparg
ENSMUST00000001156    Cfp
ENSMUST00000001319    Efnb2
ENSMUST00000001559    Itfg2

and life is super easy. I can't figure out how to do something similar with RefSeq IDs!

I tried http://idconverter.bioinfo.cnio.es/IDconverter.php, which worked the best of all the converters suggested by http://www.shodhaka.com/cgi-bin/startbioinfo/simpleresources.pl?tn=Gene%20ID%20conversion&sort=Rank%20by%20usage%20frequency, but it didn't recognize some of the transcripts, which is really annoying.

Does anyone know how to import a list of RefSeq genes:

NM_001081045
NM_027801
NM_001267620
NM_028121
NM_001167748

and get out the Gene Symbols:

Kansl1
2610015P09Rik
Ankzf1
Adpgk
Egfem1

Please search this site for the many similar questions and answers, which explain how to use BioMart.

11.4 years ago

Try:

$ mysql --user=genome -N --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e "select name,name2 from refGene" > Refseq2Gene.txt

This will give you the mapping file, with the RefSeq accession (name) in the first column and the gene symbol (name2) in the second. For mouse, replace hg19 with mm10.
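If it helps, here is a minimal Python sketch of how the resulting two-column file could be used to annotate a list of accessions without a spreadsheet. It assumes the file is tab-separated with no header row (which is what the -N flag should produce); the function names load_mapping and to_symbols are just illustrative, and the demo uses a small in-memory stand-in rather than the real Refseq2Gene.txt.

```python
import csv
import io


def load_mapping(handle):
    """Read tab-separated 'accession<TAB>symbol' rows into a dict."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:  # skip blank or malformed lines
            mapping[row[0]] = row[1]
    return mapping


def to_symbols(accessions, mapping):
    """Return (accession, symbol) pairs in input order; symbol is None if missing."""
    pairs = []
    for acc in accessions:
        acc = acc.strip()
        key = acc.split(".")[0]  # drop a version suffix such as ".2", if present
        pairs.append((acc, mapping.get(key)))
    return pairs


if __name__ == "__main__":
    # Stand-in for the real file; in practice: open("Refseq2Gene.txt", newline="")
    demo = io.StringIO("NM_001081045\tKansl1\nNM_028121\tAdpgk\n")
    mapping = load_mapping(demo)
    for acc, symbol in to_symbols(["NM_001081045", "NM_028121"], mapping):
        print(acc, symbol if symbol is not None else "NA", sep="\t")
```

Accessions that are missing from the mapping come back as None, so unrecognized transcripts are easy to spot instead of silently disappearing.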


Copy and paste it into Excel and use VLOOKUP to look up your genes.


10 out of 10 scientists agree: don't use Excel!


Huh!! It's funny that they even have a paper about it.


Do biologists not classify as scientists then?


Of course we do, which is why we don't use Excel :).


You know the old adage ... if it doesn't have the word "science" in the title, then it's not a real science.

Computer Science all the way, baby ... wooo hoooo!

Oh ... no, wait ...


I'm sorry, but I get really confused when trying to use open-source databases through Terminal on my Mac. Can you point me to somewhere I can learn?


Hi Ashutosh, I don't know if you will see this message, but I have a question for you. Your code for retrieving the mapping file from the human refGene table only gives the locus and gene symbol. How do I get the GI number and RefSeq protein accession from it? Thank you.


How can I do that for UCSC Genes instead of refseq Genes?

Thank you

11.4 years ago
vaskin90 ▴ 290

You could try bioDBnet converter: http://biodbnet.abcc.ncifcrf.gov/db/db2db.php

11.4 years ago

An alternate way would be to go to

1) http://genome.ucsc.edu/cgi-bin/hgTables?command=start

2) Select your genome and assembly, then select "Genes and Gene Predictions" as the group.

3) Select Refseq Genes as track

4) Select refGene as a table and then output the file.

Then you can use a script or Excel to map your RefSeq IDs to gene names. Make sure you follow what Steve mentioned in the comment section. Also, have you ever used DAVID (http://david.abcc.ncifcrf.gov/conversion.jsp)?
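For the "use a script" route, a rough Python sketch along these lines might work on the Table Browser output. It assumes the download contains all refGene fields as tab-separated text with a header line starting with "#" that includes the columns name (RefSeq accession) and name2 (gene symbol); check the first line of your own file, since that layout is an assumption here. The function name refgene_to_mapping and the trimmed three-column demo header are purely illustrative.

```python
import csv
import io


def refgene_to_mapping(handle):
    """Build {accession: symbol} from a refGene dump that starts with a header row."""
    header = handle.readline().lstrip("#").rstrip("\r\n").split("\t")
    name_col = header.index("name")     # RefSeq accession column
    symbol_col = header.index("name2")  # gene symbol column
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) > max(name_col, symbol_col):
            # An accession can occur on several rows; keep the first symbol seen.
            mapping.setdefault(row[name_col], row[symbol_col])
    return mapping


if __name__ == "__main__":
    # Abbreviated, made-up stand-in for a real export (the real table has more columns).
    demo = io.StringIO(
        "#bin\tname\tname2\n"
        "0\tNM_027801\t2610015P09Rik\n"
        "0\tNM_001267620\tAnkzf1\n"
    )
    for accession, symbol in refgene_to_mapping(demo).items():
        print(accession, symbol, sep="\t")
```

Looking the columns up by header name rather than by position means the sketch should keep working if you export only a subset of fields, as long as name and name2 are among them.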


Upvote for this answer. Using the Table Browser is better than BioMart if you have a huge number of IDs to convert.

11.3 years ago
plaschkej ▴ 10

Here is a BioPerl script:

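#!/usr/bin/env perl
use warnings;
use strict;
use Bio::Perl;        # provides get_sequence()
use Bio::DB::RefSeq;  # only needed for the commented-out alternative below
$| = 1;               # unbuffered output, so the prompt appears immediately

my $db = Bio::DB::RefSeq->new;

print "Input RefSeq ID: ";
my $refseq = <STDIN>;
chomp($refseq);

# Fetch the record from the remote RefSeq database
my $seq = get_sequence('refseq', $refseq);

# most of the time RefSeq_ID eq RefSeq acc
#my $seq = $db->get_Seq_by_id($refseq); # RefSeq ID
#print "accession is ", $seq->accession_number, "\n";

# The gene symbol usually appears in parentheses in the description line
if ($seq->desc =~ /\((\w+)\)/) {
    print "found: $1\n";
    print $seq->desc, "\n";
}
else {
    print "definition is ", $seq->desc, "\n";
}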

Hi, I am not a Perl person. Would it be possible to change this script so that I can paste a LIST of NM_ numbers instead of typing them into STDIN on the command line?

11.3 years ago
Ming Tommy Tang ★ 4.3k

Cistrome/Galaxy (http://cistrome.org/ap/) can do it very easily. In the toolbox on the left, look under:

Liftover/Others
  - Convert between RefSeq, Gene Symbols to Entrez IDs using Bioconductor
  - Liftover Wig Files: Liftover wig files
  - [Galaxy] Convert genome coordinates between assemblies and genomes
  - Standardize wig file: Standardize a wig file to a given span
  - Extract data from Wiggle: Extract data for certain chromosome from a wiggle file
  - Extract data from Bed: Extract data for certain chromosome from a BED file
