Question

UCSC Gene Name Question

3

13.5 years ago

Austinlew ▴ 310

I used Galaxy to look up the nearest gene for a given set of variants by comparing each variant's start and end coordinates with the genes retrieved from the knownGene table in the UCSC browser. However, the only name I get back is a UCSC ID such as uc001aaa.3. How can I convert this UCSC ID into the ordinary gene symbol?

Thanks!

galaxy ucsc gene • 10k views

ADD COMMENT • link updated 7.9 years ago by chen ★ 2.5k • written 13.5 years ago by Austinlew ▴ 310

score 9 · Answer 1 · 2010-11-01

9

13.5 years ago

Mary 11k

You probably need the known gene cross-reference table, kgXref.

But what do you mean by "ordinary gene name"? Is that a symbol, full name, or description? Official from HGNC, or some other source? Might need another linked kg table. But I'd bet money the one you want is in there. The same Galaxy query of UCSC ought to be able to give you that.

ADD COMMENT • link 13.5 years ago by Mary 11k

0

Hi Mary, thanks a lot! I mean a gene symbol such as BRCA1. You are right about that; I will try querying the kgXref table.

ADD REPLY • link 13.5 years ago by Austinlew ▴ 310

Ram · Answer 2 · 2010-11-02

7

13.5 years ago

Pierre Lindenbaum 161k

You can use the UCSC mysql server with the table kgXref:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19  -e 'select geneSymbol from kgXref where kgId="uc001aaa.3"'
+------------+
| geneSymbol |
+------------+
| DDX11L1    | 
+------------+
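
If you have a whole list of IDs rather than one, the same query can be extended with an IN clause. Below is a minimal Python sketch that only builds the SQL string; the function name build_kgxref_query and the ID pattern are my own assumptions for illustration, not part of any UCSC tooling. The resulting string can be passed to the same mysql command via -e '...'.

```python
import re

# Assumed shape of a UCSC known-gene ID such as "uc001aaa.3";
# adjust the pattern if your IDs look different.
UCSC_ID = re.compile(r"uc\d{3}[a-z]{3}\.\d+")


def build_kgxref_query(kg_ids):
    """Return one SQL statement that looks up gene symbols for many kgXref IDs."""
    ids = list(dict.fromkeys(kg_ids))  # drop duplicates, keep input order
    if not ids:
        raise ValueError("no IDs given")
    bad = [i for i in ids if not UCSC_ID.fullmatch(i)]
    if bad:
        # Refuse anything unexpected instead of pasting it into the SQL.
        raise ValueError(f"unexpected ID format: {bad}")
    quoted = ", ".join(f'"{i}"' for i in ids)
    return f"select kgId, geneSymbol from kgXref where kgId in ({quoted})"


if __name__ == "__main__":
    print(build_kgxref_query(["uc001aaa.3"]))
```

Selecting kgId alongside geneSymbol keeps each symbol paired with the ID it came from, which matters once more than one ID is queried.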

ADD COMMENT • link updated 4.6 years ago by Ram 43k • written 13.5 years ago by Pierre Lindenbaum 161k

Ram · Answer 3 · 2010-11-01

You can use BioMart for this conversion. Select 'ID List Limit' under Filters and pick UCSC ID from the drop-down; then you can paste your identifiers into the box or upload a file containing them. Pick the outputs you want from the 'Attributes' section.

I exported a query for the ID you gave in your question as Perl code, which will allow you to script the retrieval if you like:

# An example script demonstrating the use of BioMart API.
# This perl API representation is only available for configuration versions >=  0.5 
use strict;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

my $confFile = "PATH TO YOUR REGISTRY FILE UNDER biomart-perl/conf/. For Biomart Central Registry navigate to
                        http://www.biomart.org/biomart/martservice?type=registry";
#
# NB: change action to 'clean' if you wish to start a fresh configuration  
# and to 'cached' if you want to skip configuration step on subsequent runs from the same registry
#

my $action='cached';
my $initializer = BioMart::Initializer->new('registryFile'=>$confFile, 'action'=>$action);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');

$query->setDataset("hsapiens_gene_ensembl");
$query->addFilter("ucsc", ["uc001aaa.3"]);
$query->addAttribute("ensembl_gene_id");
$query->addAttribute("ensembl_transcript_id");
$query->addAttribute("external_gene_id");

$query->formatter("TSV");

my $query_runner = BioMart::QueryRunner->new();
############################## GET COUNT ############################
# $query->count(1);
# $query_runner->execute($query);
# print $query_runner->getCount();
#####################################################################

############################## GET RESULTS ##########################
# to obtain unique rows only
# $query_runner->uniqueRowsOnly(1);

$query_runner->execute($query);
$query_runner->printHeader();
$query_runner->printResults();
$query_runner->printFooter();
#####################################################################

This is a C&P direct from the BioMart website, YMMV.
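
If you would rather not install the Perl API, the same query can be written as the XML document that BioMart's web interface also exports. The following is a rough Python sketch that only builds that XML. The function name build_biomart_query is hypothetical, the dataset, filter and attribute names are copied from the Perl script above and may differ in current Ensembl releases, and the datasetConfigVersion value is an assumption on my part.

```python
import xml.etree.ElementTree as ET

# Attribute names taken from the Perl script above; they may have been
# renamed in newer BioMart/Ensembl releases.
DEFAULT_ATTRIBUTES = ("ensembl_gene_id", "ensembl_transcript_id", "external_gene_id")


def build_biomart_query(ucsc_ids, attributes=DEFAULT_ATTRIBUTES,
                        dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query mirroring the Perl example (TSV output)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "0",
        "datasetConfigVersion": "0.6",  # assumed value
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Several IDs go into one filter as a comma-separated list.
    ET.SubElement(ds, "Filter", {"name": "ucsc", "value": ",".join(ucsc_ids)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    print(build_biomart_query(["uc001aaa.3"]))
```

As far as I know, the printed XML is what the martservice endpoint expects in its query parameter, but check that against the BioMart server you are actually using.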

score 0 · Answer 4 · 2016-06-20

OpenGene (https://github.com/OpenGene/OpenGene.jl) can do this very easily. The gencode_locate function queries the GENCODE annotation to find which gene, and which exon or intron, a position falls in.

using OpenGene, OpenGene.Reference

# load the GENCODE dataset; it will download a file from the GENCODE website if it has not been downloaded before
# once loaded, it is cached, so future loads will be fast
index = gencode_load("GRCh37")

# locate which gene chr:pos is in
gencode_locate(index, "chr5", 149526621)
# it will return
# 1-element Array{Any,1}:
#  Dict{ASCIIString,Any}("gene"=>"PDGFRB","number"=>1,"transcript"=>"ENST00000261799.4","type"=>"intron")
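
To show the idea behind a positional lookup like this, here is a small Python sketch. It is not how OpenGene implements gencode_locate; the function names build_index and locate are my own, and the gene intervals in the demo are made-up placeholders rather than real annotation.

```python
from bisect import bisect_right


def build_index(genes):
    """genes: iterable of (chrom, start, end, name).

    Returns {chrom: (starts, records)} with records sorted by start.
    """
    by_chrom = {}
    for chrom, start, end, name in genes:
        by_chrom.setdefault(chrom, []).append((start, end, name))
    index = {}
    for chrom, records in by_chrom.items():
        records.sort()
        index[chrom] = ([r[0] for r in records], records)
    return index


def locate(index, chrom, pos):
    """Return names of genes whose closed interval [start, end] contains pos."""
    if chrom not in index:
        return []
    starts, records = index[chrom]
    # Binary search: only records with start <= pos can contain pos.
    upper = bisect_right(starts, pos)
    # Linear scan over those candidates to check the end coordinate.
    return [name for start, end, name in records[:upper] if end >= pos]


if __name__ == "__main__":
    # Placeholder intervals, not real gene coordinates.
    toy = [("chr1", 100, 200, "GENE_A"), ("chr1", 150, 300, "GENE_B"),
           ("chr2", 50, 80, "GENE_C")]
    idx = build_index(toy)
    print(locate(idx, "chr1", 175))
```

With real data you would fill the gene list from a GENCODE or knownGene file, and the names returned would already be gene symbols if that is what the file provides.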