UCSC Gene Name Question
4
3
13.5 years ago
Austinlew ▴ 310

I used Galaxy to look up the nearest gene for a given set of variants by comparing each variant's start and end positions with the genes retrieved from the knownGene table in the UCSC browser. However, I only get names such as uc001aaa.3. How can I convert this UCSC ID into the ordinary gene symbol?

Thanks!

galaxy ucsc gene • 10k views
9
13.5 years ago
Mary 11k

You probably need the known gene cross-reference table, aka kgXref.

But what do you mean by "ordinary gene name"? Is that a symbol, full name, or description? Official from HGNC, or some other source? Might need another linked kg table. But I'd bet money the one you want is in there. The same Galaxy query of UCSC ought to be able to give you that.
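If you would rather do the lookup offline, a downloaded dump of kgXref can be turned into a lookup table. Here is a minimal Python sketch. It assumes a tab-separated dump with the UCSC ID in the first column and the gene symbol in the fifth, which you should check against the table schema for your assembly. The function name and the second sample row are made up for illustration.

```python
import csv
import io


def load_kgxref(handle, id_col=0, symbol_col=4):
    """Build a {UCSC ID: gene symbol} dict from a tab-separated kgXref dump.

    The column positions are assumptions; check them against the table
    schema for your assembly before relying on the result.
    """
    mapping = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines, header/comment lines, and truncated rows.
        if not row or row[0].startswith("#") or len(row) <= max(id_col, symbol_col):
            continue
        mapping[row[id_col]] = row[symbol_col]
    return mapping


# Illustrative rows only (most columns left empty); the second row is
# hypothetical and does not come from the real table.
sample = io.StringIO(
    "uc001aaa.3\t\t\t\tDDX11L1\n"
    "uc999zzz.1\t\t\t\tHYPOTHETICAL1\n"
)
symbols = load_kgxref(sample)
print(symbols.get("uc001aaa.3"))
```

With a real dump you would pass an open file handle instead of the `io.StringIO` sample.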

0

Hi Mary, thanks a lot! I mean a gene symbol such as BRCA1. You are right about that; I will try querying the kgXref table.

7
13.5 years ago

You can use the UCSC mysql server with the table kgXref:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19  -e 'select geneSymbol from kgXref where kgId="uc001aaa.3"'
+------------+
| geneSymbol |
+------------+
| DDX11L1    | 
+------------+
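
For a whole list of IDs, one query with an IN clause saves a round trip per ID. A minimal Python sketch that only builds the SQL text to pass to `mysql -e`; the function name is made up, and the ID check assumes UCSC IDs contain nothing but letters, digits, and dots:

```python
import re


def build_kgxref_query(kg_ids):
    """Return one SQL statement that looks up gene symbols for many UCSC IDs."""
    ids = list(kg_ids)
    if not ids:
        raise ValueError("need at least one UCSC ID")
    quoted = []
    for kg_id in ids:
        # Reject anything that could break out of the quoted SQL string.
        if not re.fullmatch(r"[A-Za-z0-9.]+", kg_id):
            raise ValueError(f"unexpected characters in ID: {kg_id!r}")
        quoted.append(f'"{kg_id}"')
    return "select kgId, geneSymbol from kgXref where kgId in ({})".format(
        ", ".join(quoted)
    )


query = build_kgxref_query(["uc001aaa.3", "uc011kao.1"])
print(query)
# prints: select kgId, geneSymbol from kgXref where kgId in ("uc001aaa.3", "uc011kao.1")
```

Selecting kgId alongside geneSymbol keeps each symbol paired with its ID, since the rows need not come back in input order.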
2
13.5 years ago

You can use BioMart for this conversion. Select 'ID List Limit' under filters and pick UCSC ID from the drop-down; then you can paste your identifiers into the box or upload a file containing them. Pick the outputs you want from the 'Attributes' section.

I exported a query for the gene name you gave in your question as Perl code, which will allow you to script the retrieval if you like:

# An example script demonstrating the use of BioMart API.
# This perl API representation is only available for configuration versions >=  0.5 
use strict;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

my $confFile = "PATH TO YOUR REGISTRY FILE UNDER biomart-perl/conf/. For Biomart Central Registry navigate to
                        http://www.biomart.org/biomart/martservice?type=registry";
#
# NB: change action to 'clean' if you wish to start a fresh configuration  
# and to 'cached' if you want to skip configuration step on subsequent runs from the same registry
#

my $action='cached';
my $initializer = BioMart::Initializer->new('registryFile'=>$confFile, 'action'=>$action);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');

    $query->setDataset("hsapiens_gene_ensembl");
    $query->addFilter("ucsc", ["uc001aaa.3"]);
    $query->addAttribute("ensembl_gene_id");
    $query->addAttribute("ensembl_transcript_id");
    $query->addAttribute("external_gene_id");

$query->formatter("TSV");

my $query_runner = BioMart::QueryRunner->new();
############################## GET COUNT ############################
# $query->count(1);
# $query_runner->execute($query);
# print $query_runner->getCount();
#####################################################################

############################## GET RESULTS ##########################
# to obtain unique rows only
# $query_runner->uniqueRowsOnly(1);

$query_runner->execute($query);
$query_runner->printHeader();
$query_runner->printResults();
$query_runner->printFooter();
#####################################################################

This is a C&P direct from the BioMart website, YMMV.

1

An example would help, but you're relying on a lot of people's data being in sync here. Mapping is always going to be a thorny issue, and imperfect in most ordinary scenarios.

0

Dear Simon, thanks for helping me out. It worked! Great!

0

Hi Simon. I just tried querying all the UCSC gene IDs, but some (about 10%) could not be found. Do you have any idea why? Thanks!

0

Some UCSC IDs failed, such as uc011kao.1 and uc011gan.1.

I think the UCSC table might be better for consistency. I will try the method Mary pointed out.

Thanks again!
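
To see exactly which IDs drop out of a conversion, the input list can be compared against whatever mapping came back. A minimal Python sketch; the function name and the `lookup` dict are made up for illustration and stand in for the result of either method:

```python
def split_mapped(ids, mapping):
    """Separate IDs that have a symbol in `mapping` from those that do not."""
    mapped, missing = {}, []
    for ucsc_id in ids:
        symbol = mapping.get(ucsc_id)
        if symbol:
            mapped[ucsc_id] = symbol
        else:
            # Absent from the mapping, or present with an empty symbol.
            missing.append(ucsc_id)
    return mapped, missing


# Hypothetical lookup result: only the first ID came back with a symbol.
lookup = {"uc001aaa.3": "DDX11L1"}
mapped, missing = split_mapped(["uc001aaa.3", "uc011kao.1", "uc011gan.1"], lookup)
print(f"{len(missing)} of 3 IDs unmapped:", ", ".join(missing))
```

The `missing` list can then be retried against a different source, which makes it easier to tell a gap in one database from an ID that is not known anywhere.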

0
7.8 years ago
chen ★ 2.5k

OpenGene (https://github.com/OpenGene/OpenGene.jl) can do this very easily. The gencode_locate function queries the GENCODE database to find which gene, and which exon or intron, a position falls in.

using OpenGene, OpenGene.Reference

# load the GENCODE dataset; it will be downloaded from the GENCODE website if it has not been downloaded before
# once loaded, it is cached, so future loads will be fast
index = gencode_load("GRCh37")

# locate which gene chr:pos is in
gencode_locate(index, "chr5", 149526621)
# it will return
# 1-element Array{Any,1}:
#  Dict{ASCIIString,Any}("gene"=>"PDGFRB","number"=>1,"transcript"=>"ENST00000261799.4","type"=>"intron")