Question: UCSC Gene Name Question
Austinlew260 wrote:

I used Galaxy to look up the nearest gene for a given set of variants by comparing each variant's start and end location with the genes retrieved from the knownGene table in the UCSC browser. However, I can only get names such as uc001aaa.3. How can I convert this UCSC ID into the ordinary gene symbol?

Thanks!

gene galaxy ucsc • 6.3k views
modified 20 months ago by chen1.5k • written 7.3 years ago by Austinlew260
Mary11k wrote:

You probably need the known gene cross-reference table, a.k.a. kgXref.

But what do you mean by "ordinary gene name"? Is that a symbol, full name, or description? Official from HGNC, or some other source? Might need another linked kg table. But I'd bet money the one you want is in there. The same Galaxy query of UCSC ought to be able to give you that.

written 7.3 years ago by Mary11k

Hi Mary, thanks a lot! I mean the gene symbol, such as BRCA1. You are right on that; I will try to query the kgXref table.

written 7.3 years ago by Austinlew260
Pierre Lindenbaum104k wrote:

You can use the UCSC MySQL server with the table 'kgXref':

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19  -e 'select geneSymbol from kgXref where kgId="uc001aaa.3"'
+------------+
| geneSymbol |
+------------+
| DDX11L1    | 
+------------+
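
For many IDs at once, the same lookup can also be done offline. Here is a minimal Python sketch, assuming you have downloaded a tab-separated dump of kgXref and that kgID is the first column and geneSymbol the fifth (my reading of the hg19 schema; please check the table description before relying on those positions). The function name, the local file path, and the mostly empty sample row are illustrative only.

```python
import gzip
import io

KGID_COL = 0     # assumed position of kgID in the dump
SYMBOL_COL = 4   # assumed position of geneSymbol in the dump


def load_kgxref(handle):
    """Build a {kgID: geneSymbol} dict from a tab-separated kgXref dump."""
    mapping = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Skip lines that are too short to contain a symbol column.
        if len(fields) > SYMBOL_COL:
            mapping[fields[KGID_COL]] = fields[SYMBOL_COL]
    return mapping


# Tiny in-memory stand-in for the real file: only the ID and symbol
# columns are filled in, the other fields are left empty on purpose.
sample = io.StringIO("uc001aaa.3\t\t\t\tDDX11L1\t\t\t\n")
symbols = load_kgxref(sample)
print(symbols.get("uc001aaa.3", "NA"))

# With a real download (hypothetical local path):
# with gzip.open("kgXref.txt.gz", "rt") as fh:
#     symbols = load_kgxref(fh)
```

Loading the whole table into a dict once is usually simpler than issuing one query per ID when you have thousands of variants to annotate.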

written 7.3 years ago by Pierre Lindenbaum104k
Simon Cockell7.3k wrote:

You can use BioMart for this conversion. Select 'ID List Limit' under filters and pick UCSC ID from the drop-down; then you can paste your identifiers into the box or upload a file containing them. Pick the outputs you want from the 'Attributes' section.

I exported a query for the UCSC ID you give in your question as Perl code, which will allow you to script the retrieval if you like:

# An example script demonstrating the use of BioMart API.
# This perl API representation is only available for configuration versions >=  0.5 
use strict;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

my $confFile = "PATH TO YOUR REGISTRY FILE UNDER biomart-perl/conf/. For Biomart Central Registry navigate to
                        http://www.biomart.org/biomart/martservice?type=registry";
#
# NB: change action to 'clean' if you wish to start a fresh configuration  
# and to 'cached' if you want to skip configuration step on subsequent runs from the same registry
#

my $action='cached';
my $initializer = BioMart::Initializer->new('registryFile'=>$confFile, 'action'=>$action);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');

    $query->setDataset("hsapiens_gene_ensembl");
    $query->addFilter("ucsc", ["uc001aaa.3"]);
    $query->addAttribute("ensembl_gene_id");
    $query->addAttribute("ensembl_transcript_id");
    $query->addAttribute("external_gene_id");

$query->formatter("TSV");

my $query_runner = BioMart::QueryRunner->new();
############################## GET COUNT ############################
# $query->count(1);
# $query_runner->execute($query);
# print $query_runner->getCount();
#####################################################################

############################## GET RESULTS ##########################
# to obtain unique rows only
# $query_runner->uniqueRowsOnly(1);

$query_runner->execute($query);
$query_runner->printHeader();
$query_runner->printResults();
$query_runner->printFooter();
#####################################################################

This is copied and pasted directly from the BioMart website, so YMMV.
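
If you would rather not install the Perl API, BioMart also accepts queries written as XML. The following Python sketch only builds such a query string and does not send it anywhere. The dataset, filter, and attribute names are the ones from the Perl export above; the XML layout is my recollection of what BioMart's own XML export produces, so compare it against that export before use. The function name `build_biomart_query` is made up for illustration.

```python
from xml.sax.saxutils import quoteattr


def build_biomart_query(dataset, filter_name, ids, attributes):
    """Return a BioMart-style XML query string (layout assumed, not verified)."""
    id_list = ",".join(ids)  # multiple IDs go into one comma-separated value
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<!DOCTYPE Query>",
        '<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0">',
        f'  <Dataset name={quoteattr(dataset)} interface="default">',
        f"    <Filter name={quoteattr(filter_name)} value={quoteattr(id_list)}/>",
    ]
    # One Attribute element per requested output column.
    lines += [f"    <Attribute name={quoteattr(a)}/>" for a in attributes]
    lines += ["  </Dataset>", "</Query>"]
    return "\n".join(lines)


xml = build_biomart_query(
    "hsapiens_gene_ensembl",
    "ucsc",
    ["uc001aaa.3"],
    ["ensembl_gene_id", "ensembl_transcript_id", "external_gene_id"],
)
print(xml)
```

The resulting string would then be submitted as the query parameter of a martservice URL; which host to use depends on the mart you are querying, so I have left that part out.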

written 7.3 years ago by Simon Cockell7.3k

An example would help, but you're relying on many people's data being in sync here. Mapping is always going to be a thorny issue, and imperfect in most ordinary scenarios.

written 7.3 years ago by Simon Cockell7.3k

Dear Simon, thanks for helping me out. It worked! Great!

written 7.3 years ago by Austinlew260

Hi Simon. I just tried querying all the UCSC gene IDs, but some (about 10%) could not be found. Do you have any idea about this? Thanks!

written 7.3 years ago by Austinlew260

Some UCSC IDs failed, such as uc011kao.1 and uc011gan.1.
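
To see exactly which IDs fall through, whatever the mapping source, it helps to split the query list into mapped and unmapped IDs. This is a rough Python sketch; the helper name `split_mapped` is made up, and the lookup table is deliberately tiny (only the uc001aaa.3 pair comes from this thread), so the result says nothing about whether the other two IDs exist in any real table.

```python
def split_mapped(ids, id_to_symbol):
    """Return (mapped, missing): found IDs with symbols, and IDs without one."""
    mapped, missing = {}, []
    for ucsc_id in ids:
        symbol = id_to_symbol.get(ucsc_id)
        if symbol:  # treats absent keys and empty symbols alike
            mapped[ucsc_id] = symbol
        else:
            missing.append(ucsc_id)
    return mapped, missing


# Illustrative lookup table, not a real mapping resource.
lookup = {"uc001aaa.3": "DDX11L1"}
query_ids = ["uc001aaa.3", "uc011kao.1", "uc011gan.1"]

mapped, missing = split_mapped(query_ids, lookup)
print(f"{len(missing)} of {len(query_ids)} IDs unmapped: {', '.join(missing)}")
```

Running the same list of missing IDs against a second source then shows whether the gap is specific to one resource.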

I think the UCSC table might be better for consistency. I will try the method Mary pointed out.

Thanks again!

written 7.3 years ago by Austinlew260
chen1.5k wrote:

OpenGene (https://github.com/OpenGene/OpenGene.jl) can do this very easily. The gencode_locate function queries the GENCODE database to find which gene, and which exon/intron, the position is in.

using OpenGene, OpenGene.Reference

# load the GENCODE dataset; it will download a file from the GENCODE website if it has not been downloaded before
# once it's loaded, it will be cached so future loads will be fast
index = gencode_load("GRCh37")

# locate which gene chr:pos is in
gencode_locate(index, "chr5", 149526621)
# it will return
# 1-element Array{Any,1}:
#  Dict{ASCIIString,Any}("gene"=>"PDGFRB","number"=>1,"transcript"=>"ENST00000261799.4","type"=>"intron")

modified 20 months ago • written 20 months ago by chen1.5k