Question: How to get protein names, gene symbols, refseq numbers, gi numbers... using UCSC mysql?
0
gravatar for grayapply2009
3.9 years ago by
grayapply2009170
United States
grayapply2009170 wrote:

I want to fetch some information (protein names, gene symbols, refseq numbers, gi numbers...) from human refseq protein database using mysql. I found this code:

mysql --user=genome -N --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e "select name,name2 from refGene" > Refseq2Gene.txt

This code only gives me locus information and gene symbols.

How do I modify this code to get other information I need?

ucsc mysql • 2.2k views
ADD COMMENTlink modified 3.9 years ago by biocyberman760 • written 3.9 years ago by grayapply2009170
1

You might want to try Ensembl, they offer MySQL access as well as a Perl API and Biomart.

ADD REPLYlink written 3.9 years ago by pld4.8k

How do I do that? Sorry, I'm new to these things.

ADD REPLYlink written 3.9 years ago by grayapply2009170
1

They have a very detailed help section as well as video tutorials on youtube.

ADD REPLYlink written 3.9 years ago by pld4.8k
3
gravatar for biocyberman
3.9 years ago by
biocyberman760
Denmark
biocyberman760 wrote:

I would use biomart for this purpose. Here is the biomart query URL: http://www.ensembl.org/biomart/martview/95b00a8c74fb4a574b028b0f90f51d76

And its Perl rendition: http://www.ensembl.org/biomart/martview/95b00a8c74fb4a574b028b0f90f51d76

# An example script demonstrating the use of BioMart API.
# This perl API representation is only available for configuration versions >=  0.5 
use strict;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

my $confFile = "PATH TO YOUR REGISTRY FILE UNDER biomart-perl/conf/. For Biomart Central Registry navigate to
						http://www.biomart.org/biomart/martservice?type=registry";
#
# NB: change action to 'clean' if you wish to start a fresh configuration  
# and to 'cached' if you want to skip configuration step on subsequent runs from the same registry
#

my $action='cached';
my $initializer = BioMart::Initializer->new('registryFile'=>$confFile, 'action'=>$action);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');

		
	$query->setDataset("hsapiens_gene_ensembl");
	$query->addAttribute("ensembl_gene_id");
	$query->addAttribute("external_gene_name");
	$query->addAttribute("refseq_peptide");
	$query->addAttribute("entrezgene");
	$query->addAttribute("refseq_mrna");

$query->formatter("TSV");

my $query_runner = BioMart::QueryRunner->new();
############################## GET COUNT ############################
# $query->count(1);
# $query_runner->execute($query);
# print $query_runner->getCount();
#####################################################################


############################## GET RESULTS ##########################
# to obtain unique rows only
# $query_runner->uniqueRowsOnly(1);

$query_runner->execute($query);
$query_runner->printHeader();
$query_runner->printResults();
$query_runner->printFooter();
#####################################################################

 

But if you want to stick with UCSC's mysql interface, it is helpful to explore its table structure and make a union/join query across multiple tables to get what you want. For example: refLink table and gbCdnaInfo have pointers to some of fields you want:

Database: hg19    Primary Table: refLink    Row Count: 435,137   Data last updated: 2015-05-30
Format description: Link together a refseq mRNA and other stuff

field example SQL type info description
name   varchar(255) values Name displayed in UI
product   varchar(255) values Name of protein product
mrnaAcc NR_025344 varchar(255) values mRNA accession
protAcc   varchar(255) values protein accession
geneName 0 int(10) unsigned range pointer to geneName table
prodName 0 int(10) unsigned range pointer to prodName table
locusLinkId 0 int(10) unsigned range Entrez ID (formerly LocusLink ID)
omimId 0 int(10) unsigned range OMIM ID

 

ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by biocyberman760

Thanks, my friend! It'll take me some time to figure it out.

ADD REPLYlink written 3.9 years ago by grayapply2009170
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1909 users visited in the last hour