Question: Download genome annotations from UCSC's MySQL database
Asked 4.9 years ago by James Ashmore (UK/Edinburgh/MRC Centre for Regenerative Medicine):

I want to download BED files of various genome annotations (introns, exons, 3' UTRs, 5' UTRs) for a given assembly. I can do this through the UCSC Table Browser by following these instructions; however, I'd like to do it programmatically. Currently I download the full refGene table via MySQL and use a custom script to parse the regions out into BED format. Does UCSC already offer an established way to do this?

Tags: ucsc, mysql, bed
Answered 4.9 years ago by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087):
Yes, you can access the UCSC MySQL server using a simple SQL script:

$ echo -e "chr1\t10000\t20000\nchr1\t30000\t40000" |\
awk -F '\t' '{printf("select \"%s\",\"%s\",\"%s\", G.name,G.txStart,G.txEnd from refGene as G  where chrom=\"%s\" and not(%s>txEnd or %s<txStart);\n",$1,$2,$3,$1,$2,$3);}' |\
mysql --user=genome --host=genome-mysql.cse.ucsc.edu  -A  -N -D hg38

chr1    10000    20000    NR_024540    14361    29370
chr1    10000    20000    NR_107063    17368    17436
chr1    10000    20000    NR_128720    17368    17436
chr1    10000    20000    NR_106918    17368    17436
chr1    10000    20000    NR_107062    17368    17436
chr1    10000    20000    NR_046018    11873    14409
chr1    30000    40000    NR_036267    30365    30503
chr1    30000    40000    NR_036266    30365    30503
chr1    30000    40000    NR_036268    30365    30503
chr1    30000    40000    NR_026822    34610    36081
chr1    30000    40000    NR_026820    34610    36081
chr1    30000    40000    NR_026818    34610    36081
chr1    30000    40000    NR_036051    30365    30503

I also remember that Heng Li wrote a tool to batch-query UCSC: https://github.com/lh3/misc/blob/master/biodb/batchUCSC.pl
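
On the parsing step mentioned in the question: the same connection can return the exon coordinate columns of refGene, and a short awk filter can turn them into BED. The following is only a minimal sketch, not an established UCSC method. The function name refgene_exons_to_bed is made up for illustration, and it assumes the input has exactly the five tab-separated columns name, chrom, strand, exonStarts, exonEnds, with exonStarts and exonEnds given as comma-separated lists ending in a trailing comma, as in the refGene table. Exons are numbered in genomic order and the BED score column is simply set to 0.

```shell
# Hypothetical helper: convert refGene rows on stdin to BED6 exon records.
# Assumed input columns (tab-separated): name, chrom, strand, exonStarts, exonEnds
refgene_exons_to_bed() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
  {
    # exonStarts and exonEnds are comma-separated lists with a trailing comma
    n = split($4, starts, ",")
    split($5, ends, ",")
    for (i = 1; i <= n; i++) {
      if (starts[i] == "") continue   # skip the empty field after the trailing comma
      # refGene coordinates are 0-based, half-open, so they map directly to BED
      print $2, starts[i], ends[i], $1 "_exon" i, 0, $3
    }
  }'
}

# Possible usage against the public server (needs network access, so left commented out):
# mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -D hg38 \
#   -e 'select name, chrom, strand, exonStarts, exonEnds from refGene where chrom="chr1" limit 5' |
#   refgene_exons_to_bed
```

Introns can be derived the same way from the gaps between consecutive exons (each exon end to the next exon start), and UTRs by clipping the exons against the cdsStart and cdsEnd columns.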
