Download genome annotations from UCSC's MySQL database
8.6 years ago
James Ashmore ★ 3.4k

I want to download a BED file of various genome annotations (introns, exons, 3' UTR, 5' UTR) for a given assembly. I can do this through the UCSC Table Browser following these instructions; however, I'd like to do it programmatically. Currently I download the full refGene table via MySQL and use a custom script to parse the regions out into BED format. Does UCSC already offer an established way to do this?

UCSC MySQL BED • 2.2k views
8.6 years ago
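Yes, you can query the public UCSC MySQL server directly with a short SQL script. The example below builds one query per input interval and returns the refGene transcripts overlapping it: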

$ echo -e "chr1\t10000\t20000\nchr1\t30000\t40000" |\
awk -F '\t' '{printf("select \"%s\",\"%s\",\"%s\", G.name,G.txStart,G.txEnd from refGene as G  where chrom=\"%s\" and not(%s>txEnd or %s<txStart);\n",$1,$2,$3,$1,$2,$3);}' |\
mysql --user=genome --host=genome-mysql.cse.ucsc.edu  -A  -N -D hg38

chr1    10000    20000    NR_024540    14361    29370
chr1    10000    20000    NR_107063    17368    17436
chr1    10000    20000    NR_128720    17368    17436
chr1    10000    20000    NR_106918    17368    17436
chr1    10000    20000    NR_107062    17368    17436
chr1    10000    20000    NR_046018    11873    14409
chr1    30000    40000    NR_036267    30365    30503
chr1    30000    40000    NR_036266    30365    30503
chr1    30000    40000    NR_036268    30365    30503
chr1    30000    40000    NR_026822    34610    36081
chr1    30000    40000    NR_026820    34610    36081
chr1    30000    40000    NR_026818    34610    36081
chr1    30000    40000    NR_036051    30365    30503

I also remember that Heng Li wrote a tool to batch-query UCSC: https://github.com/lh3/misc/blob/master/biodb/batchUCSC.pl
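
For the exon part of the question, the same approach can go straight from the server to BED. This is only a minimal sketch: it assumes the refGene columns name, chrom, strand, exonStarts and exonEnds come back tab-separated with comma-separated coordinate lists, and the function name refgene_exons_to_bed is hypothetical, not something UCSC provides.

```shell
# Hypothetical helper (not a UCSC tool): reads tab-separated refGene rows
# (name, chrom, strand, exonStarts, exonEnds) on stdin and prints one BED6
# line per exon. Assumes exonStarts/exonEnds are comma-separated lists,
# usually ending with a trailing comma, with 0-based starts as in BED.
refgene_exons_to_bed() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
  {
    n = split($4, starts, ",")
    split($5, ends, ",")
    for (i = 1; i <= n; i++) {
      # the trailing comma yields an empty last element; skip it
      if (starts[i] == "") continue
      # exons are numbered in genomic order, regardless of strand
      print $2, starts[i], ends[i], $1 "_exon" i, 0, $3
    }
  }'
}

# Example usage (requires network access to the public UCSC server):
# mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -D hg38 \
#   -e 'select name,chrom,strand,exonStarts,exonEnds from refGene where chrom="chr1" limit 5' |
#   refgene_exons_to_bed > exons.bed
```

Introns could be derived the same way by pairing each exon end with the next exon start; UTRs would additionally need cdsStart and cdsEnd from the same table.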
