Question

Getting UCSC Headers For Tables Through FTP Or Via SQL?

0

10.8 years ago

user ▴ 940

How can I fetch the headers of UCSC gene tables programmatically from FTP? Different genomes have different headers. For example, kgXref from hg18 (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/kgXref.sql) has a different schema from the goldenPath equivalent of kgXref for hg19. Since http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/kgXref.txt has no header line, it is difficult to know what the column names are. Is there an easy way to get them? Does one have to parse the .sql file to get that information? If so, what tools can extract the schema/headers from the .sql file, which is otherwise cumbersome to parse? Thank you.

The solution provided by Pierre is the answer.

ucsc genome-browser annotation genes • 2.7k views

score 1 · Answer 1 · 2013-07-15

1

10.8 years ago

Pierre Lindenbaum 161k

You can use UCSC's public MySQL server and the DESC statement:

 mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'desc knownGene'

to get a diff of the column names:

$ sdiff <(mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg18 -e 'desc kgXref' -N | cut -f 1) <(mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'desc kgXref' -N | cut -f 1)
kgID                                kgID
mRNA                                mRNA
spID                                spID
spDisplayID                         spDisplayID
geneSymbol                          geneSymbol
refseq                              refseq
protAcc                             protAcc
description                         description
                                  > rfamAcc
                                  > tRnaName
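
If the MySQL server is not reachable, the column names can probably also be pulled out of the .sql file that the question mentions. The following is only a rough sketch: `sql_columns` is a hypothetical helper name, and it assumes the file holds a single mysqldump-style CREATE TABLE statement with one column definition per line (with or without backticks around the names).

```shell
# sql_columns: read a mysqldump-style .sql file on stdin and print one
# column name per line. Hypothetical helper; assumes one column per line.
sql_columns() {
  awk '
    /^CREATE TABLE/ { in_table = 1; next }   # start of the table definition
    in_table && /^\)/ { in_table = 0 }       # closing parenthesis ends it
    in_table && NF > 0 {
      name = $1
      gsub(/`/, "", name)                    # drop backticks if present
      # skip index and constraint lines, keep real columns
      if (name !~ /^(PRIMARY|UNIQUE|KEY|INDEX|FULLTEXT|CONSTRAINT)$/) print name
    }
  '
}
```

Something like `curl -s http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/kgXref.sql | sql_columns` should then list the hg18 columns, which can be fed to sdiff in the same way as the mysql output above.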

0

Great solution! Minor question: is there a way to get the desc output in a more standard format, like CSV/TSV, rather than the pretty-printed table, which is hard to parse? Scratch that, I figured it out: the solution is to add -N -B to the mysql command.
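
With -N -B, each row of the desc output is tab-separated and its first field is the column name, so a header line for the headerless dump can be assembled from it. A minimal sketch, assuming that layout; `header_line` is a hypothetical helper name and the local file name in the usage comment is an assumption too.

```shell
# header_line: read tab-separated `desc` rows on stdin (as printed by
# mysql -N -B) and print the column names as one tab-separated line.
header_line() {
  cut -f 1 | paste -s -
}

# Possible usage (not run here; kgXref.txt is assumed to be a local,
# uncompressed copy of the table dump):
#   { mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg18 \
#       -N -B -e 'desc kgXref' | header_line; cat kgXref.txt; } > kgXref.tsv
```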

10.8 years ago by user ▴ 940

1

Try mysqldump with -d (schema only, no row data) and -X (XML output):

 mysqldump --user=genome --host=genome-mysql.cse.ucsc.edu -X --skip-lock-tables -d hg19 kgXref
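
The XML from mysqldump can be reduced to a plain list of column names with sed. This sketch assumes each column appears on its own line as a `<field Field="..." ...>` element, which is how mysqldump usually writes it; `xml_columns` is a hypothetical helper name, and a real XML parser would be more robust.

```shell
# xml_columns: read mysqldump -X -d output on stdin and print the value
# of the Field attribute of every <field> element, one per line.
xml_columns() {
  sed -n 's/.*<field Field="\([^"]*\)".*/\1/p'
}
```

Piping the mysqldump command above into `xml_columns` should then print one column name per line.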

10.8 years ago by Pierre Lindenbaum 161k