Question: Getting Ucsc Headers For Tables Through Ftp Or Via Sql?
0
gravatar for user
5.8 years ago by
user790
United States
user790 wrote:

How can I fetch the headers of UCSC gene tables programmatically from ftp? different genomes have different headers. Example: kgXref.txt from hg18 (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/kgXref.sql) has a different schema from the goldenPath equivalent to kgXref for hg19. Since http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/kgXref.txt has no header it's difficult to know what the headers are - is there an easy way to get that? does one have to parse the .sql file to get that info? if so, what are some tools to get the schema/headers out of the .sql file, which is otherwise cumbersome to parse? thank you

the solution provided by pierre is the answer.

ADD COMMENTlink modified 5.8 years ago • written 5.8 years ago by user790
1
gravatar for Pierre Lindenbaum
5.8 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum119k wrote:

you can use the public mysql server of the UCSC and the DESC statement:

 mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'desc knownGene'

to get a diff of the column names:

$ sdiff <(mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg18 -e 'desc kgXref' -N | cut -d '       ' -f 1) <(mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'desc kgXref' -N | cut -d '   ' -f 1)
kgID                                kgID
mRNA                                mRNA
spID                                spID
spDisplayID                            spDisplayID
geneSymbol                            geneSymbol
refseq                                refseq
protAcc                                protAcc
description                            description
                                  >    rfamAcc
                                  >    tRnaName
ADD COMMENTlink written 5.8 years ago by Pierre Lindenbaum119k

great solution! minor question - is there a way to get the desc output in a format more standard, like csv/tsb, rather than this pretty-printed table which is hard to parse? scratch figured it out, the solution is to add -N -B to the query.

ADD REPLYlink modified 5.8 years ago • written 5.8 years ago by user790
1

try : mysqldump --user=genome --host=genome-mysql.cse.ucsc.edu -X --skip-lock-tables -d hg19 kgXref

ADD REPLYlink written 5.8 years ago by Pierre Lindenbaum119k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 760 users visited in the last hour