Genomic Coordinates From UCSC
13.5 years ago

I have a list of gene symbols.

APOB, TTC39B, ATF3, RGS1, LIPG

I am trying to get the genomic coordinates (in bp) of these genes, padded by +/- 5 kb, via the UCSC Table Browser / MySQL server using the NCBI36/hg18 build. I tried the Table Browser, but I can't find the chromStart and chromEnd fields in the given table. Am I missing something?

Also, please share your favorite tutorial / docs that explain the schema/tables in UCSC MySQL server.

Thanks in advance.

data human genome ucsc • 9.8k views
13.5 years ago

In the Table Browser, select group = Genes, track = UCSC Genes, table = knownGene, and then click 'describe table schema'.

You'll see that knownGene is linked to kgXref:

hg18.kgXref.kgID (via knownGene.name)

and kgXref contains a column named geneSymbol.

Putting it together, you can get the positions of the transcripts for those genes:

mysql  -h  genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e 'select distinct X.geneSymbol,G.chrom,G.txStart-5000,G.txEnd+5000 from knownGene as G, kgXref as X where X.geneSymbol in ("APOB", "TTC39B", "ATF3", "RGS1", "LIPG") and X.kgId=G.name'
+------------+-------+----------------+--------------+
| geneSymbol | chrom | G.txStart-5000 | G.txEnd+5000 |
+------------+-------+----------------+--------------+
| APOB       | chr2  |       21072805 |     21125450 |
| ATF3       | chr1  |      210843616 |    210865739 |
| ATF3       | chr1  |      210800319 |    210865739 |
| ATF3       | chr1  |      210843616 |    210865704 |
| ATF3       | chr1  |      210849982 |    210864212 |
| LIPG       | chr18 |       45337424 |     45378276 |
| LIPG       | chr18 |       45337424 |     45367217 |
| RGS1       | chr1  |      190806479 |    190820782 |
| TTC39B     | chr9  |       15156560 |     15302244 |
| TTC39B     | chr9  |       15156560 |     15227442 |
| TTC39B     | chr9  |       15171584 |     15302244 |
| TTC39B     | chr9  |       15172968 |     15268702 |
| TTC39B     | chr9  |       15172968 |     15227442 |
+------------+-------+----------------+--------------+
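The query returns one row per transcript, so a gene such as ATF3 or TTC39B appears several times. If a single interval per gene is wanted, one option is to take the smallest start and the largest end across its transcripts. Below is a minimal Python sketch of that step, assuming the rows are the already padded values copied from the output above; `merge_by_gene` is a hypothetical helper name, not part of any UCSC tool.

```python
# Rows copied from the query output above:
# (geneSymbol, chrom, txStart - 5000, txEnd + 5000)
rows = [
    ("APOB", "chr2", 21072805, 21125450),
    ("ATF3", "chr1", 210843616, 210865739),
    ("ATF3", "chr1", 210800319, 210865739),
    ("ATF3", "chr1", 210843616, 210865704),
    ("ATF3", "chr1", 210849982, 210864212),
    ("LIPG", "chr18", 45337424, 45378276),
    ("LIPG", "chr18", 45337424, 45367217),
    ("RGS1", "chr1", 190806479, 190820782),
    ("TTC39B", "chr9", 15156560, 15302244),
    ("TTC39B", "chr9", 15156560, 15227442),
    ("TTC39B", "chr9", 15171584, 15302244),
    ("TTC39B", "chr9", 15172968, 15268702),
    ("TTC39B", "chr9", 15172968, 15227442),
]


def merge_by_gene(rows):
    """Collapse transcript-level rows into one (start, end) span per (gene, chrom)."""
    spans = {}
    for symbol, chrom, start, end in rows:
        key = (symbol, chrom)
        start = max(start, 0)  # a padded start can go negative near a chromosome start
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    return spans


merged = merge_by_gene(rows)
for (symbol, chrom), (start, end) in sorted(merged.items()):
    print(symbol, chrom, start, end, sep="\t")
```

Keying on both gene symbol and chromosome keeps a symbol that maps to more than one chromosome from being merged into one meaningless span.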

I agree that the information is hard to find: I only knew where to look because I used to play with those tables regularly.

The UCSC mailing list is a good place to find this kind of information.

I also did a lot of reverse engineering by just grepping the flat files available from UCSC.


UCSC does not make calls on whether there is or is not a UTR on either side. It is based on whatever the source record provides. (You may know this, but it's something people ask us in trainings all the time too, so I thought I'd mention it...)


I don't think there is any gene table (refGene, knownGene, ensGene, ...) where txStart differs from what you call chromStart, or where a 5'/3' regulatory region has been added. You can check this in the Genome Browser.


Thanks a lot, Pierre. I need the genomic coordinates (as in chromStart and chromEnd instead of txStart/txEnd). I can't find those fields in the knownGene / kgXref tables.


Why do you need chromStart instead of txStart? It is just a label, after all.


Please correct me if I am wrong. In this URL, the description for txStart is given as "Transcription start position". My assumption is that a gene could have regions that are not transcribed, and I would miss those regions if I used txStart/txEnd. I am looking at genomic coordinates for candidate gene analysis using genotype data, and I need the coordinates of each gene +/- 5 kb (in bp) for this analysis. In one of your earlier solutions using UCSC MySQL you used the chromStart field of table snp130. I am looking for that field here.
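For the genotype use case described here, the window test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats coordinates as 0-based and half-open, as in the UCSC database tables; the APOB txStart/txEnd values are back-calculated from the padded query output in the answer (padded value plus or minus 5000); the SNP positions are made up and are not real snp130 records; and `padded_window` and `snp_in_window` are hypothetical helper names.

```python
PAD = 5000  # flank size in bp


def padded_window(tx_start, tx_end, pad=PAD):
    """Return the gene span widened by pad bp on each side, clamped at 0."""
    return max(tx_start - pad, 0), tx_end + pad


def snp_in_window(snp_chrom, snp_start, gene_chrom, window):
    """True if the SNP's chromStart lies inside the half-open window."""
    start, end = window
    return snp_chrom == gene_chrom and start <= snp_start < end


# APOB on hg18: values implied by the padded output (21072805, 21125450)
apob_window = padded_window(21077805, 21120450)

# Hypothetical SNP chromStart positions, for illustration only
for pos in (21072805, 21100000, 21125450):
    print(pos, snp_in_window("chr2", pos, "chr2", apob_window))
```

Because the interval is half-open, a position equal to the padded end is treated as outside the window.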


OK, Got it. Thanks a lot for clarifying this.


Mary, thanks for adding your thoughts. I was a bit confused by the field names txStart vs. chromStart.

13.5 years ago
Mary 11k

We used to love the annotation database overview here: http://genome.ucsc.edu/goldenPath/gbdDescriptionsOld.html We showed it to people in trainings all the time, and they loved it. It was much easier to scan through and get a sense of what was available.

But it went away a few years back with the upgrade that gave us the "describe table schema" button. We've been asking for a page like it again, but haven't come across one yet.
