Genomic Coordinates From UCSC
11.1 years ago

I have a list of gene symbols.

APOB, TTC39B, ATF3, RGS1, LIPG

I am trying to get the genomic coordinates (in bp), padded by +/-5 kb, for these genes via the UCSC Table Browser / MySQL server using the NCBI36/hg18 build. I have tried the Table Browser, but I can't find the chromStart and chromEnd fields in the given table. Am I missing something?

Also, please share your favorite tutorials / docs that explain the schema/tables on the UCSC MySQL server.

11.1 years ago

In the Table Browser, select group=Genes, track=UCSC Genes, table=knownGene and then click 'describe table schema'.

You'll see that knownGene is linked to kgXref:

hg18.kgXref.kgID (via knownGene.name)


and kgXref contains a column named geneSymbol.

Putting it together, you can get the positions of the transcripts for those genes (one row per transcript, each padded by 5 kb on both sides):

mysql  -h  genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e 'select distinct X.geneSymbol,G.chrom,G.txStart-5000,G.txEnd+5000 from knownGene as G, kgXref as X where X.geneSymbol in ("APOB", "TTC39B", "ATF3", "RGS1", "LIPG") and X.kgId=G.name'
+------------+-------+----------------+--------------+
| geneSymbol | chrom | G.txStart-5000 | G.txEnd+5000 |
+------------+-------+----------------+--------------+
| APOB       | chr2  |       21072805 |     21125450 |
| ATF3       | chr1  |      210843616 |    210865739 |
| ATF3       | chr1  |      210800319 |    210865739 |
| ATF3       | chr1  |      210843616 |    210865704 |
| ATF3       | chr1  |      210849982 |    210864212 |
| LIPG       | chr18 |       45337424 |     45378276 |
| LIPG       | chr18 |       45337424 |     45367217 |
| RGS1       | chr1  |      190806479 |    190820782 |
| TTC39B     | chr9  |       15156560 |     15302244 |
| TTC39B     | chr9  |       15156560 |     15227442 |
| TTC39B     | chr9  |       15171584 |     15302244 |
| TTC39B     | chr9  |       15172968 |     15268702 |
| TTC39B     | chr9  |       15172968 |     15227442 |
+------------+-------+----------------+--------------+
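
The query returns one row per transcript, so a gene such as LIPG shows up several times. If a single window per gene is wanted, one option is to take the smallest txStart and the largest txEnd per gene before padding. That choice is an assumption about what the analysis needs, not something UCSC prescribes. Here is a minimal Python sketch. ROWS and gene_windows are hypothetical names, and the rows are a subset of the query output above with the 5000 bp padding undone:

```python
# Unpadded (geneSymbol, chrom, txStart, txEnd) rows, recovered from the
# query output above by adding/subtracting the 5000 bp padding.
ROWS = [
    ("APOB", "chr2", 21077805, 21120450),
    ("LIPG", "chr18", 45342424, 45373276),
    ("LIPG", "chr18", 45342424, 45362217),
    ("RGS1", "chr1", 190811479, 190815782),
]


def gene_windows(rows, flank=5000):
    """Collapse transcript rows into one padded window per (gene, chrom).

    Assumption: the window spans the outermost transcript boundaries.
    The start is clamped at 0; the end is NOT clamped to chromosome length.
    """
    spans = {}
    for symbol, chrom, tx_start, tx_end in rows:
        # Key on chrom as well, so a symbol annotated on two chromosomes
        # is never merged into one meaningless interval.
        key = (symbol, chrom)
        if key in spans:
            lo, hi = spans[key]
            spans[key] = (min(lo, tx_start), max(hi, tx_end))
        else:
            spans[key] = (tx_start, tx_end)
    return {
        key: (max(0, lo - flank), hi + flank)
        for key, (lo, hi) in spans.items()
    }


if __name__ == "__main__":
    for (symbol, chrom), (start, end) in sorted(gene_windows(ROWS).items()):
        print(symbol, chrom, start, end)
```

For LIPG this gives chr18:45337424-45378276, which matches the widest of the two rows in the table above.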


I agree that the information is hard to find: I only knew where to look because I used to play with those tables regularly.

The UCSC mailing list is a good place to find this kind of information.

I also did a lot of reverse engineering by just 'grepping' the flat files available from UCSC.


UCSC does not make calls on whether there is or is not a UTR on either side. It is based on whatever the source record provides. (You may know this, but it's something people ask us in trainings all the time too, so I thought I'd mention it...)


I don't think there is any gene table (refGene, knownGene, ensGene, ...) where txStart != chromStart and where a regulatory region at the 5' or 3' end has been added. You can check this in the genome browser.


Thanks a lot, Pierre. I need the genomic coordinates (as in chromStart and chromEnd instead of txStart/txEnd). I can't find those fields in the knownGene / kgXref tables.


Why do you need chromStart instead of txStart? It is just a label, after all.


Please correct me if I am wrong. In this URL, the description for txStart is given as "Transcription start position". My assumption is that a gene could have regions that are not transcribed, and I would miss those regions if I used txStart/txEnd. I am looking for genomic coordinates for a candidate gene analysis using genotype data. I need the genomic coordinates +/- 5 kb (in bp) of a given gene for this analysis. In one of your earlier solutions using UCSC MySQL you used the chromStart field of table snp130. I am looking for that particular field here.


OK, got it. Thanks a lot for clarifying this.


Mary, thanks for adding your thoughts. I was a bit confused by the field names txStart vs. chromStart.

11.1 years ago
Mary 11k

We used to love the annotation database overview here: http://genome.ucsc.edu/goldenPath/gbdDescriptionsOld.html And we used to show it to people in trainings all the time. They loved it. It was much easier to scan through and get a sense of what was available.

But it went away a few years back with the upgrade that gave us the "describe table schema" button. We've been asking for a page like it again, but haven't come across one yet.