Question: Ucsc Mysql Query
gravatar for Radhouane Aniba
9.6 years ago by
Radhouane Aniba760 wrote:

Hi everyone,

I have to deal with task over UCSC genome browser but I want to do it via MySQL public server connection, I am using Ubuntu and I installed MySQL Client and already connected to UCSC, but once I showed the tables description I was a little bit lost, here is what I want to do

1. Click the "Tables" link on the main page or in
    the blue bar at the top of the genome browser page.
2. Select the organism (Human/Chimp) and assembly date of interest.
3. Select "Genes and Gene Prediction Tracks" as the group and one
    of the genes tracks (e.g. Known Genes or RefSeq genes).
4. Select genome as the region.
5. Select "sequence" as the output format.
6. If desired -- enter a filename where the data should be saved
    locally and compress with gzip.
7. Click the "Get Output" button.
8. Select genomic and click "Submit".
9. Select only 5' UTR (or 3' UTR) Exons and click "Get Sequence".

How can I convert this small process into an SQL query ?


ucsc • 6.9k views
ADD COMMENTlink written 9.6 years ago by Radhouane Aniba760

@Rahouane: you've already asked more than 20 questions on this site but you didn't bother to validate most of the people's answers. Please, take a few minutes to validate the answers, thanks.

ADD REPLYlink written 9.6 years ago by Pierre Lindenbaum133k
gravatar for Pierre Lindenbaum
9.6 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum133k wrote:

you cannot retrieve a genomic sequence of DNA only with a SQL query. You can retrieve all the the 5' and 3' UTR using the following command:

mysql -h -u genome -D hg19 -N -A -e 'select chrom,strand, txStart,cdsStart from knownGene where txStart< cdsStart union select chrom,strand,cdsEnd,txEnd from knownGene where cdsEnd&lt; txEnd ' > utrs.txt

and then loop over those regions to get the genomic sequences: see How To Get The Sequence Of A Genomic Region From Ucsc?

ADD COMMENTlink modified 16 months ago by _r_am32k • written 9.6 years ago by Pierre Lindenbaum133k

Pierre thank you for your answer : I want only the 3' UTR how to modify the query, can u explain a little bit by splitting it and commenting it ? I appreciate Rad

ADD REPLYlink written 9.6 years ago by Radhouane Aniba760

Got it pierre I understood the query, thank you

ADD REPLYlink written 9.6 years ago by Radhouane Aniba760

you can now validate this answer.... :-)

ADD REPLYlink written 9.6 years ago by Pierre Lindenbaum133k
gravatar for brentp
8.2 years ago by
Salt Lake City, UT
brentp23k wrote:

You can do this with cruzdb. The following will print out a fasta file with the 3' utr sequences:

from cruzdb import Genome, sequence
database = 'hg19'

g = Genome(database)
refGene = g.refGene

tbl = refGene.table()
query = refGene.filter(tbl.c.cdsEnd < tbl.c.txEnd, tbl.c.cdsStart != tbl.c.cdsEnd)

for transcript in query:
    start, end = transcript.utr3
    if None in (start, end): continue
    print ">%s\n%s" % (transcript.name2, sequence.sequence(database,
        transcript.chrom, start, end))
ADD COMMENTlink written 8.2 years ago by brentp23k

Now it gave me error:

In[8]: tbl = refGene.table()
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/IPython/core/", line 2820, in run_code
    exec code_obj in self.user_global_ns, self.user_ns
  File "<ipython-input-8-17e93216ae17>", line 1, in <module>
    table = refGene.table()
  File "/usr/local/lib/python2.7/dist-packages/cruzdb/", line 78, in __getattr__
    return getattr(cls._query, attr)
AttributeError: 'Query' object has no attribute 'table'
ADD REPLYlink modified 16 months ago by _r_am32k • written 6.4 years ago by Ming Tang2.6k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2512 users visited in the last hour