Question: retrieving the 3' UTR sequence from the UCSC genome browser via DAS server
0
gravatar for bagnacan
22 months ago by
bagnacan0
Rostock
bagnacan0 wrote:

I'm trying to retrieve the 3'UTR sequence of a given RefSeq id from the UCSC genome browser.
From its main web interface, this task can be easily accomplished selecting the "refGene" table and "RefSeq" track. However I'm trying to automate the retrieval process, so that the final user will not interact with the UCSC's web interface directly, but rather launch a script to obtain the corresponding 3'UTR sequence of some provided RefSeq identifier.
The UCSC genome browser web interface is therefore off the table.

Looking around I found the possibility to interact with the UCSC via MySQL queries, but since I'm only interested in the 3'UTR sequence, and since I cannot retrieve a genomic sequence of DNA only with a SQL query, I'm relying on the DAS server, as pointed out in this answer.

I therefore obtained an example 3'UTR sequence from the UCSC web interface, and tested if I could retrieve the same sequence through the DAS server:

  • when retrieved from the UCSC web interface, the RefSeq id "NM_005504" is associated with a 7985 nt long 3'UTR FASTA sequence carrying the header

    >hg19_refGene_NM_005504 range=chr12:24962958-24970941 5'pad=0 3'pad=0 strand=- repeatMasking=none

  • the range "chr12:24962958-24970941" from the previous header can be retrieved from UCSC's hg19 table through the MySQL query

    select g.chrom, g.txStart, g.cdsStart from refGene g, knownToRefSeq r where g.name = 'NM_005504' AND r.value = g.name;

  • I now reuse this information to query the DAS server, and test if I can retrieve the same 3'UTR sequence:

    http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr12:24962957,24970941

    and I obtain a 7985 nt long sequence.

The two retrieval methods return indeed a sequence of the same length, but the two do not coincide.
What am I missing?

ucsc mysql • 1.3k views
ADD COMMENTlink modified 22 months ago by genecats.ucsc560 • written 22 months ago by bagnacan0

You should probably consider the fact that DAS server is using +1 index for the first base, as mentioned in the link you posted.

ADD REPLYlink written 22 months ago by aka001190

Thanks. You're right, I forgot to mention I tried to shift the 7985 nt long window, but the issue remains: considering the DAS server indexing from 1 instead of 0, I modified the range that I retrieved through the MySQL query from 24962958-24970941 to 24962959-24970942, but the sequences still do not coincide

ADD REPLYlink written 22 months ago by bagnacan0

when retrieved from the UCSC web interface, the RefSeq id "NM_005104" is associated with a 7985 nt long 3'UTR FASTA sequence carrying the header

hg19_refGene_NM_005504 range=chr12:24962958-24970941 5'pad=0 3'pad=0 strand=- repeatMasking=none

If I search hg19 build with NM_005104 at UCSC I am getting a completely different region on chr 6. But if I use NM_005504 then that refers to chr12. Looks like there are two separate accession #.

ADD REPLYlink modified 22 months ago • written 22 months ago by genomax68k

Yes, sorry, it was a typo in my explanation: the accession number whose sequence I was trying to retrieve was the NM_005504, not NM_005104. I edited my post to remove the typo. Thank you!

ADD REPLYlink written 22 months ago by bagnacan0
1
gravatar for genecats.ucsc
22 months ago by
genecats.ucsc560
genecats.ucsc560 wrote:

The issue here is that the transcript you are interested in lies on the negative strand, and so the Table Browser returns a reverse complemented sequence compared to what you are seeing from DAS, which is a simple genomic query. If you reverse complement one of the sequences you will find that they are the same.

Please note that you can use the utility faRc from our directory of utilities to reverse complement the sequence: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/

ChrisL from the UCSC Genome Browser

ADD COMMENTlink written 22 months ago by genecats.ucsc560

Thank you, this solved the issue

ADD REPLYlink written 22 months ago by bagnacan0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1650 users visited in the last hour