Question: Getting Sequence Information From UCSC Genome Browser
anuragm (India) wrote, 7.1 years ago:

I downloaded the phastCons scores for the Xenopus tropicalis alignment with other vertebrate species to look for conserved regions, and I now have the positions I am interested in. How do I get the exact nucleotides corresponding to these positions?

nucleotide genome-browser • 2.4k views
Alex Reynolds (Seattle, WA, USA) wrote, 7.1 years ago:

A DAS query can help with automation.

For example, to write the human (hg19) sequence for a region on chromosome chrX at positions 1000000-1000010 to a file called foo.xml:

$ wget -O - "http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chrX:1000000,1000010" > foo.xml

The XML looks like this:


<?xml version="1.0" standalone="no"?>
<!DOCTYPE DASDNA SYSTEM "http://www.biodas.org/dtd/dasdna.dtd">
<DASDNA>
<SEQUENCE id="chrX" start="1000000" stop="1000010" version="1.00">
<DNA length="11">
gaaacagctac
</DNA>
</SEQUENCE>
</DASDNA>

You can parse this on the command line, using an XSLT stylesheet and xsltproc.

First, create a stylesheet that extracts the text of the DASDNA/SEQUENCE/DNA element; for example, foo.xsl:


<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="/">
    <xsl:value-of select="DASDNA/SEQUENCE/DNA"/>
  </xsl:template>
</xsl:stylesheet>

Then run the foo.xml result against this stylesheet:

$ xsltproc foo.xsl foo.xml | awk '($0 ~ /^[acgtnACGTN]/)'
gaaacagctac

You can glue some of this into a pipeline or shell script:

#!/bin/bash -efx

DASURL="http://genome.ucsc.edu/cgi-bin/das"
BUILD="hg19"
CHR="chrX"
START="1000000"
STOP="1000010"

wget -O - "${DASURL}/${BUILD}/dna?segment=${CHR}:${START},${STOP}" \
    | xsltproc foo.xsl - \
    | awk '($0 ~ /^[acgtnACGTN]/)' \
    > foo.txt
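
If you have many regions rather than one, the same pipeline can be wrapped in a loop. The following is only a sketch under a few assumptions: the regions sit in a plain three-column BED file (no track or browser header lines), foo.xsl from above is in the current directory, and the function names bed_to_das_url and fetch_bed_sequences are made up for illustration. BED starts are 0-based while DAS segments are 1-based and inclusive, so the start is shifted by one.

```shell
#!/bin/bash
# Hypothetical helper: build a UCSC DAS "dna" URL from one BED interval.
# BED is 0-based, half-open; DAS segments are 1-based, inclusive,
# so only the start coordinate is shifted by one.
bed_to_das_url() {
    local build="$1" chrom="$2" bed_start="$3" bed_end="$4"
    local dasurl="http://genome.ucsc.edu/cgi-bin/das"
    printf '%s/%s/dna?segment=%s:%d,%d\n' \
        "$dasurl" "$build" "$chrom" "$((bed_start + 1))" "$bed_end"
}

# Hypothetical driver: read a three-column BED file and print one
# "chrom:start-end<TAB>sequence" line per region (needs network access,
# wget, xsltproc, and foo.xsl in the current directory).
fetch_bed_sequences() {
    local build="$1" bed="$2" chrom start end seq
    while read -r chrom start end _; do
        # Skip blank lines.
        [ -z "$chrom" ] && continue
        seq=$(wget -q -O - "$(bed_to_das_url "$build" "$chrom" "$start" "$end")" \
            | xsltproc foo.xsl - \
            | awk '($0 ~ /^[acgtnACGTN]/)' \
            | tr -d '\n')
        printf '%s:%s-%s\t%s\n' "$chrom" "$start" "$end" "$seq"
    done < "$bed"
}
```

After sourcing this, something like `fetch_bed_sequences hg19 regions.bed > regions.txt` would write one line per region; substitute the build name of the assembly you are working with, and consider pausing between requests if the file is large.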
Sean Davis (National Institutes of Health, Bethesda, MD) wrote, 7.1 years ago:

The high-level view may be enough to get you going.

  1. Convert your regions of interest into BED format
  2. Upload your BED file to the UCSC genome browser as a custom track
  3. Use the UCSC Table Browser, choose your custom track as the track of interest, then choose output "sequence"
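
Step 1 might look something like the sketch below. It assumes your positions are already in a whitespace-separated file with the chromosome in column 1 and a 1-based position in column 2; that layout and the function name positions_to_bed are assumptions for illustration, not something the phastCons download gives you directly. Each position becomes a one-base BED interval (0-based start, exclusive end).

```shell
#!/bin/bash
# Hypothetical helper: turn "chrom  position" lines (1-based positions)
# into tab-separated, three-column BED intervals covering one base each.
# Reads the named files, or standard input when no file is given.
positions_to_bed() {
    awk 'BEGIN { OFS = "\t" } NF >= 2 { print $1, $2 - 1, $2 }' "$@"
}
```

For example, `positions_to_bed positions.txt > regions.bed` would produce a file you can upload as a custom track in step 2.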