Question

How To Get The Sequence Of A Genomic Region From UCSC?

18

14.7 years ago

Giovanni M Dall'Olio 28k

Let's say I want to download the fasta sequence of the region chr1:100000..200000 from the UCSC browser.

How do you do that? I can't find a button to 'export to fasta' in the UCSC genome browser. I think the solution is to click on one of the displayed tracks, but I am not sure which one.

If I go to the Tables section, I can't find a table with the fasta sequences among the many listed there.

ucsc fasta sequence • 96k views

ADD COMMENT • link updated 12 months ago by Ram 44k • written 14.7 years ago by Giovanni M Dall'Olio 28k

Pierre Lindenbaum · Answer 1 · 2010-02-25

32

14.7 years ago

Pierre Lindenbaum 164k

Use the DAS server:

http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr1:100000,200000
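For scripting, the URL above can be assembled from its parts. This is only a minimal sketch: `das_url` is a made-up helper name, not something UCSC provides, and it assumes the DAS endpoint still follows the pattern shown in this answer. As far as I know the response is a small XML document with the sequence inside, not FASTA, so some post-processing is needed.

```shell
# Build a UCSC DAS "dna" URL for a region given in 1-based, inclusive coordinates.
# das_url is a hypothetical helper name used only for this illustration.
das_url() {
    # $1 = assembly (e.g. hg19), $2 = chromosome, $3 = start, $4 = end
    printf 'http://genome.ucsc.edu/cgi-bin/das/%s/dna?segment=%s:%s,%s\n' "$1" "$2" "$3" "$4"
}

# Example fetch (needs network access, so it is left commented out):
# curl -s "$(das_url hg19 chr1 100000 200000)" > region.xml
```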

ADD COMMENT • link updated 12.2 years ago by Istvan Albert 101k • written 14.7 years ago by Pierre Lindenbaum 164k

5

Be careful: the DAS server uses 1-based coordinates, i.e. the first base of a chromosome has index 1.
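If your coordinates come from a BED file (0-based start, exclusive end), the start has to be shifted by one before building a DAS segment. A small sketch of that conversion, with `bed_to_das_segment` being a name invented here for illustration:

```shell
# Convert a BED-style interval (0-based start, exclusive end) into the
# 1-based, inclusive "chrom:start,end" segment string used in the DAS URL.
# bed_to_das_segment is a hypothetical helper name.
bed_to_das_segment() {
    # $1 = chromosome, $2 = 0-based start, $3 = exclusive end
    # Only the start moves; the exclusive end equals the 1-based inclusive end.
    printf '%s:%s,%s\n' "$1" "$(( $2 + 1 ))" "$3"
}
```

So the BED interval `chr1 99999 200000` corresponds to the segment `chr1:100000,200000`.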

ADD REPLY • link 14.6 years ago by Pierre Lindenbaum 164k

1

that's really neat!

ADD REPLY • link 14.7 years ago by Istvan Albert 101k

1

Thanks!! I didn't know that, very cool!! :-)

ADD REPLY • link 14.7 years ago by Giovanni M Dall'Olio 28k

1

Wow, thanks. That's quite useful.

ADD REPLY • link 14.6 years ago by Madelaine Gogol 5.3k

score 16 · Answer 2 · 2010-03-04

16

14.6 years ago

Madelaine Gogol 5.3k

Just click "DNA" at the top of the screen.

ADD COMMENT • link 14.6 years ago by Madelaine Gogol 5.3k

2

Currently, you need to go to "View --> DNA," but the function is the same

ADD REPLY • link 10.4 years ago by Charles Warden 8.3k

score 9 · Answer 3 · 2014-02-13

9

10.7 years ago

Maximilian Haeussler ★ 1.7k

There is no table with sequences. The sequences are stored in a file (the .2bit file used below) rather than in a database table, because that's a lot faster.

I think the question was mostly about how to get a single sequence, and the answer to that is very simple: click View - DNA.

If you need to get the sequence from a script, use the UCSC utility twoBitToFa (see http://hgdownload.cse.ucsc.edu/admin/exe/) like this:

wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/twoBitToFa
chmod a+x twoBitToFa
twoBitToFa http://hgdownload.cse.ucsc.edu/gbdb/hg19/hg19.2bit test.fa -seq=chr21 -start=1 -end=10000
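One thing to watch when scripting this: according to the tool's usage message, `-start` is zero-based and `-end` is non-inclusive, so a region written as chr1:100000-200000 in the browser needs its start reduced by one. A minimal sketch of that bookkeeping follows. `twobit_cmd` is a hypothetical wrapper name, and it only prints the command rather than running it.

```shell
# Print a twoBitToFa command line for a region given in 1-based, inclusive
# coordinates (the way positions are shown in the Genome Browser).
# twobit_cmd is a hypothetical helper; it prints the command, it does not run it.
twobit_cmd() {
    # $1 = .2bit file or URL, $2 = chromosome, $3 = 1-based start,
    # $4 = inclusive end, $5 = output FASTA file
    printf 'twoBitToFa %s %s -seq=%s -start=%s -end=%s\n' \
        "$1" "$5" "$2" "$(( $3 - 1 ))" "$4"
}

# Example: print the command, inspect it, then run it if it looks right.
# twobit_cmd http://hgdownload.cse.ucsc.edu/gbdb/hg19/hg19.2bit chr1 100000 200000 region.fa
```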

ADD COMMENT • link 6.5 years ago by Maximilian Haeussler ★ 1.7k

0

This is exactly what I was looking for. Thanks!!!

ADD REPLY • link 10.6 years ago by vivekdna • 0

Hiroyuki Mishima · Answer 4 · 2014-10-21

8

10.0 years ago

Hiroyuki Mishima ▴ 160

Yet another solution.

TogoWS's REST API now supports the UCSC Genome Database.

For example, http://togows.org/api/ucsc/hg38/chr1:12,345-12,500.fasta returns the reference genome sequence in the fasta format.

Please see further information at the "External API" section of http://togows.org/help/

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 10.0 years ago by Hiroyuki Mishima ▴ 160

Istvan Albert · Answer 5 · 2010-02-21

4

14.7 years ago

Istvan Albert 101k

The Genome Browser is for visualization.

To get data in many formats use the UCSC Table Browser then select the output format of your choice.

You may also need to select the right group and track to get the data you want.

ADD COMMENT • link 14.7 years ago by Istvan Albert 101k

0

I thought so, but I can't find the table for sequences, not even when I select 'All tracks' and 'All tables'. Thanks anyway.

ADD REPLY • link 14.7 years ago by Giovanni M Dall'Olio 28k

0

I guess you are right. Now that I have tried it myself, that information seems awfully hard to get; sometimes getting something seemingly easy ends up not being possible. You could try your luck with the Ensembl API.

ADD REPLY • link 14.7 years ago by Istvan Albert 101k

0

If you need to download sequences, you first need to tell the website which coordinates you want. (Remember that the Table Browser is for batch processing.) Click on "Custom track" and type in something like this:

chr1 1 1000
chr2 123 12345

Click Submit.

Now you can go to the table browser, select your "custom track" and select "Fasta format" as output format.

The advantage of this procedure is that you can request thousands of sequences in one go. No need to use an API. For scripting see my reply above that uses the linux command line tool "twoBitToFa" from UCSC.
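If your regions are written browser-style (for example chr1:100000-200000, 1-based and inclusive), they need converting to BED lines (0-based start, exclusive end) before pasting them into the custom track box. A rough sketch: `regions_to_bed` is a name made up for this example, and it assumes chromosome names contain no ':' or '-' characters.

```shell
# Read regions such as "chr1:100000-200000" (1-based, inclusive) on stdin and
# write tab-separated BED lines (0-based start, exclusive end) on stdout.
# regions_to_bed is a hypothetical helper name.
# Assumption: chromosome names contain neither ':' nor '-'.
regions_to_bed() {
    awk -F '[:-]' '{ printf "%s\t%d\t%d\n", $1, $2 - 1, $3 }'
}

# Example:
# printf 'chr1:100000-200000\nchr2:124-12345\n' | regions_to_bed > regions.bed
```

The resulting lines can be pasted into the custom track form as described above.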

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.3 years ago by Maximilian Haeussler ★ 1.7k

0

I think the UCSC browser has changed since then. I have the same question, but I am unable to fetch the FASTA sequence using the coordinates. Can you explain how to do it using this link? https://genome.ucsc.edu/cgi-bin/hgTables?hgsid=703655381_NWHqxkgVDaPPscUhaKxyA9bAbb7e

ADD REPLY • link 5.8 years ago by shuksi1984 ▴ 60

Ram · Answer 6 · 2015-11-19

Another solution is using Heng Li's sequence toolkit:

seqtk subseq

Usage:   seqtk subseq [options] <in.fa> <in.bed>|<name.list>

Options: -t       TAB delimited output
         -l INT   sequence line length [0]

Note: Use 'samtools faidx' if only a few regions are intended.

seqtk can be downloaded from https://github.com/lh3/seqtk. However, seqtk subseq does not consider strand information; that is implemented in bedtools getfasta:

bedtools getfasta

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.23.0
Summary: Extract DNA sequences into a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta> 

Options: 
        -fi     Input FASTA file
        -bed    BED/GFF/VCF file of ranges to extract from -fi
        -fo     Output file (can be FASTA or TAB-delimited)
        -name   Use the name field for the FASTA header
        -split  given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
        -tab    Write output in TAB delimited format.
                - Default is FASTA format.

        -s      Force strandedness. If the feature occupies the antisense,
                strand, the sequence will be reverse complemented.
                - By default, strand information is ignored.

        -fullHeader     Use full fasta header.
                - By default, only the word before the first space or tab is used.
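To make concrete what -s does for a feature on the minus strand, here is a tiny sketch of reverse complementing in plain shell. `revcomp` is a name invented for this example and has nothing to do with how bedtools implements it. It only handles A, C, G and T (upper or lower case); other characters such as N are reversed but left unchanged.

```shell
# Reverse complement a DNA string: reverse the characters, then swap each
# base for its complement. revcomp is a hypothetical helper for illustration.
revcomp() {
    printf '%s\n' "$1" |
        awk '{ out = ""; for (i = length($0); i >= 1; i--) out = out substr($0, i, 1); print out }' |
        tr 'ACGTacgt' 'TGCAtgca'
}

# With bedtools itself, using only the options listed in the help text above
# (file names are placeholders):
# bedtools getfasta -s -fi genome.fa -bed regions.bed -fo regions.fa
```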

score 1 · Answer 7 · 2016-05-30

I understand this is a very old post, but in case it helps anyone searching for programmatic access using UCSC's own tools, I found the link below: http://genomewiki.ucsc.edu/index.php/Programmatic_access_to_the_Genome_Browser

It has details on

1) How to download data from their MySQL database

2) Get Chromosome sequence for a range (using REST API, which was what I was looking for)

... and a few other things, including fetching a copy of the current Genome Browser image.
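For item 2, a request URL can be put together like this. Treat it as a sketch under assumptions: the endpoint (`api.genome.ucsc.edu/getData/sequence`) and parameter names are how I understand UCSC's public REST API and are not taken from this thread, so check them against the page linked above. `ucsc_api_url` is a made-up helper name. As far as I know the API takes a 0-based start and returns JSON with the sequence in a "dna" field.

```shell
# Build a UCSC REST API sequence URL from 1-based, inclusive coordinates.
# ucsc_api_url is a hypothetical helper name.
# Assumption: endpoint and parameter names follow UCSC's public REST API,
# which expects a 0-based start and an exclusive end.
ucsc_api_url() {
    # $1 = genome (e.g. hg38), $2 = chromosome, $3 = 1-based start, $4 = inclusive end
    printf 'https://api.genome.ucsc.edu/getData/sequence?genome=%s;chrom=%s;start=%s;end=%s\n' \
        "$1" "$2" "$(( $3 - 1 ))" "$4"
}

# Example fetch (needs network access, so it is left commented out):
# curl -s "$(ucsc_api_url hg38 chr1 100000 200000)" > region.json
```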

Hope this helps!