Querying the UCSC Genome Browser for DNA sequences (large collection of data)
9.6 years ago
mshumph2 • 0

I have a spreadsheet with 500 pairs of positions on different chromosomes, and I'd like to pull out the DNA sequence between each pair. The spreadsheet is already laid out so that it could easily be related to the UCSC Genome Browser database, if only I had a way to either upload it to the database or download the necessary tables. It seems like there must be a table that relates each chromosome position to a specific nucleotide, so if I found that table I could do this. Does anyone know how to do this, or is there an easier way?

I tried connecting remotely to UCSC's MySQL server so that I could access the tables through MS Access, but I couldn't connect to it. I'm also somewhat familiar with Biopython if there's an easier way to do this using another database like NCBI's Nucleotide database.

Thanks

sequence • 2.9k views

If the sequences are all from the same genome, I would recommend downloading the 2bit file for that genome and using a command-line tool such as twoBitToFa.

For hg19, download this file (778 MB) and access it with this Linux software.

If you'd prefer to do it in R, check out the BSgenome and Biostrings packages from Bioconductor.

-Micah
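As a rough sketch of how the twoBitToFa route could be scripted, the Python below builds one command line per region. It assumes twoBitToFa is installed and that the genome file is saved as hg19.2bit; the helper name twobit_command and the output file name are made up for illustration. The -seq, -start and -end options take a 0-based start and an exclusive end.

```python
import shlex


def twobit_command(two_bit_path, chrom, start, end, out_path):
    """Build a twoBitToFa argument list for one region.

    start is 0-based and end is exclusive (half-open interval),
    which is also the convention BED files use.
    """
    if start < 0 or end <= start:
        raise ValueError(f"invalid interval {chrom}:{start}-{end}")
    return [
        "twoBitToFa",
        f"-seq={chrom}",
        f"-start={start}",
        f"-end={end}",
        two_bit_path,  # assumed local path to the downloaded 2bit file
        out_path,      # hypothetical output FASTA name
    ]


# Bases 1000..2000 in 1-based inclusive terms become start=999, end=2000.
cmd = twobit_command("hg19.2bit", "chr1", 999, 2000, "region1.fa")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the tool is on your PATH. For hundreds of regions it is probably simpler to write them all to one BED file and use twoBitToFa's -bed option instead of one call per region.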

9.6 years ago

This is from the UCSC FAQ:

Download the appropriate fasta files from our ftp server and extract sequence data using your own tools or the tools from our source tree. This is the recommended method when you have very large sequence datasets or will be extracting data frequently.

This is effectively what micahgearhart suggests. To get sequences from coordinates, you could use the getfasta tool in bedtools.
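To make clear what "sequence from coordinates" means here, below is a minimal pure-Python sketch of the idea. It is not how bedtools is implemented, and the function names read_fasta and extract are made up for illustration. There is no lookup table of position to nucleotide: the FASTA file is the chromosome sequence itself, and an interval is just a slice of it. Coordinates are assumed to be 0-based and half-open, as in BED.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict mapping sequence name -> sequence string."""
    sequences = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                sequences[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def extract(sequences, chrom, start, end):
    """Return the bases of a 0-based, half-open interval [start, end)."""
    seq = sequences[chrom]
    if not 0 <= start < end <= len(seq):
        raise ValueError(f"interval {chrom}:{start}-{end} is out of range")
    return seq[start:end]


# Toy two-line record standing in for a real chromosome file.
toy = io.StringIO(">chrT\nACGTAC\nGTTTGA\n")
genome = read_fasta(toy)
print(extract(genome, "chrT", 2, 8))
```

This loads every sequence into memory, so it is only reasonable for small files or one chromosome at a time. For a whole genome, bedtools getfasta or twoBitToFa will be far more practical.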

9.6 years ago

Have you tried to use Galaxy?

Step 1: Upload your coordinates in a supported format (BED, GFF, ...) using "Get data" and then "Upload file"

Step 2: Use the tool "Extract Genomic DNA" in the "Fetch sequences" category
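Getting the spreadsheet into BED format for Step 1 could look something like the sketch below. It assumes the spreadsheet has been exported as CSV with columns named chrom, start and end (hypothetical names, adjust to your file) and that the positions are 1-based and inclusive, which is how positions are usually written by hand. BED uses a 0-based start and an exclusive end, so only the start is shifted by one.

```python
import csv
import io


def csv_to_bed(csv_handle, bed_handle):
    """Convert 1-based inclusive CSV coordinates to 3-column BED lines.

    Expects a header row with the (assumed) column names
    chrom, start, end. Returns the number of regions written.
    """
    reader = csv.DictReader(csv_handle)
    count = 0
    for row in reader:
        chrom = row["chrom"].strip()
        start = int(row["start"])
        end = int(row["end"])
        # If the two positions were entered in reverse order, swap them.
        if start > end:
            start, end = end, start
        # 1-based inclusive -> 0-based half-open: subtract 1 from start only.
        bed_handle.write(f"{chrom}\t{start - 1}\t{end}\n")
        count += 1
    return count


# Small in-memory example standing in for the exported spreadsheet.
example = io.StringIO("chrom,start,end\nchr1,1000,2000\nchrX,500,450\n")
bed = io.StringIO()
n = csv_to_bed(example, bed)
print(bed.getvalue(), end="")
```

With real files, open the CSV with open("positions.csv", newline="") and write to a .bed file instead of the StringIO objects. If your chromosome names lack the "chr" prefix, they will need to be changed to match the genome build you select in Galaxy.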
