Download all hg19 coding sequences from UCSC
10.6 years ago
Daniel ▴ 40

How can I download all human coding sequences from the UCSC Table Browser? We want to send the result to Galaxy in the format "gene ID, CDS in FASTA".

human-genome galaxy cds • 4.2k views
10.6 years ago
Dan D 7.4k

On the bash command line (assuming you have a MySQL client installed):

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 \
-N -e 'SELECT * FROM knownGeneMrna' | sed -e 's/^/>/' -e 's/\s/\n/' > myFastaFile.fa

This will take a while to run. Make sure the output is what you want by sticking a LIMIT 10 in your SQL query:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 \
-N -e 'SELECT * FROM knownGeneMrna LIMIT 10' | sed -e 's/^/>/' -e 's/\s/\n/' > myFastaTestFile.fa
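
The sed step above puts each sequence on one long line and relies on sed understanding \s and \n in that position, which not every sed does. If that is a problem, here is a rough awk-based sketch of the same conversion that also wraps sequence lines. It assumes each input row is "ID, tab, sequence", which is the shape the SELECT * above implies; tsv_to_fasta and the width of 60 are hypothetical choices of mine, not part of any UCSC tool.

```shell
# Convert "ID<TAB>sequence" rows (as printed by `mysql -N` for a
# two-column table) into FASTA, wrapping sequence lines at 60 characters.
# tsv_to_fasta is a hypothetical helper name used only for illustration.
tsv_to_fasta() {
  awk -F'\t' -v width=60 '{
    print ">" $1                          # header line from the first column
    seq = $2                              # sequence from the second column
    for (i = 1; i <= length(seq); i += width)
      print substr(seq, i, width)         # emit the sequence in fixed-width chunks
  }'
}

# Offline check with two made-up rows; no database connection is needed.
printf 'id1\tacgtacgt\nid2\tttgg\n' | tsv_to_fasta
```

To use it with the query above, replace the sed part of the pipeline with "| tsv_to_fasta > myFastaTestFile.fa". Like the original command, this only reformats whatever the table returns.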

Thanks. Any way to do it through the table browser web interface?


If you're willing to use the Ensembl annotation, then you can just use BioMart.


Doesn't this just download knownGeneMrna? Those aren't just CDSs; they include UTRs as well. Is there a way to get only the CDS?


There's a "coding sequence" option.
