Question

RefSeq mRNA to CDS Sequence

1

13.8 years ago

Woa ★ 2.9k

I have a long list of RefSeq mRNA IDs for a particular organism. I wish to download all the corresponding coding sequences (CDS) in FASTA format, where available. Is there any suitable tool or script for doing this automatically?

Thanks in advance

WoA

refseq cds • 8.3k views

ADD COMMENT • link updated 13.8 years ago by Neilfws 49k • written 13.8 years ago by Woa ★ 2.9k

0

Which organism?

ADD REPLY • link 13.8 years ago by Neilfws 49k

0

Mouse (Mus musculus)

ADD REPLY • link 13.8 years ago by Woa ★ 2.9k

score 4 · Answer 1 · 2011-10-03

4

13.8 years ago

Pierre Lindenbaum 166k

Go to the UCSC Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables
Select the genome and assembly for your organism (here, Mouse).
Select group "Gene", track "RefSeq", table "refGene".
Click "identifiers: paste list" and paste in your list of IDs.
Set the output format to "CDS fasta".
Click "get output".
Under "Formatting options", unselect everything except "Show nucleotides".
Click "get output" again.
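
Before pasting a long list into the identifier box, it can help to clean it up first. The following is a minimal sketch, not part of the original answer: it assumes the IDs sit one per line, that some may carry a version suffix (for example NM_001127388.2), and that the refGene table expects the accession without the version. The function name normalize_ids is made up for illustration.

```python
import re

# Matches mRNA-style RefSeq accessions (NM_ or XM_), with an optional
# ".<version>" suffix that is captured separately and dropped.
ACCESSION = re.compile(r"^([NX]M_\d+)(?:\.\d+)?$")


def normalize_ids(lines):
    """Return unique, version-less accessions in first-seen order."""
    seen = set()
    cleaned = []
    for line in lines:
        match = ACCESSION.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything that is not NM_/XM_
        accession = match.group(1)
        if accession not in seen:
            seen.add(accession)
            cleaned.append(accession)
    return cleaned


# Example: print a cleaned list that is ready to paste.
print("\n".join(normalize_ids(["NM_001127388.2", "NM_001127389"])))
```

Anything that does not look like an NM_ or XM_ accession is silently dropped, so it is worth comparing the input and output counts before relying on the result.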

score 3 · Answer 2 · 2011-10-04

3

13.8 years ago

Neilfws 49k

Normally I would suggest BioMart for this purpose (assuming that your organism is in BioMart) but as I write, it is giving an error. However, here's the procedure for when they fix it:

Select MARTVIEW in the top menu
Choose database Ensembl genes 64, select dataset for your organism
Click Filters, left menu; expand "Gene"; check "ID list limit"; select "Refseq mRNA IDs"
Paste or upload IDs
Click Attributes, left menu; select "Sequences"; expand "SEQUENCES"; select "Coding sequence"
Click "Results", top-left menu.

Currently, this gives the error "Serious Error: Error during query execution: Table 'ensembl_mart_64.ox_RefSeq_mRNA__dm' doesn't exist" - I will report this to BioMart.
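
For reference, the same filter and attribute choices can also be written as a BioMart XML query, which is handy for long ID lists once the service works again. This is only a sketch under assumptions not stated in the answer: the dataset name mmusculus_gene_ensembl, the filter name refseq_mrna and the attribute name coding are my best guess at the internal names behind the menu entries above, and the helper build_biomart_query is hypothetical. The code only builds the query text; it does not contact any server.

```python
from xml.sax.saxutils import quoteattr


def build_biomart_query(refseq_ids, dataset="mmusculus_gene_ensembl"):
    """Build a BioMart XML query requesting the CDS for RefSeq mRNA IDs.

    The dataset, filter and attribute names are assumptions; check them
    against the BioMart instance you are using.
    """
    id_list = ",".join(refseq_ids)
    template = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="FASTA" '
        'header="0" uniqueRows="1">'
        '<Dataset name=%s interface="default">'
        '<Filter name="refseq_mrna" value=%s/>'  # the "ID list limit" filter
        '<Attribute name="refseq_mrna"/>'        # used for the FASTA header
        '<Attribute name="coding"/>'             # the "Coding sequence" choice
        "</Dataset>"
        "</Query>"
    )
    # quoteattr escapes the value and adds the surrounding quotes.
    return template % (quoteattr(dataset), quoteattr(id_list))


print(build_biomart_query(["NM_001127388", "NM_001127389"]))
```

The resulting text would be sent as the query to a BioMart web service; the endpoint is not given in this thread, so it is left out here.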

0

Message from Ensembl: "This is a known bug in BioMart for release 64. See the known bugs page here: http://www.ensembl.info/contact-us/known-bugs/. This bug will be fixed for release 65 due out in November."

ADD REPLY • link updated 5.8 years ago by Ram 45k • written 13.8 years ago by Neilfws 49k

score 1 · Answer 3 · 2011-10-03

If you're willing to try an in-development library, you can try cruzdb. With a script like this:

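from cruzdb import Genome

# 'hg19' is a human assembly; for mouse IDs use a mouse assembly such as 'mm9'.
db = Genome('hg19')
refGene = db.refGene

with open("names.txt") as handle:
    for name in (line.strip() for line in handle):
        if not name:
            continue  # skip blank lines
        # first() returns None when the ID is absent, so missing IDs are skipped
        gene = refGene.filter_by(name=name).first()
        if gene is None:
            continue
        print(">%s" % name)
        print("".join(gene.cds_sequence))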

and names.txt containing one ID per line, for example NM_001127388 and NM_001127389.

It will print FASTA records to standard output by querying the UCSC genome database (refGene table) and fetching the sequence from the UCSC DAS sequence server.

If you have a long list, see the notes on the cruzdb page about mirroring the MySQL tables locally.
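
To make clear what "CDS sequence" means in terms of the refGene columns, here is a self-contained sketch of the underlying idea. It is not cruzdb's implementation. It assumes the usual UCSC conventions (0-based, half-open coordinates in exonStarts, exonEnds, cdsStart and cdsEnd) and that the chromosome sequence is already available as a string; the function names are made up for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def cds_segments(exon_starts, exon_ends, cds_start, cds_end):
    """Clip each exon to the CDS interval (0-based, half-open)."""
    segments = []
    for start, end in zip(exon_starts, exon_ends):
        clipped_start = max(start, cds_start)
        clipped_end = min(end, cds_end)
        if clipped_start < clipped_end:  # drop exons that are entirely UTR
            segments.append((clipped_start, clipped_end))
    return segments


def cds_sequence(chrom_seq, exon_starts, exon_ends, cds_start, cds_end, strand):
    """Join the coding parts of the exons; reverse-complement on '-'."""
    parts = [
        chrom_seq[start:end]
        for start, end in cds_segments(exon_starts, exon_ends, cds_start, cds_end)
    ]
    seq = "".join(parts).upper()
    if strand == "-":
        # Only A, C, G, T and N are handled in this sketch.
        seq = "".join(COMPLEMENT[base] for base in reversed(seq))
    return seq


# Toy "chromosome": exons in upper case, UTR and introns in lower case.
toy = "ccATGgtaagAAAtagCCCTAAgg"
print(cds_sequence(toy, [0, 10, 16], [5, 13, 24], 2, 22, "+"))
```

With real data the exon and CDS coordinates would come from the refGene row and the sequence from the genome, which is the part cruzdb handles through the DAS server.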