Question: RefSeq mRNA to CDS Sequence
3.7 years ago by Woa (2.3k), United States, wrote:

I have a long list of RefSeq mRNA IDs for a particular organism. I wish to download all the corresponding coding sequences (CDS) in FASTA format, where available. Is there any suitable tool or script for doing this automatically?

Thanks in advance



Which organism?

Reply written 3.7 years ago by Neilfws (44k)

Mouse (Mus musculus)

Reply written 3.7 years ago by Woa (2.3k)

3.7 years ago by Pierre Lindenbaum (71k) wrote:
  • Go to the UCSC Table Browser
  • select group "Gene", track "RefSeq", table "refGene"
  • click "identifiers: paste list" and paste in your list
  • output format: CDS fasta
  • get output
  • Formatting options: unselect everything but "Show nucleotides"
  • get output
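
Before pasting a long list into the form, it may help to tidy it first. The following is a minimal sketch, not part of the Table Browser itself: it assumes the IDs arrive one per line, possibly with version suffixes (e.g. NM_001127388.2), and that refGene names are stored without the version, which is worth checking against the table. The helper name clean_refseq_ids is hypothetical.

```python
import re

# Assumed pattern for RefSeq mRNA accessions without a version suffix.
REFSEQ_MRNA = re.compile(r"^[NX]M_\d+$")

def clean_refseq_ids(lines):
    """Return unique accessions in input order, with version suffixes removed."""
    seen = set()
    cleaned = []
    for line in lines:
        # Drop surrounding whitespace and anything after the first dot.
        acc = line.strip().split(".")[0]
        if not REFSEQ_MRNA.match(acc):
            continue  # skip blank lines and anything that is not an mRNA accession
        if acc not in seen:
            seen.add(acc)
            cleaned.append(acc)
    return cleaned
```

The cleaned list can then be joined with newlines and pasted into the identifier box.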
3.7 years ago by Neilfws (44k), Sydney, Australia, wrote:

Normally I would suggest BioMart for this purpose (assuming that your organism is in BioMart) but as I write, it is giving an error. However, here's the procedure for when they fix it:

  1. Select MARTVIEW in the top menu
  2. Choose database Ensembl genes 64, select dataset for your organism
  3. Click Filters, left menu; expand "Gene"; check "ID list limit"; select "Refseq mRNA IDs"
  4. Paste or upload IDs
  5. Click Attributes, left menu; select "Sequences"; expand "SEQUENCES"; select "Coding sequence"
  6. Click "Results", top-left menu.

Currently, this gives the error "Serious Error: Error during query execution: Table 'ensembl_mart_64.ox_RefSeq_mRNA__dm' doesn't exist" - I will report this to BioMart.
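
Once the service works again, the same query could in principle be written in BioMart's XML query format instead of being clicked through in MartView. The sketch below only builds the query text and does not send it anywhere. The dataset name mmusculus_gene_ensembl, the filter name refseq_mrna and the attribute name coding are my assumptions about the mart's internal names; the "XML" button in MartView shows the exact query for a given selection and should be treated as the reference. The function name build_biomart_query is hypothetical.

```python
from xml.sax.saxutils import quoteattr

def build_biomart_query(refseq_ids, dataset="mmusculus_gene_ensembl"):
    """Build a BioMart XML query asking for coding sequences in FASTA format.

    The dataset, filter and attribute names are assumptions; verify them
    against the XML that MartView generates for the same selection.
    """
    ids = ",".join(refseq_ids)  # ID-list filters take comma-separated values
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<!DOCTYPE Query>\n'
        '<Query virtualSchemaName="default" formatter="FASTA" header="0" uniqueRows="1">\n'
        '  <Dataset name=%s interface="default">\n'
        '    <Filter name="refseq_mrna" value=%s/>\n'
        '    <Attribute name="refseq_mrna"/>\n'
        '    <Attribute name="coding"/>\n'
        '  </Dataset>\n'
        '</Query>\n'
    ) % (quoteattr(dataset), quoteattr(ids))
```

Calling build_biomart_query(["NM_001127388", "NM_001127389"]) returns a string that can be compared with MartView's own XML before being submitted.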


Message from Ensembl: "This is a known bug in BioMart for release 64. See the known bugs page here: This bug will be fixed for release 65 due out in November."

Reply written 3.6 years ago by Neilfws (44k)

3.7 years ago by brentp (19k), Salt Lake City, UT, wrote:

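If you're willing to try an in-development library, you can try cruzdb, with a script like this:

from cruzdb import Genome

# 'hg19' is the human assembly; for mouse, use the matching UCSC database name (e.g. 'mm9')
db = Genome('hg19')
refGene = db.refGene

# names.txt holds one RefSeq mRNA ID per line
for name in (n.strip() for n in open("names.txt")):
    if not name:
        continue  # skip blank lines
    gene = refGene.filter_by(name=name).one()
    print(">%s" % name)
    print("".join(gene.cds_sequence))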

and names.txt containing IDs, one per line, such as NM_001127388 and NM_001127389.

It will print FASTA-formatted output by querying the UCSC Genome database (refGene table) and fetching sequence from their DAS sequence server.

If you have a long list, see the notes on the cruzdb page about mirroring the MySQL tables locally.
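
The script above prints each sequence on a single line. If wrapped FASTA is preferred, a small formatter could be applied to each record. This is a sketch using plain string slicing; format_fasta and the 60-column default are my own choices and are not part of cruzdb.

```python
def format_fasta(name, sequence, width=60):
    """Return one FASTA record with the sequence wrapped at `width` columns."""
    # Slice the sequence into consecutive chunks of at most `width` characters.
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">%s\n%s" % (name, "\n".join(chunks))
```

In the loop above, the two print statements could then be replaced by printing format_fasta(name, "".join(gene.cds_sequence)).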


Thanks, I'll try that.

Reply written 3.7 years ago by Woa (2.3k)