Question: RefSeq mRNA to CDS Sequence
2.6 years ago, Woa (2.1k, United States) wrote:

I have a long list of RefSeq mRNA IDs for a particular organism. I wish to download all the corresponding coding sequences (CDS) in FASTA format, where available. Is there a suitable tool or script for doing this automatically?

Thanks in advance

WoA


Which organism?

Reply written 2.6 years ago by Neilfws (41k)

Mouse (Mus musculus)

Reply written 2.6 years ago by Woa (2.1k)

2.6 years ago, Pierre Lindenbaum (58k, France) wrote:
  • Go to the table browser http://genome.ucsc.edu/cgi-bin/hgTables
  • select group "Gene", track "RefSeq", table "refGene"
  • click "identifiers: paste list" and copy+paste your list
  • output format: CDS fasta
  • get output
  • Formatting options: unselect everything but "Show nucleotides"
  • get output
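
Before pasting, it may help to normalize the ID list. The following is a minimal sketch, assuming the list is a plain-text file with one accession per line and that the refGene table stores accessions without the version suffix (NM_001127388 rather than NM_001127388.1). `clean_refseq_ids` is a hypothetical helper name, not part of any UCSC tool.

```python
def clean_refseq_ids(lines):
    """Return unique RefSeq accessions, version suffix removed, in input order."""
    seen = set()
    cleaned = []
    for line in lines:
        acc = line.strip()
        # Skip blank lines and comment lines
        if not acc or acc.startswith("#"):
            continue
        # Drop a trailing version such as ".1" (assumption: refGene names are unversioned)
        acc = acc.split(".", 1)[0]
        if acc not in seen:
            seen.add(acc)
            cleaned.append(acc)
    return cleaned


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python clean_ids.py < names.txt
    print("\n".join(clean_refseq_ids(sys.stdin)))
```

The printed list can then be pasted into the "paste list" box.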

2.6 years ago, Neilfws (41k, Sydney, Australia) wrote:

Normally I would suggest BioMart for this purpose (assuming that your organism is in BioMart) but as I write, it is giving an error. However, here's the procedure for when they fix it:

  1. Select MARTVIEW in the top menu
  2. Choose database Ensembl genes 64, select dataset for your organism
  3. Click Filters, left menu; expand "Gene"; check "ID list limit"; select "Refseq mRNA IDs"
  4. Paste or upload IDs
  5. Click Attributes, left menu; select "Sequences"; expand "SEQUENCES"; select "Coding sequence"
  6. Click "Results", top-left menu.

Currently, this gives the error "Serious Error: Error during query execution: Table 'ensembl_mart_64.ox_RefSeq_mRNA__dm' doesn't exist" - I will report this to BioMart.
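
Once the service works, the same selection could probably be scripted as a BioMart XML query instead of clicking through MartView. This is only a sketch: the dataset name `mmusculus_gene_ensembl`, the filter name `refseq_mrna`, and the attribute name `coding` are assumptions based on the public Ensembl mart and should be checked against the current BioMart configuration. `build_biomart_query` is a hypothetical helper that only builds the XML; it does not contact any server.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(refseq_ids, dataset="mmusculus_gene_ensembl"):
    """Build a BioMart XML query asking for coding sequences in FASTA format."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "FASTA",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Assumed filter name for RefSeq mRNA accessions; values are comma-separated
    ET.SubElement(ds, "Filter", {"name": "refseq_mrna", "value": ",".join(refseq_ids)})
    # Assumed attribute names: the accession for the FASTA header, then the CDS
    ET.SubElement(ds, "Attribute", {"name": "refseq_mrna"})
    ET.SubElement(ds, "Attribute", {"name": "coding"})
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    print(build_biomart_query(["NM_001127388", "NM_001127389"]))
```

The resulting XML would be sent as the `query` parameter to a BioMart `martservice` endpoint.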


Message from Ensembl: "This is a known bug in BioMart for release 64. See the known bugs page here: http://www.ensembl.info/contact-us/known-bugs/. This bug will be fixed for release 65 due out in November."

Reply written 2.6 years ago by Neilfws (41k)

2.6 years ago, brentp (17k, Denver, Colorado) wrote:

If you're willing to try an in-development library, you can use cruzdb with a script like this:

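from cruzdb import Genome

db = Genome('hg19')  # human assembly; use a mouse assembly such as 'mm9' for mouse IDs
refGene = db.refGene

# names.txt holds one RefSeq mRNA accession per line
with open("names.txt") as handle:
    for name in (line.strip() for line in handle):
        if not name:
            continue  # skip blank lines
        gene = refGene.filter_by(name=name).first()
        if gene is None:
            continue  # no refGene entry for this accession
        print(">%s" % name)
        print("".join(gene.cds_sequence))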

with names.txt containing one ID per line, such as NM_001127388 and NM_001127389.

It prints FASTA records by querying the UCSC genomes database (refGene table) and fetching sequence from their DAS sequence server.

If you have a long list, see the notes on the cruzdb page about mirroring the MySQL tables locally.
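
To make clear what "CDS" means in terms of the refGene columns, here is a minimal sketch of assembling a coding sequence from exon and CDS coordinates. It is not cruzdb's implementation; `cds_from_exons` and `revcomp` are hypothetical helpers, and the sketch assumes UCSC-style 0-based, half-open coordinates with exons sorted by start position.

```python
def revcomp(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def cds_from_exons(genome_seq, exons, cds_start, cds_end, strand="+"):
    """Join the parts of each exon that fall inside [cds_start, cds_end).

    exons: list of (start, end) tuples, 0-based half-open, ascending order.
    genome_seq: plus-strand sequence indexed in the same coordinate system.
    """
    parts = []
    for start, end in exons:
        # Clip the exon to the coding region; UTR-only exons drop out
        s = max(start, cds_start)
        e = min(end, cds_end)
        if s < e:
            parts.append(genome_seq[s:e])
    cds = "".join(parts).upper()
    # Minus-strand genes are stored in plus-strand coordinates
    return revcomp(cds) if strand == "-" else cds


if __name__ == "__main__":
    # Toy example: 2 bp 5' UTR, a 4 bp intron, 2 bp 3' UTR
    toy = "CCATGAAGTGGATAATT"
    print(cds_from_exons(toy, [(0, 7), (11, 17)], 2, 15))
```

In the toy example the UTR bases and the intron are excluded, leaving only the coding bases from the two exons.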


Thanks, I'll try that.

Reply written 2.6 years ago by Woa (2.1k)