Question

Riken Identifiers: Conversion To Fastas

1

12.0 years ago

Anima Mundi ★ 2.9k

Hello, I have a list of RIKEN IDs; I would like to convert them to FASTAs. The DAVID project does not list RIKEN identifiers as accepted input. What would you suggest?

microarray gene expression • 4.3k views

ADD COMMENT • link updated 12.0 years ago by Neilfws 49k • written 12.0 years ago by Anima Mundi ★ 2.9k

1

Could you please provide a list of these "RIKEN IDs"?

ADD REPLY • link 12.0 years ago by Pierre Lindenbaum 161k

0

Yes, the IDs I have look like this: 1100001G20Rik, 2610301B20Rik, 4930579J09Rik.

ADD REPLY • link 12.0 years ago by Anima Mundi ★ 2.9k

0

OK; so you have MGI symbols, not RIKEN clone IDs.

ADD REPLY • link 12.0 years ago by Neilfws 49k

0

Sorry for the mess, guys.

ADD REPLY • link 12.0 years ago by Anima Mundi ★ 2.9k

0

And I assume that by "convert to FASTAs", you mean "sequences in FASTA format."

ADD REPLY • link 12.0 years ago by Neilfws 49k

0

Yes, I mean sequences which are in FASTA format.

ADD REPLY • link 12.0 years ago by Anima Mundi ★ 2.9k

score 2 · Answer 1 · 2012-04-11

2

12.0 years ago

Neilfws 49k

My first thought is that most (all?) RIKEN clones are in the NCBI nucleotide database. So given a clone ID such as AK080584, you could go via EUtils esearch/efetch, or use a remote database sequence retrieval utility such as BioPerl's bp_fetch:

bp_fetch net::genbank:AK080584
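
For the EUtils route, a minimal Python sketch of the efetch step might look like the following. It assumes the record lives in the `nuccore` database and that plain-text FASTA is wanted; the function names `efetch_fasta_url` and `fetch_fasta` are made up for illustration, and only the standard library is used.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_fasta_url(accession, db="nuccore"):
    """Build an efetch URL asking for one record as FASTA text.

    The choice of db="nuccore" is an assumption; adjust it if the
    record is held in a different Entrez database.
    """
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def fetch_fasta(accession, db="nuccore"):
    """Download the FASTA record as a string (needs network access)."""
    with urlopen(efetch_fasta_url(accession, db)) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_fasta("AK080584")` should then return the same kind of record as the bp_fetch command above. For long lists, NCBI asks clients to throttle their requests, so a short pause between calls is sensible.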

EDIT

My second thought, now that we've established that you have MGI symbols, not RIKEN IDs, is that you can do this using BioMart. Choose Mus musculus genes as your dataset and under Filters, you'll see MGI symbol as an option. Select sequence retrieval options under Attributes. Search this site for numerous explanations of how to use BioMart if required.
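
The same BioMart selection can also be expressed as an XML query and sent to the martservice endpoint. The sketch below only builds the query and the request URL. The dataset name `mmusculus_gene_ensembl`, the filter name `mgi_symbol`, the attribute names `ensembl_gene_id` and `cdna`, and the Ensembl martservice address are assumptions based on common BioMart usage; check them against the names shown in the BioMart web interface (its XML button displays the query for the current selection).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed address of the Ensembl BioMart web service.
MART_URL = "https://www.ensembl.org/biomart/martservice"


def build_query(symbols, seq_type="cdna"):
    """Return a BioMart XML query asking for FASTA sequences by MGI symbol.

    Dataset, filter and attribute names are assumptions (see lead-in).
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "FASTA",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    dataset = ET.SubElement(
        query, "Dataset", {"name": "mmusculus_gene_ensembl", "interface": "default"}
    )
    # One filter holding all symbols as a comma-separated list.
    ET.SubElement(dataset, "Filter", {"name": "mgi_symbol", "value": ",".join(symbols)})
    # The first attribute goes into the FASTA header, the second is the sequence.
    ET.SubElement(dataset, "Attribute", {"name": "ensembl_gene_id"})
    ET.SubElement(dataset, "Attribute", {"name": seq_type})
    return ET.tostring(query, encoding="unicode")


def mart_request_url(symbols, seq_type="cdna"):
    """Wrap the XML query in a GET URL for the martservice endpoint."""
    return f"{MART_URL}?{urlencode({'query': build_query(symbols, seq_type)})}"
```

For example, `mart_request_url(["1100001G20Rik", "2610301B20Rik"])` gives a URL that can be opened in a browser or fetched with `urllib.request.urlopen`. For a long list of symbols, sending the query in a POST body is probably the safer choice.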

ADD COMMENT • link 12.0 years ago by Neilfws 49k

0

Your example works fine, but unfortunately my IDs look quite different. If I use bp_fetch on one (or even all) of them, I get this error: "Sequence 1810008A18Rik in Database genbank in net::genbank:1810008A18Rik is not loadable. Skipping". So maybe they are not really RIKEN identifiers (as thought; in that case I apologize for the improper question), or else not all the RIKEN IDs are included in the NCBI nucleotide database. Of course, if the first option is the true one, I will surely choose your answer as the accepted one :).

ADD REPLY • link 12.0 years ago by Anima Mundi ★ 2.9k

0

You will find 1810008A18Rik if you search at the NCBI website, so my esearch + efetch suggestion should work.
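
A rough sketch of that esearch + efetch chain, using only the Python standard library. The search term (the `[Gene Name]` and `[Organism]` field tags), the `nuccore` database and all function names are assumptions for illustration; the query may need tuning to return only the records you want.

```python
import json
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(symbol, db="nuccore", organism="Mus musculus"):
    """Build an esearch URL that looks up a gene symbol and returns JSON."""
    # The field tags used here are an assumption; refine the term as needed.
    term = f"{symbol}[Gene Name] AND {organism}[Organism]"
    params = {"db": db, "term": term, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def parse_id_list(esearch_json):
    """Extract the list of matching UIDs from an esearch JSON response."""
    return json.loads(esearch_json)["esearchresult"]["idlist"]


def efetch_url(uids, db="nuccore"):
    """Build an efetch URL returning the given UIDs as FASTA text."""
    params = {"db": db, "id": ",".join(uids), "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"
```

In use, the text downloaded from `esearch_url("1810008A18Rik")` is passed to `parse_id_list`, and the resulting UIDs go to `efetch_url`. A symbol can match several nucleotide records, so expect more than one FASTA entry per symbol.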

ADD REPLY • link 12.0 years ago by Neilfws 49k

0

Now it is all clear. I mark this as the accepted answer then.

ADD REPLY • link 12.0 years ago by Anima Mundi ★ 2.9k

score 1 · Answer 2 · 2012-04-11

1

12.0 years ago

Pierre Lindenbaum 161k

Use http://www.informatics.jax.org/batch to get a list of segments (chromosome/start/end) for your symbols.

Then use the UCSC DAS server to download each segment: see http://www.biostars.org/post/show/56/how-to-get-the-sequence-of-a-genomic-region-from-ucsc/
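
As a sketch of the second step, the helpers below build a DAS `dna` request for one segment and turn the XML reply into a FASTA record. The assembly name `mm9`, the URL layout, and the shape of the reply (a `DASDNA` root with `SEQUENCE/DNA` inside) are assumptions based on the DAS conventions; make sure the assembly matches the coordinates exported from MGI. The function names are made up for illustration.

```python
import textwrap
import xml.etree.ElementTree as ET

# Assumed base address of the UCSC DAS server.
DAS_BASE = "http://genome.ucsc.edu/cgi-bin/das"


def das_dna_url(chrom, start, end, assembly="mm9"):
    """Build a DAS 'dna' request for one segment, e.g. chr1:100000,200000."""
    return f"{DAS_BASE}/{assembly}/dna?segment={chrom}:{start},{end}"


def das_xml_to_fasta(das_xml, label, width=60):
    """Convert a DAS DNA reply (XML text) into a single FASTA record."""
    root = ET.fromstring(das_xml)
    dna = root.find("SEQUENCE/DNA")
    if dna is None or dna.text is None:
        raise ValueError("no DNA element found in DAS response")
    # The sequence arrives split over several lines; join it first.
    sequence = "".join(dna.text.split())
    body = "\n".join(textwrap.wrap(sequence, width))
    return f">{label}\n{body}\n"
```

Each chromosome/start/end row from the MGI batch output would go through `das_dna_url`, and the downloaded XML through `das_xml_to_fasta`, with the gene symbol as `label`.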

ADD COMMENT • link 12.0 years ago by Pierre Lindenbaum 161k

0

Thank you for the help, Pierre. Unfortunately the service is under maintenance right now; I will test your solution soon.

ADD REPLY • link 12.0 years ago by Anima Mundi ★ 2.9k