RIKEN Identifiers: Conversion to FASTA
12.0 years ago
Anima Mundi ★ 2.9k

Hello, I have a list of RIKEN IDs; I would like to convert them to FASTAs. The DAVID project does not list RIKEN identifiers as accepted input. What would you suggest?

microarray gene expression • 4.3k views

Could you please provide a list of the "RIKEN IDs"?


Yes, the IDs I have look like this: 1100001G20Rik, 2610301B20Rik, 4930579J09Rik.


OK; so you have MGI symbols, not RIKEN clone IDs.


Sorry for the mess, guys.


And I assume that by "convert to FASTAs", you mean "sequences in FASTA format."


Yes, I mean sequences which are in FASTA format.

12.0 years ago
Neilfws 49k

My first thought is that most (all?) RIKEN clones are in the NCBI nucleotide database. So given a clone ID such as AK080584, you could go via EUtils esearch/efetch, or use a remote database sequence retrieval utility such as Bioperl's bp_fetch:

bp_fetch net::genbank:AK080584
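For anyone without BioPerl installed, the EUtils route for a single accession can be sketched in Python. This is only a minimal sketch: the function names are made up for illustration, and it assumes the record lives in the nuccore database and that the standard efetch endpoint with rettype=fasta is what you want.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build an efetch URL that returns one record as plain-text FASTA.

    db="nuccore" is an assumption; use another Entrez database if needed.
    """
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accession):
    """Download the FASTA record for one accession (needs network access)."""
    with urlopen(efetch_fasta_url(accession)) as response:
        return response.read().decode("utf-8")
```

Calling fetch_fasta("AK080584") should then return the same record that bp_fetch retrieves, as a FASTA string.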

EDIT

My second thought, now that we've established that you have MGI symbols, not RIKEN IDs, is that you can do this using BioMart. Choose Mus musculus genes as your dataset and under Filters, you'll see MGI symbol as an option. Select sequence retrieval options under Attributes. Search this site for numerous explanations of how to use BioMart if required.
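If you would rather script the BioMart query than click through the web interface, something along these lines might work. Treat it as a rough sketch under assumptions: the dataset name (mmusculus_gene_ensembl), the filter name (mgi_symbol), the attribute names and the martservice URL are what I would expect on the Ensembl BioMart, but they are not taken from this thread and should be checked against the current BioMart before use. The function names are hypothetical.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Assumed Ensembl BioMart REST endpoint; verify before relying on it.
MARTSERVICE = "https://www.ensembl.org/biomart/martservice"


def biomart_query_xml(symbols, seq_type="cdna"):
    """Build a BioMart XML query asking for FASTA sequences by MGI symbol.

    Dataset, filter and attribute names are assumptions (see lead-in).
    """
    value = quoteattr(",".join(symbols))  # comma-separated, safely quoted
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="FASTA" header="0" '
        'uniqueRows="0" count="" datasetConfigVersion="0.6">'
        '<Dataset name="mmusculus_gene_ensembl" interface="default">'
        f"<Filter name=\"mgi_symbol\" value={value}/>"
        '<Attribute name="ensembl_gene_id"/>'
        '<Attribute name="ensembl_transcript_id"/>'
        f'<Attribute name="{seq_type}"/>'
        "</Dataset></Query>"
    )


def biomart_url(symbols, seq_type="cdna"):
    """Return a GET URL that submits the query to the martservice endpoint."""
    query = biomart_query_xml(symbols, seq_type)
    return MARTSERVICE + "?" + urlencode({"query": query})
```

Opening biomart_url(["1100001G20Rik", "2610301B20Rik", "4930579J09Rik"]) in a browser or with urllib should return FASTA text if the assumed names are right.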


Your example works fine, but unfortunately my IDs look quite different. If I use bp_fetch on any of them, I get this error: "Sequence 1810008A18Rik in Database genbank in net::genbank:1810008A18Rik is not loadable. Skipping". So either they are not really RIKEN identifiers (in which case I apologize for the improper question), or not all RIKEN IDs are included in the NCBI nucleotide database. Of course, if the former is true, I will surely choose your answer as the accepted one :).


You will find 1810008A18Rik if you search at the NCBI website, so my esearch + efetch suggestion should work.
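The esearch + efetch combination could look roughly like this in Python. It is a sketch, not a tested pipeline: the function names are invented here, the nuccore database and the retmax value are assumptions, and a bare symbol as the search term may return more records than you want, so you will probably need to narrow the term.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, db="nuccore", retmax=20):
    """Build an esearch URL that returns matching UIDs as JSON."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def parse_id_list(esearch_json):
    """Extract the list of UIDs from an esearch JSON response."""
    return json.loads(esearch_json)["esearchresult"]["idlist"]


def efetch_url(uids, db="nuccore"):
    """Build an efetch URL returning the given UIDs as FASTA text."""
    params = {"db": db, "id": ",".join(uids),
              "rettype": "fasta", "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)


def search_and_fetch(term):
    """Search by symbol, then fetch all hits as FASTA (needs network)."""
    with urlopen(esearch_url(term)) as response:
        uids = parse_id_list(response.read().decode("utf-8"))
    if not uids:
        return ""  # nothing matched the search term
    with urlopen(efetch_url(uids)) as response:
        return response.read().decode("utf-8")
```

For example, search_and_fetch("1810008A18Rik") runs the search and returns whatever nucleotide records match. For a long list of symbols, keep NCBI's request rate limits in mind and pause between calls.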


Now it is all clear. I'll mark this as the accepted answer, then.

12.0 years ago

Use http://www.informatics.jax.org/batch to get a list of segments (chromosome/start/end) for your symbols.

Then use the UCSC DAS server to download the sequence of each segment: see http://www.biostars.org/post/show/56/how-to-get-the-sequence-of-a-genomic-region-from-ucsc/
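The second step can be sketched in Python once you have the chromosome/start/end triples. This is only an illustration under assumptions: the function names are hypothetical, the assembly name (e.g. mm9) must match the coordinates you got from MGI, and the parser assumes the usual DASDNA layout of a SEQUENCE element wrapping a DNA element.

```python
import textwrap
import xml.etree.ElementTree as ET

# UCSC DAS endpoint; the assembly (e.g. "mm9") is appended per request.
DAS_BASE = "http://genome.ucsc.edu/cgi-bin/das"


def das_dna_url(assembly, chrom, start, end):
    """Build a DAS 'dna' request URL for one genomic segment."""
    return f"{DAS_BASE}/{assembly}/dna?segment={chrom}:{start},{end}"


def das_xml_to_fasta(das_xml, width=60):
    """Convert a DASDNA XML response into FASTA text.

    Assumes <DASDNA><SEQUENCE id start stop><DNA>...</DNA></SEQUENCE></DASDNA>.
    """
    root = ET.fromstring(das_xml)
    records = []
    for seq in root.iter("SEQUENCE"):
        # The DNA text is wrapped over several lines; strip all whitespace.
        dna = "".join((seq.findtext("DNA") or "").split())
        header = f">{seq.get('id')}:{seq.get('start')}-{seq.get('stop')}"
        records.append(header + "\n" + "\n".join(textwrap.wrap(dna, width)))
    return "\n".join(records) + "\n"
```

You would fetch das_dna_url(...) for each segment (with urllib, for instance) and pass the response body to das_xml_to_fasta. Note that this returns genomic sequence for the whole segment, introns included, not the transcript sequence.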


Thank you for the help, Pierre. Unfortunately, the service is under maintenance right now; I will test your solution soon.
