Where do I get gene sequences?
9.0 years ago

It's very easy to get published constructs from GenBank, for instance. Type an identifying term, or better yet, the actual accession number, and you're set! You can copy the sequence by hand or load it into an environment like Biopython.
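For the accession case, the lookup can be scripted even without Biopython. Below is a minimal sketch that uses only Python's standard library. It assumes NCBI's E-utilities `efetch` endpoint (the same service Biopython's `Bio.Entrez.efetch` wraps) and the `nucleotide` database. The email address is a placeholder, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint (the service Bio.Entrez.efetch also calls).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, rettype: str = "fasta",
               email: str = "you@example.org") -> str:
    """Build an efetch URL for one record in the nucleotide database."""
    params = {
        "db": "nucleotide",
        "id": accession,
        "rettype": rettype,  # "fasta", or "gb" for a GenBank flat file
        "retmode": "text",
        "email": email,  # placeholder address: NCBI asks you to use your own
    }
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records = {}
    chunks = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new record starts; collect its sequence lines in a fresh list.
            chunks = []
            records[line[1:]] = chunks
        elif line and chunks is not None:
            chunks.append(line)
    return {header: "".join(parts) for header, parts in records.items()}


def fetch_fasta(accession: str) -> dict:
    """Download one record as FASTA and parse it (requires network access)."""
    with urlopen(efetch_url(accession)) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

Calling `fetch_fasta` with a real accession returns a dict that maps the FASTA header to the sequence. If you pass `rettype="gb"` to `efetch_url`, you get the GenBank flat file instead. That text then needs a GenBank parser such as Biopython's `SeqIO`, not `parse_fasta`.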

I would very much like to do the same with genes. Let's say I want to get the sequence of Mus musculus GAPDH. GenBank will give me either nothing or, if I am lucky, a heap of entries containing features labeled GAPDH, Mus, musculus, etc. Of course, I can work around this by using Biopython to extract the feature from whatever constructs I get. But this is needlessly computationally intensive and introduces variability where there should be none...
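A free-text search returns a heap of loosely matching entries. A fielded query against NCBI's Gene database narrows the result to one gene symbol in one organism. The sketch below assumes the E-utilities `esearch` endpoint with JSON output. It also assumes the Gene database's `[sym]` (gene symbol) and `[orgn]` (organism) search tags. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_query(symbol: str, organism: str) -> str:
    """Fielded Entrez query: exact gene symbol restricted to one organism."""
    # Quoting the organism keeps a two-word name together as one phrase.
    return f'{symbol}[sym] AND "{organism}"[orgn]'


def esearch_url(term: str, db: str = "gene") -> str:
    """Build an esearch URL that returns matching UIDs as JSON."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmode": "json"})


def gene_ids(symbol: str, organism: str) -> list:
    """Return the NCBI Gene IDs matching symbol + organism (needs network)."""
    with urlopen(esearch_url(gene_query(symbol, organism))) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]
```

`gene_ids("Gapdh", "Mus musculus")` should return NCBI Gene IDs. These are the same kind of numeric ID that appears in Gene page URLs. Check the list rather than assuming exactly one hit.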

NCBI has a nice interface called Gene... maybe this is better? Maybe indeed: looking up "mus musculus GAPDH" there gives me this nice page: http://www.ncbi.nlm.nih.gov/gene/14433#reference-sequences. It's colorful and whatnot, but it lacks the actual sequence. It has a couple of links above the pretty box at the bottom, which could lead you to believe they'll take you straight to a FASTA or GenBank file for GAPDH. But they do not; the GenBank link on the GAPDH page, for instance, takes me here. There is also a link to an Ensembl entry, which gives you the positional data for GAPDH on the genome. This is indeed something, except that this information is staggeringly difficult to access through Biopython, and I was actually looking for a sequence, not for positional info.
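Positional data is only one request away from a sequence. The sketch below uses the Ensembl REST API's `/sequence/region` endpoint and needs only the standard library. Several details are assumptions to check against Ensembl's REST documentation: the endpoint path, the `chrom:start..end:strand` region syntax (1-based, inclusive), and the `text/plain` content type. The helper names are hypothetical. The coordinates must come from the same genome assembly that Ensembl currently serves.

```python
from urllib.request import Request, urlopen

# Base URL of the Ensembl REST service.
ENSEMBL_REST = "https://rest.ensembl.org"


def region_url(species: str, chrom: str, start: int, end: int,
               strand: int = 1) -> str:
    """URL for /sequence/region using 1-based, inclusive coordinates."""
    if start > end:
        raise ValueError("start must be <= end")
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    return f"{ENSEMBL_REST}/sequence/region/{species}/{chrom}:{start}..{end}:{strand}"


def fetch_region(species: str, chrom: str, start: int, end: int,
                 strand: int = 1) -> str:
    """Download the genomic sequence of a region as plain text (needs network)."""
    request = Request(region_url(species, chrom, start, end, strand),
                      headers={"Content-Type": "text/plain"})
    with urlopen(request) as response:
        return response.read().decode("ascii").strip()
```

Take the chromosome, start, end and strand from the Ensembl entry, for example `fetch_region("mus_musculus", chrom, start, end, strand)`. A strand of `-1` requests the minus-strand (reverse-complemented) sequence.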

So, can you help me out? How do I (simply and script-compatibly) get gene sequences?

9.0 years ago

You might prefer to use BioMart. There are APIs available for Perl and R (there's also a Python package, but it doesn't seem to be fully functional). Here's an example result from the web interface.
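BioMart can also be queried from Python without a client package. You send its XML query format to the `martservice` endpoint. This sketch makes several assumptions: Ensembl's martservice URL, the `mmusculus_gene_ensembl` dataset, the `external_gene_name` filter, and the `gene_exon_intron` sequence attribute (the unspliced genomic sequence of the gene). Check these names against the mart you query. The helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
from xml.sax.saxutils import quoteattr

# Ensembl's BioMart web service endpoint (an assumption: check your mart).
MARTSERVICE = "https://www.ensembl.org/biomart/martservice"


def biomart_query_xml(dataset: str, gene_name: str,
                      sequence_type: str = "gene_exon_intron") -> str:
    """Build a BioMart XML query that returns one sequence type as FASTA.

    quoteattr() adds the quotes and escapes &, < and > in attribute values.
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<!DOCTYPE Query>\n"
        '<Query virtualSchemaName="default" formatter="FASTA" header="0" '
        'uniqueRows="0" count="" datasetConfigVersion="0.6">\n'
        f'  <Dataset name={quoteattr(dataset)} interface="default">\n'
        f'    <Filter name="external_gene_name" value={quoteattr(gene_name)}/>\n'
        '    <Attribute name="ensembl_gene_id"/>\n'
        f"    <Attribute name={quoteattr(sequence_type)}/>\n"
        "  </Dataset>\n"
        "</Query>"
    )


def biomart_url(query_xml: str) -> str:
    """martservice takes the XML query in its 'query' parameter."""
    return MARTSERVICE + "?" + urlencode({"query": query_xml})


def fetch_biomart_fasta(dataset: str, gene_name: str,
                        sequence_type: str = "gene_exon_intron") -> str:
    """Run the query and return the FASTA text (requires network access)."""
    url = biomart_url(biomart_query_xml(dataset, gene_name, sequence_type))
    with urlopen(url) as response:
        return response.read().decode("utf-8")
```

`fetch_biomart_fasta("mmusculus_gene_ensembl", "Gapdh")` would return FASTA text. You can pass other sequence attributes (for example `cdna` or `coding`) as `sequence_type`.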
