Question: How Can I Determine The Coding Sequence Of A Gene, Given The Genbank Accession Number?
Arman wrote, 7.2 years ago:

accession number: NM_001129809

Doing a nucleotide BLAST gives me the whole sequence; I'm just wondering how to find the coding sequence after that.

Tags: cds, genbank, data • 12k views
Gjain (Göttingen, Germany) wrote, 7.2 years ago:

This should help you:

  1. Beginning Perl for Bioinformatics (chapter 10: GenBank)
  2. Biopython Tutorial and Cookbook (Parsing GenBank)
  3. Sample GenBank Record
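Those resources explain GenBank parsing with Perl and Biopython. For readers who want to see what such parsing involves without either library, here is a minimal pure-Python sketch (not taken from those tutorials) that finds the location string of the first CDS feature in a GenBank flat file. It assumes the standard column layout (feature keys starting in column 6, locations in column 22, qualifier lines beginning with '/'); the function name first_cds_location and the toy record are hypothetical illustrations, not real entry data.

```python
from typing import List, Optional


def first_cds_location(genbank_text: str) -> Optional[str]:
    """Return the location string of the first CDS feature, or None if absent.

    Assumes the standard GenBank flat-file layout: feature keys start in
    column 6, locations in column 22, and qualifier lines begin with '/'.
    """
    in_features = False
    collecting = False
    parts: List[str] = []
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        if line and not line[0].isspace():
            break  # ORIGIN (or another top-level keyword) ends the feature table
        key = line[5:21].strip()
        body = line[21:].strip()
        if collecting:
            if key or body.startswith("/"):
                break  # next feature or first qualifier: the location is complete
            parts.append(body)  # long locations wrap onto continuation lines
        elif key == "CDS":
            collecting = True
            parts.append(body)
    return "".join(parts) if parts else None


# Toy record with made-up coordinates (NOT the real NM_001129809 entry).
TOY_RECORD = """\
LOCUS       TOY0001                   30 bp    mRNA    linear   SYN
FEATURES             Location/Qualifiers
     source          1..30
                     /mol_type="mRNA"
     CDS             join(1..5,
                     10..20)
                     /product="toy protein"
ORIGIN
"""

print(first_cds_location(TOY_RECORD))  # join(1..5,10..20)
```

A real parser (BioPerl's Bio::SeqIO, Biopython's SeqIO) also handles qualifiers, multiple CDS features and malformed records; this sketch only shows where the CDS location lives in the file.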
iw9oel_ad wrote, 7.2 years ago:

Locus NM_001129809 in GenBank is Strongylocentrotus purpuratus lefty (LOC577374), mRNA. 'mRNA' is an abbreviation for 'messenger RNA'. The GenBank entry is annotated with a CDS feature at bases 112..1311. 'CDS' is an abbreviation for 'coding sequence'.
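GenBank locations such as 112..1311 are 1-based and include both ends, so in a language with 0-based, half-open slicing the CDS is seq[111:1311], which is 1311 - 112 + 1 = 1200 nt (400 codons). A minimal Python sketch, assuming the mRNA sequence is already available as a plain string; extract_span is a hypothetical helper name:

```python
def extract_span(seq: str, start: int, end: int) -> str:
    """Return the bases of a GenBank-style span 'start..end'.

    GenBank positions are 1-based and both ends are inclusive, whereas Python
    slices are 0-based and exclude the end, hence seq[start - 1:end].
    """
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"span {start}..{end} is outside a sequence of length {len(seq)}")
    return seq[start - 1:end]


# Toy example: the span 3..11 covers "ATGCCCTAA".
print(extract_span("AAATGCCCTAA", 3, 11))  # ATGCCCTAA

# For the entry discussed above (mrna is a hypothetical variable holding the
# full mRNA sequence as a string):
# cds = extract_span(mrna, 112, 1311)   # 1311 - 112 + 1 = 1200 nt, i.e. 400 codons
```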

Is the problem that you do not understand the concepts (mRNA, CDS), or that you didn't realise that you could look up NM_001129809 in GenBank, or something else?

You have almost answered your own question, so I'm not sure how to help!

Manu Prestat (Marseille, France) wrote, 7.2 years ago:

OK, here is an old tool, but perhaps the most efficient one I know:

  1. Download the queryWin client here (Mac, Linux and Windows are supported).

  2. Once it is launched, open the relevant database (RefSeq RNA in your case).

  3. Type ac=NM_001129809 in the search field.

  4. Select the result in the list.

  5. Choose the "extract seq to file" button, then "extract feature region".

  6. Choose the "CDS" region.

And that's it!


Hamish commented, 7.0 years ago: @Manu sadly the link now returns a 404. Still, ACNUC is a great solution for sequence manipulations on a database sequence.

Manu Prestat replied, 7.0 years ago: I just tested it and it works for me. I don't know of a better tool for (at least) this specific task.
Hamish wrote, 7.0 years ago:

There is a wide range of ways of doing this, and the choice depends largely on which software you have access to and which you are most comfortable with. However, as Keith has pointed out, you have to make sure you understand the terminology and how the various biological constructs are represented in the databases of interest.

The identifier NM_001129809 is from RefSeq (RefSeq is not GenBank). RefSeq uses an extended version of the International Nucleotide Sequence Database Collaboration (INSDC) feature table specification (see "The DDBJ/EMBL/GenBank Feature Table") to describe the various features on the sequence.
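The feature table location syntax is what can make CDS extraction more than a single slice: a CDS may be split across several ranges with join(...), or lie on the opposite strand, written complement(...). The following is a minimal Python sketch, assuming only plain ranges, single positions, join, complement and the partial markers < and > occur; remote references (accession:range), order(...) and between-base sites (a^b) are deliberately not handled. The function names are hypothetical, not part of any library.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in text:
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current.append(ch)
    parts.append("".join(current))
    return parts


def extract_location(seq: str, loc: str) -> str:
    """Extract the bases described by a simple INSDC location string.

    Supported: 'a..b', single positions, join(...), complement(...) and the
    partial markers '<' / '>' (ignored). Not supported: remote references
    ('accession:a..b'), order(...) and between-base sites 'a^b'.
    """
    loc = loc.replace(" ", "")
    if loc.startswith("complement(") and loc.endswith(")"):
        return reverse_complement(extract_location(seq, loc[len("complement("):-1]))
    if loc.startswith("join(") and loc.endswith(")"):
        inner = loc[len("join("):-1]
        return "".join(extract_location(seq, part) for part in split_top_level(inner))
    start, _, end = loc.replace("<", "").replace(">", "").partition("..")
    first = int(start)
    last = int(end) if end else first
    return seq[first - 1:last]  # 1-based inclusive -> 0-based half-open


toy = "ACGTTGCAAC"
print(extract_location(toy, "join(1..2,9..10)"))              # ACAC
print(extract_location(toy, "complement(join(1..2,9..10))"))  # GTGT
```

Note that complement(join(...)) reverse-complements the already-joined sequence, which is how the feature table defines it; that is why the recursion applies complement last.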

The RefSeq nucleotide database is available in a wide range of online services, which provide different capabilities. For example:

In NCBI Entrez and in SRS at EMBL-EBI you can get information about a specific feature, including its sequence, by clicking on the feature key (e.g. 'CDS', 'gene', etc.). DAS clients commonly support extracting the sequence of a feature too. Manu's answer describes the procedure for getting the CDS sequence with ACNUC.

Given the entry data, there are also tools that can extract feature sequences; for example, the EMBOSS suite includes the extractfeat program for this purpose.

If you want to do it programmatically, libraries such as BioJava, BioPerl, Biopython and BioRuby include modules for performing this kind of operation. Alternatively, web services (see "Introduction to Web Services") can be used to access the data, or several services can be combined (see BioCatalogue) to do this.
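As one concrete web-service route, NCBI's E-utilities efetch endpoint can return the CDS nucleotide sequences of a nucleotide record directly as FASTA when asked for rettype=fasta_cds_na. The sketch below only builds the request URL (cds_fasta_url is a hypothetical helper name); actually fetching it needs network access, and clients should keep to NCBI's published usage limits.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def cds_fasta_url(accession: str) -> str:
    """Build an efetch URL asking for the CDS nucleotide FASTA of a record."""
    params = {
        "db": "nuccore",            # nucleotide database (includes RefSeq NM_ records)
        "id": accession,
        "rettype": "fasta_cds_na",  # FASTA of the CDS features' nucleotide sequences
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


print(cds_fasta_url("NM_001129809"))

# To download (requires network access):
# from urllib.request import urlopen
# print(urlopen(cds_fasta_url("NM_001129809")).read().decode())
```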
