Accession number: NM_001129809
Doing a nucleotide BLAST gives me the whole sequence; I'm just wondering how to find the coding sequence after that.
This should help you:
Locus NM_001129809 in GenBank is Strongylocentrotus purpuratus lefty (LOC577374), mRNA. 'mRNA' is an abbreviation for 'messenger RNA'. The GenBank entry is annotated with a CDS feature at bases 112..1311. 'CDS' is an abbreviation for 'coding sequence'.
Is the problem that you do not understand the concepts (mRNA, CDS), that you didn't realise you could look up NM_001129809 in GenBank, or something else?
You have almost answered your own question, so I'm not sure how to help!
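Once you have the record, turning a CDS location such as 112..1311 into a sequence is a single slice. Feature positions are 1-based and inclusive, so this CDS spans 1311 - 112 + 1 = 1200 nt, i.e. 400 codons. The sketch below assumes you already have the mRNA as a plain string; `cds_from_mrna` and `looks_like_complete_cds` are hypothetical helper names, and the toy sequence is made up, not the real NM_001129809.

```python
# Stop codons of the standard genetic code, in the DNA alphabet used by GenBank/RefSeq records
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_from_mrna(mrna: str, start: int, end: int) -> str:
    """Return the CDS slice of an mRNA sequence (hypothetical helper).

    `start` and `end` are 1-based and inclusive, as in a feature location
    such as `CDS 112..1311`. Python slices are 0-based and end-exclusive,
    hence `start - 1` and an unchanged `end`.
    """
    if not 1 <= start <= end <= len(mrna):
        raise ValueError(f"{start}..{end} is outside a {len(mrna)} nt sequence")
    return mrna[start - 1:end].upper()


def looks_like_complete_cds(cds: str) -> bool:
    """Rough sanity check: whole codons, ATG start, stop codon at the end."""
    return len(cds) % 3 == 0 and cds.startswith("ATG") and cds[-3:] in STOP_CODONS


# Toy sequence: 3 nt of 5' UTR, a 9 nt CDS (positions 4..12), 2 nt of 3' UTR
toy_mrna = "gggATGAAATAGcc"
toy_cds = cds_from_mrna(toy_mrna, 4, 12)
print(toy_cds, looks_like_complete_cds(toy_cds))  # prints: ATGAAATAG True
```

For this record the call would be `cds_from_mrna(mrna, 112, 1311)`, which is the same as `mrna[111:1311]`.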
OK, here is an old tool (ACNUC's queryWin client), but maybe the most efficient one I know:
1. Download the queryWin client (Mac, Linux and Windows are supported).
2. Once it is launched, open the relevant database (RefSeq RNA in your case).
3. Type ac=NM_001129809 in the search field.
4. Select the result in the list.
5. Click the "extract seq to file" button, then choose "extract feature region".
6. Choose the "CDS" region.

And that's it!
There is a wide range of ways of doing this, and the choice depends largely on which software you have access to and which you are most comfortable with. However, as Keith has pointed out, you have to make sure you understand the terminology and how the various biological constructs are represented in the databases of interest.
The identifier NM_001129809 is from RefSeq (RefSeq is not GenBank). RefSeq uses an extended version of the International Nucleotide Sequence Database Collaboration (INSDC) feature table specification (see "The DDBJ/EMBL/GenBank Feature Table") to describe the various features on the sequence.
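In that feature table, the CDS of this record is a plain range (112..1311), but other records describe spliced or minus-strand features with `join(...)` and `complement(...)`. As a minimal sketch of how such locations map onto a sequence, here is a small parser that supports only single bases, ranges (with the `<`/`>` partial markers), `join` and `complement`; it deliberately ignores remote references, `order(...)` and the rarer position forms, so for real work prefer the parsers in the libraries mentioned below. All function names are hypothetical.

```python
import re

# Complement table for the DNA alphabet used in INSDC/RefSeq records (N stays N)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def _split_top_level(text: str) -> list:
    """Split 'a,b,complement(c..d)' on commas that are not inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def extract_location(location: str, seq: str) -> str:
    """Return the bases described by a simple INSDC location string (sketch only)."""
    location = location.strip()
    if location.startswith("complement(") and location.endswith(")"):
        inner = location[len("complement("):-1]
        return reverse_complement(extract_location(inner, seq))
    if location.startswith("join(") and location.endswith(")"):
        inner = location[len("join("):-1]
        return "".join(extract_location(part, seq) for part in _split_top_level(inner))
    # A single base "5" or a range "112..1311", optionally "<1..>100" for partial features
    match = re.fullmatch(r"<?(\d+)(?:\.\.>?(\d+))?", location)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    start = int(match.group(1))
    end = int(match.group(2) or start)
    # 1-based inclusive positions -> 0-based, end-exclusive Python slice
    return seq[start - 1:end]
```

With the record's sequence in `mrna`, `extract_location("112..1311", mrna)` gives the CDS; a minus-strand, two-exon feature would be written like `complement(join(1..3,7..9))` and comes back reverse-complemented.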
The RefSeq nucleotide database is available through a wide range of online services, which provide different capabilities. For example:
In NCBI Entrez and SRS at EMBL-EBI you can get information about a specific feature, including the sequence of the feature, by clicking on the feature key (e.g. 'CDS', 'gene', etc.). DAS clients commonly provide support for extracting a sequence for a feature too. Manu's answer describes the procedure for getting the CDS sequence when using ACNUC.
Given the entry data, there are also tools that can extract feature sequences; for example, the EMBOSS suite includes the extractfeat program for this purpose.
If you want to do it programmatically, libraries such as BioJava, BioPerl, BioPython and BioRuby include modules for performing this kind of operation. Alternatively, web services (see "Introduction to Web Services") can be used to do this, either individually or by combining several of them (see BioCatalogue).
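As one concrete web-service example (NCBI's E-utilities, not named in the thread above), EFetch can return just the CDS features of a nucleotide record as FASTA. The sketch below assumes the `db=nuccore`, `rettype=fasta_cds_na` and `retmode=text` parameters from the EFetch documentation; check them, and NCBI's usage guidelines (rate limits, optional tool/email/API key parameters), before relying on it. The helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities EFetch endpoint
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def cds_fasta_url(accession: str) -> str:
    """Build an EFetch URL requesting the record's CDS features as nucleotide FASTA."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta_cds_na",  # CDS nucleotide FASTA (assumed per EFetch docs)
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_cds_fasta(accession: str, timeout: float = 30.0) -> str:
    """Download the CDS FASTA text; requires network access."""
    with urlopen(cds_fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_cds_fasta("NM_001129809")` should return a FASTA entry for the annotated CDS; building the URL separately makes it easy to paste into a browser first.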
@Manu Sadly, the link now gives a 404. Still, ACNUC is a great solution for manipulating database sequences.
I just tested it and it works for me. I don't know of a better tool for this specific task, at least.