Question: How To Retrieve The DNA Sequence From A List Of EMBL IDs And GeneIDs
9.2 years ago by
Kirsley50 wrote:

Hi everyone,

As input files, I use Swiss-Prot files. I have a Perl script which parses all the files to retrieve all the EMBL IDs and GeneIDs from the DR lines for each protein. I would like to know if there's an automatic way to retrieve all the corresponding DNA sequences for each protein on the list. Thanks for your help.



protein uniprot genbank dna • 3.4k views
modified 7.9 years ago • written 9.2 years ago by Kirsley
9.2 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum120k wrote:

From a GeneID you can get the gene record as XML from NCBI with EFetch, e.g. for GeneID=2.

You can then get the accession of each RNA sequence under: (...)/Gene-commentary_products/Gene-commentary[Gene-commentary_type/@value='mRNA']/Gene-commentary_accession (use XSLT/XPath to extract this information).

For each accession you can then get the DNA sequence with EFetch.
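The two EFetch steps described here can be sketched in Python roughly as follows. This is a minimal illustration, not a tested pipeline: the helper names (`efetch_url`, `mrna_accessions`, `fetch`) are made up for this example, and it assumes the gene XML uses the `Gene-commentary` element names quoted in this answer. Records without annotated mRNA products would simply yield an empty list.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Base URL of the NCBI E-utilities EFetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db, ids, **params):
    """Build an EFetch URL for one or more IDs (hypothetical helper)."""
    query = {"db": db, "id": ",".join(str(i) for i in ids)}
    query.update(params)
    return EFETCH + "?" + urllib.parse.urlencode(query)


def mrna_accessions(gene_xml):
    """Collect accessions of Gene-commentary elements typed as mRNA."""
    root = ET.fromstring(gene_xml)
    accessions = []
    for commentary in root.iter("Gene-commentary"):
        ctype = commentary.find("Gene-commentary_type")
        acc = commentary.find("Gene-commentary_accession")
        if ctype is not None and ctype.get("value") == "mRNA" and acc is not None:
            accessions.append(acc.text)
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(accessions))


def fetch(url):
    """Download a URL and return its body as text (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    gene_xml = fetch(efetch_url("gene", [2], retmode="xml"))
    accessions = mrna_accessions(gene_xml)
    if accessions:
        # Ask the nucleotide database for FASTA records of those accessions.
        print(fetch(efetch_url("nuccore", accessions, rettype="fasta", retmode="text")))
```

For a long list of IDs it is worth sending several IDs per request and pausing between requests, since NCBI limits how many requests per second a client may send.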

9.2 years ago by
Sydney, Australia
Neilfws48k wrote:

Yes, there is, if your organism is in Ensembl: BioMart. Here is how you'd use the web interface, assuming that you want human sequences:

  1. Go to BioMart and click MARTVIEW
  2. Select database = Ensembl Genes 57
  3. Select dataset = Homo sapiens genes
  4. Click "Filters" to the left and open the "Gene" selection
  5. From the dropdown box, select the IDs that you want to use (e.g. UniProt/TrEMBL)
  6. Either paste your list in the box or upload the file
  7. Click "Attributes" to the left, select the "Sequences" radio button and open the "Sequences" tab
  8. Select what type of sequence (e.g. unspliced transcript)
  9. Click "Results" (in menu bar, top-left of page)

This will return the first 10 sequences. You can download the rest as a file. There is also programmatic access to Ensembl: the Perl API, or biomaRt for R/Bioconductor.

If this doesn't work for you, the BioPerl library should be able to retrieve sequences given IDs.
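The web steps above correspond to a BioMart XML query, which can also be sent to the mart's web service. The sketch below only builds such a query in Python. The service URL, the dataset name `hsapiens_gene_ensembl`, the filter name `uniprotswissprot`, and the attribute names are assumptions for illustration and should be checked against the filters and attributes the mart actually lists, since they change between Ensembl releases.

```python
import urllib.parse
from xml.sax.saxutils import quoteattr

# Assumed location of the Ensembl BioMart web service.
MARTSERVICE = "https://www.ensembl.org/biomart/martservice"


def biomart_query(dataset, filter_name, ids, attributes, formatter="FASTA"):
    """Build a BioMart XML query for a list of IDs (hypothetical helper)."""
    attrs = "".join(f"<Attribute name={quoteattr(a)}/>" for a in attributes)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        f'<Query virtualSchemaName="default" formatter={quoteattr(formatter)} '
        'header="0" uniqueRows="1" datasetConfigVersion="0.6">'
        f'<Dataset name={quoteattr(dataset)} interface="default">'
        f'<Filter name={quoteattr(filter_name)} value={quoteattr(",".join(ids))}/>'
        f"{attrs}</Dataset></Query>"
    )


def biomart_url(query_xml):
    """Wrap the XML query in a GET URL for the mart service."""
    return MARTSERVICE + "?" + urllib.parse.urlencode({"query": query_xml})


if __name__ == "__main__":
    query = biomart_query(
        "hsapiens_gene_ensembl",                        # assumed dataset name
        "uniprotswissprot",                             # assumed filter name
        ["P01023", "P04217"],                           # example UniProt accessions
        ["ensembl_gene_id", "transcript_exon_intron"],  # assumed attribute names
    )
    print(biomart_url(query))
```

The printed URL can be opened in a browser or fetched with any HTTP client. For long ID lists, sending the query as a POST parameter is likely safer than a very long GET URL.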

9.2 years ago by
Kirsley50 wrote:

Thanks for your answers! I could not try BioMart as my organism is not in Ensembl. I forgot to specify that I am working on Chlamydiales.


In that case, Pierre's solution is the best. Since you use Perl, you might like to look at the BioPerl EUtils Cookbook.

Reply written 9.2 years ago by Neilfws