How To Retrieve The DNA Sequence From A List Of EMBL IDs And GeneIDs
3
3
14.1 years ago
Kirsley ▴ 50

Hi everyone,

My input files are Swiss-Prot files.

I have a Perl script that parses these files and retrieves the EMBL IDs and GeneIDs from the DR lines of each protein entry. Is there an automated way to retrieve the corresponding DNA sequences for every protein on the list? Thanks for your help.

Best,
Kirsley

genbank dna protein uniprot • 4.7k views
5
14.1 years ago

From a GeneID you can get the gene record as XML from NCBI with EFetch, e.g. for GeneID 2:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=2&retmode=xml

You can then get the accession of each RNA sequence under: (...)/Gene-commentary_products/Gene-commentary[Gene-commentary_type/@value='mRNA']/Gene-commentary_accession (use XSLT/XPath to extract this information; note that the type and accession elements are siblings inside each Gene-commentary).

Then, for each accession, you can get the DNA sequence with EFetch:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=NM_000014&rettype=fasta&retmode=xml
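The two EFetch steps above could be scripted roughly as follows. This is only a sketch in Python using the standard library. The function names (`efetch_url`, `mrna_accessions`, `download_sequences`) are made up for illustration, and it assumes that `Gene-commentary_type` and `Gene-commentary_accession` sit side by side inside each `Gene-commentary` element. It also asks for `retmode=text` to get plain FASTA, which is a choice of this sketch rather than something from the URL above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL as used in the answer above.
EFETCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db, uid, **params):
    """Build an EFetch URL for one record (hypothetical helper)."""
    query = {"db": db, "id": uid}
    query.update(params)
    return EFETCH + "?" + urlencode(query)


def mrna_accessions(gene_xml):
    """Return the mRNA accessions listed as gene products in a gene XML record.

    Assumption: the type and accession elements are siblings inside
    each Gene-commentary element.
    """
    root = ET.fromstring(gene_xml)
    found = []
    for products in root.iter("Gene-commentary_products"):
        for commentary in products.findall("Gene-commentary"):
            kind = commentary.find("Gene-commentary_type")
            accession = commentary.findtext("Gene-commentary_accession")
            if kind is None or kind.get("value") != "mRNA" or not accession:
                continue
            if accession not in found:  # keep first occurrence only
                found.append(accession)
    return found


def fetch(url):
    """Download a URL and return its body as text (needs network access)."""
    with urlopen(url) as handle:
        return handle.read().decode("utf-8")


def download_sequences(gene_id):
    """Map each mRNA accession of a GeneID to its FASTA record."""
    gene_xml = fetch(efetch_url("gene", gene_id, retmode="xml"))
    return {
        accession: fetch(
            efetch_url("nucleotide", accession, rettype="fasta", retmode="text")
        )
        for accession in mrna_accessions(gene_xml)
    }
```

Calling `download_sequences("2")` would then perform the same two requests as the URLs above. For a long list of GeneIDs it is probably worth pausing between requests so as not to overload the NCBI servers.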

4
14.1 years ago
Neilfws 49k

Yes, there is, if your organism is in Ensembl: BioMart. Here is how you'd use the web interface, assuming that you want human sequences:

  1. Go to BioMart and click MARTVIEW
  2. Select database = Ensembl Genes 57
  3. Select dataset = Homo sapiens genes
  4. Click "Filters" to the left and open the "Gene" selection
  5. From the dropdown box, select the IDs that you want to use (e.g. UniProt/TrEMBL)
  6. Either paste your list in the box or upload the file
  7. Click "Attributes" to the left, select the "Sequences" radio button and open the "Sequences" tab
  8. Select what type of sequence (e.g. unspliced transcript)
  9. Click "Results" (in menu bar, top-left of page)

This will return the first 10 sequences; you can download the rest as a file. There is also programmatic access to Ensembl: the Perl API, or biomaRt for R/Bioconductor.

If this doesn't work for you, the BioPerl library should be able to retrieve sequences given IDs.
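For what it's worth, the web steps above correspond to an XML query that BioMart can also accept directly. Below is a rough Python sketch that only builds such a query string. The helper name `biomart_query` is made up, and the filter and attribute names in the example (`uniprotswissprot`, `ensembl_gene_id`, `transcript_exon_intron`) are assumptions that vary between Ensembl releases, so check them against what MartView shows for your dataset. `P01023` is just an example accession.

```python
import xml.etree.ElementTree as ET


def biomart_query(dataset, filter_name, ids, attributes, formatter="FASTA"):
    """Build a BioMart XML query string (hypothetical helper).

    dataset     -- dataset name, e.g. "hsapiens_gene_ensembl"
    filter_name -- ID type to filter on (assumed name, verify in MartView)
    ids         -- list of identifiers, sent as one comma-separated value
    attributes  -- attribute names to return (assumed names, verify too)
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,
        "header": "0",
        "uniqueRows": "0",
        "datasetConfigVersion": "0.6",
    })
    data = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    ET.SubElement(data, "Filter", {"name": filter_name, "value": ",".join(ids)})
    for name in attributes:
        ET.SubElement(data, "Attribute", {"name": name})
    prologue = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return prologue + ET.tostring(query, encoding="unicode")


# Example: unspliced transcript sequences for one UniProt accession.
example = biomart_query(
    "hsapiens_gene_ensembl",
    "uniprotswissprot",
    ["P01023"],
    ["ensembl_gene_id", "transcript_exon_intron"],
)
```

As far as I know, the resulting string is what you would send as the `query` parameter to the BioMart service. The "XML" button in MartView shows the exact query for whatever you configured in the web interface, which is the safest way to confirm the names.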

0
14.1 years ago
Kirsley ▴ 50

Thanks for your answers! I could not try BioMart, as my organism is not in Ensembl. I forgot to mention that I am working on Chlamydiales.

0

In that case, Pierre's solution is the best. Since you use Perl, you might like to look at the BioPerl EUtils Cookbook.
