Question: Retrieving FASTA with DNA sequences given prokaryotic protein accession numbers
3.7 years ago, guedes.aureliano wrote:

I have a list of prokaryotic protein accession numbers (ACNs).

Now I need to download a FASTA file containing the coding sequences of these proteins.

I tried an efetch query in ipg format, followed by a second efetch on the genome ID with the seq_start and seq_stop parameters, but when I translate the retrieved sequence it does not match the protein sequence.

Maybe I'm wrong, but I'm working with Streptococcus, and I think their genes have no introns.
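One way to narrow down a mismatch like this is to translate the retrieved region on both strands and compare each result with the protein. The sketch below is only illustrative and not part of the original post: the function names are hypothetical, it assumes the bacterial genetic code (NCBI translation table 11), and it treats any table 11 alternative start codon (for example GTG or TTG) in the first position as methionine.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for codons in NCBI order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Start codons listed for NCBI translation table 11 (assumption of this sketch).
START_CODONS = {"ATG", "GTG", "TTG", "CTG", "ATT", "ATC", "ATA"}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate_cds(seq):
    """Translate a CDS, reading the first codon as M if it is a start codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    protein = []
    for index in range(0, usable, 3):
        codon = seq[index:index + 3]
        if index == 0 and codon in START_CODONS:
            protein.append("M")
            continue
        amino_acid = CODON_TABLE.get(codon, "X")  # X for ambiguous codons
        if amino_acid == "*":
            break  # stop codon ends the translation
        protein.append(amino_acid)
    return "".join(protein)


def matching_strand(nucleotide, protein):
    """Return '+' or '-' for the strand whose translation equals protein."""
    if translate_cds(nucleotide) == protein:
        return "+"
    if translate_cds(reverse_complement(nucleotide)) == protein:
        return "-"
    return None
```

If `matching_strand` returns `"-"`, the region was probably fetched on the wrong strand; if it returns `None`, the coordinates may be off (efetch coordinates are 1-based and inclusive) or the protein record may no longer correspond to that annotation.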

So, how could I do this job?

sequence retrieving gene • 1.2k views
modified 3.7 years ago • written 3.7 years ago by guedes.aureliano

Are the accessions from different bacteria or just one?

If they are from one then you could find the genome you are working with at NCBI's ftp site. Then get the *cds_from_genomic.fna.gz file for your genome. Here is one random example.
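Once a `*cds_from_genomic.fna.gz` file is downloaded, the records of interest can be picked out by protein accession. The following is a minimal sketch rather than a tested pipeline: it assumes each FASTA header carries a `[protein_id=...]` tag, as RefSeq CDS files generally do, and the function names are made up for illustration.

```python
import gzip
import re

# Assumed header layout: ">lcl|NC_..._cds_WP_..._1 [gene=...] [protein_id=WP_...] ..."
PROTEIN_ID_TAG = re.compile(r"\[protein_id=([^\]]+)\]")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_cds(lines, wanted_ids):
    """Return {protein_id: CDS sequence} for the requested accessions."""
    wanted = set(wanted_ids)
    selected = {}
    for header, sequence in read_fasta(lines):
        match = PROTEIN_ID_TAG.search(header)
        if match and match.group(1) in wanted:
            selected[match.group(1)] = sequence
    return selected


def select_cds_from_file(path, wanted_ids):
    """Same as select_cds, reading a gzip-compressed FASTA file from disk."""
    with gzip.open(path, "rt") as handle:
        return select_cds(handle, wanted_ids)
```

Records without a `[protein_id=...]` tag (pseudogenes, for instance) are skipped. Current RefSeq bacterial genomes annotate proteins with WP_ accessions, so older YP_ identifiers may first need to be mapped to their WP_ equivalents.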

modified 3.7 years ago • written 3.7 years ago by genomax

The list is 68 protein IDs from different species of Streptococcus.

written 3.7 years ago by guedes.aureliano

Please use ADD REPLY/ADD COMMENT when responding to existing posts/providing information.

If they are not associated with existing genomes/nucleotide records then you may have some trouble. Can you post a couple of examples?

modified 3.7 years ago • written 3.7 years ago by genomax

No, they are all associated with existing genomes. Example: YP_002996579.1. I tried:

efetch -id YP_002996579.1 -db protein -format ipg

This returns a list where each line looks like this:

RefSeq  NC_012891.1             854416  855570  +       WP_003058145.1  spermidine/putrescine import ATP-binding protein PotA [Streptococcus dysgalactiae]                      Streptococcus dysgalactiae subsp. equisimilis GGS_124           GGS_124                 Bacteria

But maybe this is not the best way to retrieve the CDS.
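For reference, the accession, coordinates and strand in an ipg line like the one above map onto the `id`, `seq_start`, `seq_stop` and `strand` parameters of the E-utilities efetch endpoint. The sketch below only builds the request URL and does not contact NCBI. The function names are hypothetical, and it assumes the columns come in the order shown in the example line (source, nucleotide accession, start, stop, strand, protein), optionally preceded by a numeric Id column.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def parse_ipg_line(line):
    """Extract the nucleotide location from one ipg result line."""
    fields = line.split()
    if fields and fields[0].isdigit():
        fields = fields[1:]  # drop a leading numeric Id column if present
    source, accession, start, stop, strand, protein = fields[:6]
    start, stop = sorted((int(start), int(stop)))
    return {
        "source": source,
        "accession": accession,
        "start": start,
        "stop": stop,
        "strand": strand,
        "protein": protein,
    }


def efetch_url(record):
    """Build an efetch URL for the CDS region described by an ipg record."""
    params = {
        "db": "nuccore",
        "id": record["accession"],
        "seq_start": record["start"],
        "seq_stop": record["stop"],
        # efetch uses 1 for the plus strand and 2 for the minus strand
        "strand": 1 if record["strand"] == "+" else 2,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)
```

For the example line this region spans 855570 - 854416 + 1 = 1155 bases, a multiple of three, which is what a complete CDS including its stop codon would be expected to give. Requesting strand 2 for minus-strand entries should return the sequence already reverse-complemented.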

written 3.7 years ago by guedes.aureliano

The YP_002996579 example is not working as expected because that accession number appears to have been discontinued. See the note on its NCBI record. The FASTA sequence for that record is still available in the NCBI entry, and it corresponds to what you retrieved with your query.

modified 3.7 years ago • written 3.7 years ago by genomax