Retrieving FASTA with DNA sequences given prokaryotic protein accession numbers
7.6 years ago

I have a list of prokaryotic protein accession numbers (ACNs).

Now I need to download a FASTA file containing the coding sequences of these proteins.

I tried an efetch query in ipg format, followed by an efetch with the genome ID and the seq_start and seq_stop parameters, but the retrieved sequence does not match the protein sequence when I translate it.

Maybe I'm wrong, but I'm working with Streptococcus and I think they have no introns in their gene sequences.
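
When a translated slice does not match the protein, two things worth ruling out are the strand (a minus-strand gene has to be reverse-complemented) and the start codon (bacterial genes can start with GTG or TTG, which a naive translation reads as V or L instead of M). A minimal Python sketch of such a check follows. The function names are hypothetical, the coordinates are assumed to be 1-based and inclusive, and the first codon is assumed to be a valid start codon and is always reported as M.

```python
from itertools import product

# Standard codon assignments (the same amino acid assignments are used by
# bacterial translation table 11); codons are ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def extract_cds(genome, start, stop, strand):
    """Slice 1-based inclusive coordinates; flip the slice on the minus strand."""
    piece = genome[start - 1:stop].upper()
    return reverse_complement(piece) if strand == "-" else piece


def translate_cds(cds):
    """Translate a CDS, dropping a trailing stop codon.

    Assumption: the first codon is a real start codon, so it is reported
    as M even when it is GTG or TTG. Unknown codons become X.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    protein = "".join(CODON_TABLE.get(codon, "X") for codon in codons)
    if protein.endswith("*"):
        protein = protein[:-1]
    if protein:
        protein = "M" + protein[1:]
    return protein
```

Comparing translate_cds(extract_cds(...)) with the protein FASTA shows quickly whether the coordinates and strand are right.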

So, how could I do this job?

gene sequence retrieving • 1.9k views

Are the accessions from different bacteria or just one?

If they are from one, you could find the genome you are working with at NCBI's FTP site, then get the *cds_from_genomic.fna.gz file for your genome. Here is one random example.
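
Once that file is downloaded, pulling out the wanted coding sequences could look roughly like the Python sketch below. It assumes each FASTA header carries a [protein_id=...] tag, which is how NCBI usually writes these files. The function names and the file path passed in are hypothetical.

```python
import gzip
import re

# Matches the [protein_id=...] tag expected in each header of a
# *_cds_from_genomic.fna file (assumed header layout).
PROTEIN_ID_TAG = re.compile(r"\[protein_id=([^\]]+)\]")


def iter_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_cds(lines, wanted_ids):
    """Return {protein_id: cds_sequence} for records listed in wanted_ids."""
    wanted = set(wanted_ids)
    found = {}
    for header, seq in iter_fasta(lines):
        match = PROTEIN_ID_TAG.search(header)
        if match and match.group(1) in wanted:
            found[match.group(1)] = seq
    return found


def select_cds_from_file(path, wanted_ids):
    """Same as select_cds, but reads a gzipped FASTA file from disk."""
    with gzip.open(path, "rt") as handle:
        return select_cds(handle, wanted_ids)
```

Any accession missing from the returned dictionary was not found in that genome's CDS file, for example because the ID was replaced by a newer one.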


The list is 68 protein IDs from different species of Streptococcus.


Please use ADD REPLY/ADD COMMENT when responding to existing posts/providing information.

If they are not associated with existing genomes/nucleotide records then you may have some trouble. Can you post a couple of examples?


No, they are all associated with existing genomes. Example: YP_002996579.1. I tried:

efetch -id YP_002996579.1 -db protein -format ipg

This returns a list where each line looks like this:

RefSeq  NC_012891.1             854416  855570  +       WP_003058145.1  spermidine/putrescine import ATP-binding protein PotA [Streptococcus dysgalactiae]                      Streptococcus dysgalactiae subsp. equisimilis GGS_124           GGS_124                 Bacteria

But maybe this is not the best way to retrieve the CDS.
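
For reference, the second step (fetching the nucleotide slice from the coordinates in an ipg row) could be scripted roughly as in the Python sketch below. It only builds the command string and does not run it. The helper name is hypothetical, and the strand mapping (1 for forward, 2 for reverse complement) is my reading of the EDirect efetch options, so check it against efetch -help.

```python
import shlex


def efetch_cds_command(nuc_accession, start, stop, strand):
    """Build an EDirect efetch call for the coordinates of one ipg row."""
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand!r}")
    if (stop - start + 1) % 3 != 0:
        # A complete CDS should span a whole number of codons.
        raise ValueError("span length is not a multiple of 3")
    args = [
        "efetch", "-db", "nuccore", "-id", nuc_accession,
        "-seq_start", str(start), "-seq_stop", str(stop),
        # Assumed mapping: 1 = forward strand, 2 = reverse complement.
        "-strand", "1" if strand == "+" else "2",
        "-format", "fasta",
    ]
    return shlex.join(args)


# Values taken from the ipg row shown above.
print(efetch_cds_command("NC_012891.1", 854416, 855570, "+"))
```

Forgetting the strand for a minus-strand gene is one way to end up with a slice whose translation does not match the protein.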


The YP_002996579 example is not working as expected because that accession number appears to have been discontinued. See the note on its NCBI record. The FASTA sequence for that record is available in the NCBI entry, which corresponds to what you retrieved with the query.
