7.4 years ago
guedes.aureliano
I have a list of prokaryotic protein accession numbers (ACNs).
Now I need to download a FASTA file containing the coding sequences of these proteins.
I tried an efetch query with the ipg format, followed by an efetch on the genome ID with the seq_start and seq_stop parameters, but the retrieved sequence doesn't match the protein sequence when I translate it.
Maybe I'm wrong, but I'm working with Streptococcus, and I think their genes have no introns.
So, how could I do this?
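The two-step efetch approach described above can be sketched as follows. This is a minimal sketch, not a tested pipeline: it assumes the IPG report is tab-separated with a header row containing columns named "Nucleotide Accession", "Start", "Stop" and "Strand" (check this against your own output). One plausible cause of a mismatching translation is a gene on the minus strand; efetch accepts a `strand` parameter (1 = plus, 2 = minus) alongside `seq_start`/`seq_stop`, so the sketch sets it from the IPG row. The function names `parse_ipg`, `cds_efetch_url` and `fetch` are hypothetical helpers.

```python
import urllib.parse
import urllib.request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def parse_ipg(text):
    """Parse a tab-separated IPG report into one dict per row.

    Assumption: the first non-empty line is the header row.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


def cds_efetch_url(row):
    """Build an efetch URL for the CDS region described by one IPG row."""
    # Minus-strand genes need strand=2 so NCBI returns the reverse complement;
    # fetching the plus strand instead will not translate to the protein.
    strand = "2" if row["Strand"] == "-" else "1"
    params = {
        "db": "nuccore",
        "id": row["Nucleotide Accession"],
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": row["Start"],
        "seq_stop": row["Stop"],
        "strand": strand,
    }
    return EFETCH + "?" + urllib.parse.urlencode(params)


def fetch(url):
    """Download the FASTA text for one URL (requires network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode()
```

For 68 proteins, keep the loop over `fetch` below NCBI's limit of 3 requests per second (more with an API key).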
Are the accessions from different bacteria or just one?
If they are all from one genome, you can find that genome on NCBI's FTP site and download its *cds_from_genomic.fna.gz file. Here is one random example.
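Once a *cds_from_genomic.fna.gz file is downloaded, the CDS for each protein of interest can be pulled out by matching the protein accession in the FASTA headers. This is a minimal sketch assuming each header carries a `[protein_id=...]` tag (as in the NCBI files I have seen; verify on your own file). The function name `cds_by_protein_id` and the file path are hypothetical.

```python
import gzip
import re

# Assumption: headers look like ">lcl|..._cds_XXX [protein_id=XXX] [gbkey=CDS] ..."
PROTEIN_ID = re.compile(r"\[protein_id=([^\]]+)\]")


def cds_by_protein_id(path, wanted):
    """Return {protein_id: nucleotide CDS} for the requested protein IDs."""
    wanted = set(wanted)
    found = {}
    current = None  # protein ID of the record being read, if wanted
    chunks = []
    with gzip.open(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                # Store the previous record before starting a new one.
                if current:
                    found[current] = "".join(chunks)
                match = PROTEIN_ID.search(line)
                current = match.group(1) if match and match.group(1) in wanted else None
                chunks = []
            elif current:
                chunks.append(line)
        if current:
            found[current] = "".join(chunks)
    return found
```

With several genomes, the same function can be run on each downloaded file and the resulting dictionaries merged.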
The list contains 68 protein IDs from different Streptococcus species.
Please use ADD REPLY/ADD COMMENT when responding to existing posts/providing information. If they are not associated with existing genomes/nucleotide records, you may have some trouble. Can you post a couple of examples?
No, all of them are associated with existing genomes. Example: YP_002996579.1. I tried
This returns a list where each line looks like this
But maybe this is not the best way to retrieve the CDS.
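Whichever retrieval route is used, it helps to check each CDS by translating it and comparing with the protein. A sketch under stated assumptions: bacteria use NCBI genetic code 11, which has the same amino-acid assignments as the standard code but allows alternative start codons such as GTG and TTG, annotated as Met. A naive standard translation of a GTG-start gene therefore gives V instead of M at position 1. The sketch simply forces the first residue to M and drops a trailing stop; it does not handle selenocysteine, frameshifts or pseudogenes. All function names are hypothetical.

```python
BASES = "TCAG"
# Amino acids for codons in TCAG order (NCBI table 1; table 11 differs only in starts).
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
INDEX = {base: i for i, base in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (for minus-strand genes)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_cds(seq):
    """Translate a bacterial CDS, reading the annotated start codon as Met."""
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if any(base not in INDEX for base in codon):
            protein.append("X")  # ambiguous base
        else:
            protein.append(AAS[16 * INDEX[codon[0]] + 4 * INDEX[codon[1]] + INDEX[codon[2]]])
    if protein:
        protein[0] = "M"  # alternative starts (GTG, TTG, ...) are translated as Met
    return "".join(protein).rstrip("*")


def cds_matches_protein(cds, protein):
    """True if the CDS translates exactly to the given protein sequence."""
    return translate_cds(cds) == protein.upper()
```

If a CDS fails the check but its reverse complement passes, the region was most likely fetched from the wrong strand.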
The YP_002996579 example is not working as expected because that accession number appears to have been discontinued; see the note on its NCBI record. The FASTA sequence available in that NCBI entry corresponds to what you retrieved with your query.