10.5 years ago
CrLs
Hi everyone,
I'd like to download the .faa files for a large number of genomes (more than 500) from NCBI. I used the Batch Entrez tool, but it only lets you download files in formats such as gb, fna and asn, not in .faa.
I could use a Python script like this one:
from Bio import Entrez, SeqIO

Entrez.email = "email@email.com"

for i in ['XX00001', 'XX000002']:
    # Fetch the GenBank record, which carries the CDS annotations
    handle = Entrez.efetch(db="nucleotide", id=i, rettype="gb", retmode="text")
    # One output file per accession
    with open(i + '.fasta', 'a') as fasta:
        for seq_record in SeqIO.parse(handle, "gb"):
            for feature in seq_record.features:
                # Only CDS features with a stored translation are written
                if feature.type == "CDS" and 'translation' in feature.qualifiers:
                    cds_seq = feature.qualifiers['translation'][0]
                    protid = feature.qualifiers['protein_id'][0]
                    fasta.write(">" + protid + "\n" + str(cds_seq) + "\n")
    handle.close()
But I'd like to know whether NCBI provides a tool to do this.
Regards,
Charles
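One possible route, sketched here rather than confirmed anywhere in this thread: the E-utilities efetch endpoint lists a `fasta_cds_aa` return type for the nuccore database, which returns the CDS translations as protein FASTA. The helper name `cds_protein_fasta_url` is hypothetical, and the accession and e-mail are the placeholders from the question. The sketch only builds the request URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Public E-utilities efetch endpoint
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def cds_protein_fasta_url(accession, email="email@email.com"):
    """Build an efetch URL asking for CDS translations as protein FASTA.

    Hypothetical helper: it assumes the accession is a nucleotide record
    whose CDS features are annotated with translations.
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta_cds_aa",  # protein FASTA of the annotated CDS
        "retmode": "text",
        "email": email,
    }
    return EFETCH_URL + "?" + urlencode(params)


# Placeholder accession taken from the question
url = cds_protein_fasta_url("XX00001")
print(url)
```

The resulting URL could then be fetched once per accession (for example with `urllib.request.urlopen`) and saved as `<accession>.faa`, pausing between requests to stay within NCBI's rate limits.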
FNA and FAA files are different only in extension. The content is the same. You can just download the FNAs and bulk rename them to *.faa
hello,
FNA files are FASTA nucleic acid files, whereas FAA files contain amino acids. I want a .mpfa from a genome. If I rename my FNA files to FAA, they will still contain nucleic acid sequences.
I apologize. From your question, I assumed that you wanted the genome sequences for multiple genera. I guess you want the proteome (or a subset of it).
No need to apologize. I did ask for the 'genome'. What I actually want is the proteome.
fna is supposed to stand for nucleic acid and faa for amino acid, so fna and faa files cannot be the same.
That's right, and changing the extension won't change the sequence.
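To illustrate the point that the extension says nothing about the content, here is a minimal sketch that guesses whether FASTA records hold nucleotides or amino acids by looking at their letters. The function names and the 0.9 threshold are assumptions made for illustration, and the example sequences are made up. It is a heuristic, not a validator.

```python
# Letters expected in a nucleotide sequence (N stands for an unknown base)
NUCLEOTIDE_LETTERS = set("ACGTUN")


def read_fasta_sequences(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def looks_like_nucleotide(sequence, threshold=0.9):
    """Heuristic: True if at least `threshold` of the letters are A/C/G/T/U/N."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        return False
    hits = sum(c in NUCLEOTIDE_LETTERS for c in letters)
    return hits / len(letters) >= threshold


# Made-up records: one nucleotide, one protein
example = ">gene1\nATGGCGTACGTTAGC\n>prot1\nMKTAYIAKQRQISFVKSHFSRQ\n"
for header, seq in read_fasta_sequences(example):
    kind = "nucleotide" if looks_like_nucleotide(seq) else "protein"
    print(header, kind)
```

Running the same check on an .fna file renamed to .faa would still report nucleotide records, since only the file name has changed.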