Question

How to get .faa files for a large number of genomes from NCBI

0

9.5 years ago

CrLs ▴ 10

Hi everyone,

I'd like to download the .faa files for a large number of genomes (more than 500) from NCBI. I used the Batch Entrez tool, but it only lets you download files in formats such as gb, fna and asn, not .faa.

I could use a Python script like this one:

from Bio import Entrez, SeqIO

Entrez.email = "email@email.com"
for i in 'XX00001', 'XX000002':
    # Fetch the GenBank record for this accession
    handle = Entrez.efetch(db="nucleotide", id=i, rettype="gb", retmode="text")
    fasta = open(i + '.fasta', 'a')
    for seq_record in SeqIO.parse(handle, "gb"):
        for feature in seq_record.features:
            # Keep only CDS features that carry a translation
            if feature.type == "CDS" and 'translation' in feature.qualifiers:
                cds_seq = feature.qualifiers['translation'][0]
                protid = feature.qualifiers['protein_id'][0]
                fasta.write(">" + protid + "\n" + str(cds_seq) + "\n")
    handle.close()
    fasta.close()

But I'd like to know whether NCBI provides a tool to do this.

Regards,
Charles

genome • 6.2k views

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by CrLs ▴ 10

0

FNA and FAA files are different only in extension. The content is the same. You can just download the FNAs and bulk rename them to *.faa

ADD REPLY • link 9.5 years ago by Ram 43k

0

Hello,

FNA files are FASTA nucleic acid files, whereas FAA files contain amino acids. I want a .mpfa from a genome. If I rename my FNA files to FAA, they will still contain nucleic acid sequences.

ADD REPLY • link 9.5 years ago by CrLs ▴ 10

1

I apologize. From your question, I assumed that you wanted the genome sequences for multiple genera. I guess you want the proteome (or a subset of it).

ADD REPLY • link 9.5 years ago by Ram 43k

0

No need to apologize. I did ask for the 'genome'. What I actually want is the proteome.

ADD REPLY • link 9.5 years ago by CrLs ▴ 10

0

fna is supposed to stand for nucleic acid and faa for amino acid, so fna and faa files cannot be the same.

ADD REPLY • link 9.5 years ago by Manu Prestat 4.1k

0

That's right, and changing the extension won't change the sequence.

ADD REPLY • link 9.5 years ago by CrLs ▴ 10
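To illustrate the point made in the comments above (the extension says nothing about the content), here is a minimal sketch that guesses whether FASTA text holds nucleotide or protein sequences from the residues themselves. The function name `guess_fasta_type` and the 0.9 threshold are assumptions made for this example, not part of any NCBI tool, and the check is only a heuristic.

```python
def guess_fasta_type(text, threshold=0.9):
    """Guess 'nucleotide' or 'protein' from the residues in FASTA text.

    Heuristic only: a sequence counts as nucleotide when at least
    `threshold` of its characters are A, C, G, T, U or N.
    """
    # Concatenate all sequence lines, skipping headers and blank lines
    residues = "".join(
        line.strip()
        for line in text.splitlines()
        if line and not line.startswith(">")
    ).upper()
    if not residues:
        raise ValueError("no sequence data found")
    nucleotide_like = sum(residues.count(base) for base in "ACGTUN")
    if nucleotide_like / len(residues) >= threshold:
        return "nucleotide"
    return "protein"
```

Running this on a renamed .fna file would still report nucleotide content, which is why renaming cannot produce a real .faa file.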

Ram · Answer 1 · 2014-11-03

Well, I discovered the 'fasta_cds_aa' rettype (I wasn't aware of it), so I wrote this Python script. It seems to work.

Feel free to use.

from Bio import Entrez

Entrez.email = "email@email.com"
ident = 'XX00001', 'XX000002'
for i in ident:
    # fasta_cds_aa returns the translated CDS features as protein FASTA
    handle = Entrez.efetch(db="nuccore", id=i, rettype="fasta_cds_aa")
    fasta = open(i + '.fasta', 'a')
    text = handle.read()
    fasta.write(text)
    handle.close()
    fasta.close()

Regards,
Charles
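For a list of 500 or more accessions, the script above could be extended along the following lines. This is only a sketch under a few assumptions: the helper names `entrez_fetch_proteins` and `download_faa`, the `.faa` output naming, and the 0.4 second pause are choices made for this example. NCBI asks for no more than about three requests per second without an API key, hence the pause. Failed accessions are collected and returned so they can be retried rather than stopping the whole run.

```python
import time
from pathlib import Path


def entrez_fetch_proteins(accession, email="email@email.com"):
    """Fetch translated CDS records for one nucleotide accession."""
    from Bio import Entrez  # imported here so download_faa can be used without it

    Entrez.email = email
    handle = Entrez.efetch(db="nuccore", id=accession,
                           rettype="fasta_cds_aa", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()


def download_faa(accessions, out_dir=".", fetch=entrez_fetch_proteins, delay=0.4):
    """Write one <accession>.faa per accession; return the accessions that failed."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    failed = []
    for accession in accessions:
        try:
            text = fetch(accession)
        except Exception as error:  # e.g. network or HTTP error: record and go on
            print(f"{accession}: {error}")
            failed.append(accession)
            continue
        if not text.strip().startswith(">"):
            failed.append(accession)  # empty or non-FASTA reply
            continue
        (out_path / f"{accession}.faa").write_text(text)
        time.sleep(delay)  # pause between requests
    return failed
```

A call such as `failed = download_faa(['XX00001', 'XX000002'], out_dir='proteomes')` would write one file per accession and return the list of accessions to retry.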