How to get .faa files for many genomes from NCBI
9.5 years ago
CrLs

Hi everyone,

I'd like to download the .faa (protein FASTA) files for a large number of genomes (more than 500) from NCBI. I tried the Batch Entrez tool, but it only lets you download files in GenBank (gb), FASTA nucleotide (fna), ASN.1 and similar formats, not .faa.

I could use a Python script like this one:

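from Bio import Entrez, SeqIO

Entrez.email = "email@email.com"  # NCBI asks for a contact address

for acc in ['XX00001', 'XX000002']:
    # Fetch the GenBank record, whose CDS features carry the protein translations
    handle = Entrez.efetch(db="nucleotide", id=acc, rettype="gb", retmode="text")
    with open(acc + '.fasta', 'w') as fasta:  # 'w' avoids duplicates on re-runs
        for seq_record in SeqIO.parse(handle, "genbank"):
            for feature in seq_record.features:
                # Keep only CDS features that have an annotated translation
                if feature.type == "CDS" and 'translation' in feature.qualifiers:
                    cds_seq = feature.qualifiers['translation'][0]
                    prot_id = feature.qualifiers.get('protein_id', ['unknown'])[0]
                    fasta.write(">" + prot_id + "\n" + cds_seq + "\n")
    handle.close()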

But I'd like to know whether NCBI provides a tool that does this directly.

Regards,
Charles

FNA and FAA files are different only in extension. The content is the same. You can just download the FNAs and bulk rename them to *.faa

Hello,

.fna files contain nucleic acid FASTA and .faa files contain amino acid FASTA. I want the protein sequences (.mpfa) from each genome. If I just rename my FNA files to FAA, they will still contain nucleotide sequences.

I apologize. From your question, I assumed that you wanted the genome sequences for multiple genomes. I guess you want the proteome (or a subset of it).

No need to apologize; I did ask for the 'genome'. I do indeed want the proteome.

.fna is supposed to stand for FASTA nucleic acid and .faa for FASTA amino acid, so .fna and .faa files cannot contain the same sequences.

That's right, and changing the extension won't change the sequences.
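Since the thread turns on whether a file's extension says anything about its contents, here is a small stdlib-only sketch that inspects the residues instead. It is a heuristic, not an NCBI tool: the function names and the 90% A/C/G/T/U/N threshold are assumptions chosen for illustration.

```python
from typing import Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:  # ignore any text before the first header
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def guess_alphabet(sequence: str, threshold: float = 0.9) -> str:
    """Heuristic: 'nucleotide' if >= threshold of letters are A/C/G/T/U/N, else 'protein'."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        return "unknown"
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    return "nucleotide" if nucleotide_like / len(letters) >= threshold else "protein"


if __name__ == "__main__":
    # The file name is irrelevant here; only the residues decide the answer.
    examples = {
        "renamed.faa": ">contig1\nATGAAACGCATTAGCACC\nACCATTACCACCTGA\n",
        "proteins.faa": ">prot1\nMKRISTTITTTITITTGNGAG\n",
    }
    for name, text in examples.items():
        for header, seq in read_fasta(text.splitlines()):
            print(name, header, guess_alphabet(seq))
```

A fraction threshold is used rather than a strict alphabet test because many amino acid letters (for example K, M, R, S, V) are also IUPAC nucleotide ambiguity codes, so a short protein could otherwise be mistaken for DNA.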

9.5 years ago
CrLs

Well, I discovered the 'fasta_cds_aa' rettype (I wasn't aware of it), so I wrote this Python script. It seems to work.

Feel free to use.

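from Bio import Entrez

Entrez.email = "email@email.com"  # NCBI asks for a contact address

ident = ['XX00001', 'XX000002']
for acc in ident:
    # rettype="fasta_cds_aa" returns the annotated CDS translations as protein FASTA
    handle = Entrez.efetch(db="nuccore", id=acc, rettype="fasta_cds_aa", retmode="text")
    text = handle.read()
    handle.close()
    with open(acc + '.fasta', 'w') as fasta:
        fasta.write(text)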
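For the 500+ genomes in the question, one request per accession is slow. A possible refinement, sketched under stated assumptions: E-utilities efetch accepts a comma-separated list of IDs, so accessions can be fetched in batches and the concatenated output split back into one file per genome. The helper names, the batch size of 100, and the header layout `>lcl|<accession>_prot_<protein_id>_<n>` that the splitter relies on are assumptions; check them against a real download before trusting the split. The 0.34 s pause reflects NCBI's guideline of at most three requests per second without an API key.

```python
import time
from typing import Callable, Dict, Iterator, List, Optional, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def accession_from_header(header: str) -> str:
    """Recover the source accession from a fasta_cds_aa header line.

    Assumption: headers look like '>lcl|NC_000913.3_prot_NP_414542.1_1 [gene=...]',
    i.e. the nucleotide accession sits between 'lcl|' and '_prot_'.
    """
    ident = header.lstrip(">").split()[0]  # drop '>' and the [key=value] attributes
    ident = ident.split("|", 1)[-1]         # drop the 'lcl|' prefix if present
    return ident.split("_prot_", 1)[0]


def split_by_accession(fasta_text: str) -> Dict[str, str]:
    """Group the records of a concatenated download by source accession."""
    groups: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            current = accession_from_header(line)
            groups.setdefault(current, [])
        if current is not None:  # skip any text before the first header
            groups[current].append(line)
    return {acc: "\n".join(lines) + "\n" for acc, lines in groups.items()}


def download_proteomes(accessions: Sequence[str],
                       fetch: Callable[[str], str],
                       batch_size: int = 100,
                       pause: float = 0.34) -> Dict[str, str]:
    """Fetch protein FASTA in batches, one comma-separated request per batch."""
    results: Dict[str, str] = {}
    for batch in chunked(accessions, batch_size):
        results.update(split_by_accession(fetch(",".join(batch))))
        time.sleep(pause)  # stay under ~3 requests/second (no API key)
    return results


def entrez_fetch(ids: str) -> str:
    """Real fetcher; needs Biopython and network access."""
    from Bio import Entrez
    Entrez.email = "email@email.com"  # replace with your own address
    handle = Entrez.efetch(db="nuccore", id=ids, rettype="fasta_cds_aa", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()


if __name__ == "__main__":
    # Offline demo with a hypothetical stand-in fetcher; pass entrez_fetch for real downloads.
    def fake_fetch(ids: str) -> str:
        return "".join(">lcl|%s_prot_P%d_1 [protein=demo]\nMKV\n" % (acc, i)
                       for i, acc in enumerate(ids.split(",")))

    demo = download_proteomes(["NC_A.1", "NC_B.1", "NC_C.1"], fake_fetch, batch_size=2, pause=0)
    for acc, text in demo.items():
        print(acc, text.count(">"), "protein(s)")
```

For a real run, pass `entrez_fetch` instead of the stand-in and write each value of the returned dict to `<accession>.faa`. Keeping the fetcher injectable lets the batching and splitting logic be tested offline. Note that `fasta_cds_aa` can only return proteins for records that carry annotated CDS features.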

Regards,
Charles
