Question: Choosing from several .faa files in the NCBI FTP server
9.1 years ago by
Alf450 wrote:

Hi everybody.

I am trying to download some bacterial genomes (specifically the amino-acid FASTA files) from the NCBI FTP server. For many of the organisms, several ".faa" files are provided.

Which one should I choose? Should I use them all, concatenated? Any ideas?

Thank you :)

ncbi fasta bacteria • 3.4k views
9.1 years ago by
Simon Cockell7.3k wrote:

There are multiple files per strain because they represent the main chromosome and any plasmids found in that strain. For instance, *E. coli* O157 has one file for the main chromosome (NC_011353) and one for each of its 2 plasmids (pO157 and pEC4115).

The plasmids are critical to defining the strain and can carry genes that provide vital functions to the bacterium (for instance, virulence genes are often carried on plasmids). Whether to include them in your analysis depends on your use case, but I would usually recommend it. Unless you need to tell plasmid-encoded proteins apart from those encoded on the main chromosome, concatenating the files into a single FASTA file probably makes the most sense.
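
If you do concatenate, a small script can merge the files while still recording which file each protein came from, so the chromosome/plasmid distinction is not lost. This is only a minimal sketch: the function name `concatenate_faa`, the `[source_file=...]` header tag, and the assumption that all of a strain's .faa files sit in one local directory are illustrative choices, not anything defined by NCBI.

```python
from pathlib import Path


def concatenate_faa(faa_dir, out_path, tag_source=True):
    """Merge every .faa file in faa_dir into one FASTA file; return the record count.

    When tag_source is True, each header gets a hypothetical
    "[source_file=<name>]" suffix so the replicon of origin stays recoverable.
    """
    faa_dir = Path(faa_dir)
    out_path = Path(out_path)
    # Collect inputs first, skipping the output file if it lives in the same directory.
    inputs = sorted(
        p for p in faa_dir.glob("*.faa") if p.resolve() != out_path.resolve()
    )
    n_records = 0
    with open(out_path, "w") as out:
        for faa in inputs:
            with open(faa) as handle:
                for line in handle:
                    if line.startswith(">"):
                        n_records += 1
                        if tag_source:
                            line = line.rstrip("\n") + f" [source_file={faa.stem}]\n"
                    elif not line.endswith("\n"):
                        # Guard against a file whose last line lacks a newline.
                        line += "\n"
                    out.write(line)
    return n_records
```

For example, `concatenate_faa("my_strain/", "my_strain_all.faa")` (both paths hypothetical) would write one combined file and return the number of proteins in it. Pass `tag_source=False` if downstream tools expect unmodified headers.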

9.1 years ago by
Copenhagen, Denmark
Lars Juhl Jensen11k wrote:

When there are multiple .faa files, it is typically because one contains the chromosome whereas the others are plasmids. In this case it makes sense to use either all of the files or only the largest one (which should be the chromosome). However, be careful if you do the latter, since a small number of bacteria have more than one chromosome.
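
To see which file is the largest, one option is to rank the files by their total amino-acid count instead of picking a single winner blindly. The sketch below does that; the function names `count_residues` and `rank_faa_by_size` are made up for illustration, and it assumes the .faa files for one strain are in a single local directory. Because it returns the full ranking, a second large file (a possible second chromosome) is easy to spot.

```python
from pathlib import Path


def count_residues(faa_path):
    """Return the total number of residues across all records in one FASTA file."""
    total = 0
    with open(faa_path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def rank_faa_by_size(faa_dir):
    """Return (residue_count, file_name) pairs for each .faa file, largest first."""
    sizes = [(count_residues(p), p.name) for p in Path(faa_dir).glob("*.faa")]
    return sorted(sizes, reverse=True)
```

The first entry of `rank_faa_by_size("my_strain/")` (a hypothetical directory) is the candidate chromosome; if the second entry is of comparable size, check the organism before discarding it.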
